Artificial intelligence (AI) research has achieved great empirical success in the last decade. One major reason has been the availability of fast GPU and CPU clusters, which have made it possible to run experiments at very large scale (e.g., training models on millions of documents, or training an image classifier on a hundred thousand images). Some recent AI directions (particularly in architecture search) have used notoriously large amounts of resources. For example, Strubell et al. 2019 found that training a single large language model can produce roughly the same carbon emissions as a trans-American flight.
While creating computationally efficient algorithms is an interesting research direction, the state of affairs at the end of 2019 is still that beating the state of the art (SOTA) justifies it all. Getting an empirical paper accepted without SOTA results is generally hard, and my speculation is that reviewers do not care as much about carbon emissions and FLOPs spent as they do about improvement over the state of the art.
However, an interesting and overlooked side story to this is that not everyone has benefited from this resource-heavy research.
Research labs, particularly those in academia without connections to well-funded industry, are at a massive disadvantage in playing this SOTA game.
A student in a randomly chosen academic lab will have to use a small shared GPU cluster to play the same numbers game as a person in industry with access to a cluster of 100 GPU machines. This is generally less of a concern for top-tier schools, where many professors are well connected to industry (or otherwise well funded) and are therefore in a better position to provide computational resources for their students.
It is relatively easy to dismiss these concerns as unwarranted, on the grounds that good research should come from well-founded intuition rather than heavy computational use. I believe this sentiment is true in the long run. Maybe in 10 years, when we understand the black-box nature of deep learning, we will be able to derive these models on a whiteboard or find them in an ultra-fast way using basic principles: ResNets, LSTMs, and Transformers will then magically appear as optimal solutions of some functional. That would be a great achievement. However, until that time comes, the problem remains as follows:
- students in empirical research need to publish
- current reviewing practices, such as the use of leaderboards, heavily favour beating SOTA
- current deep learning methods are not understood well enough to allow a whiteboard derivation of solutions. E.g., we don’t have a proof of why ResNets work better. The need for residual connections is not apparent from a formal whiteboard argument.
- students from underfunded labs are at a disadvantage
What can underfunded labs do?
- Do theoretical research or focus on understanding existing empirical solutions
One can avoid the computation-driven try-and-fail rat race by focusing on theoretical questions or on understanding existing solutions. This area is relatively under-explored, and plenty of open questions remain that likely won’t be solved anytime soon.
- Think of existing problems in a different way
The community as a whole can get tunnel vision, resulting in a significant number of researchers working on the same problem. I remember a time when 3-4 similar papers on the same problem came from well-established labs concurrently. Working on a different problem, or analyzing the same problem in a different way (such as by questioning the assumptions or changing the setup), is a good way to get out of this tunnel vision. It also contributes more to the literature than solving a problem that will surely be solved by 10 other people in the next few months.
- Focus on the learning problem, or data-poor problems
Focus on sample-efficient learning and/or work with setups where one cannot simply scale the data (due to annotation cost, privacy or other reasons).
- Reach out to industry for resources
Research labs in universities can reach out to industry for collaboration or funding. For example, Microsoft supports various fellowships (investigator fellowship, PhD fellowship, dissertation grant, Ada Lovelace fellowship). Facebook, Amazon and others have similar fellowship programs.
What can the community do?
- Avoid reducing reviewing to SOTA measurements
A research paper is more than a performance gain over prior work. Reducing reviewing to how many datasets it killed is harmful for long-term research. Consider, for example, two contributions: one achieves a 2% improvement over SOTA by tweaking the SOTA solution, and the other takes a less travelled route, for which previous performance was significantly below SOTA, and manages to bring it within 1% of SOTA (but still below it). Which one is more informative? The first reaffirms the belief that current SOTA techniques work well when mixed together. The second says that we have grossly underestimated the performance of another route.
- Take into consideration FLOPs, memory footprint, carbon emissions, etc. in reviewing
Traditionally, these statistics have either never been reported or, if reported, been ignored by reviewers. Conference guidelines should encourage authors to report them. This doesn’t require any additional experiments, only simple accounting. Reviewers should judge for themselves whether a 1% gain at the cost of 10x more compute is acceptance-worthy.
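To illustrate how light this accounting can be, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `TrainingRun` class, the average power draw per GPU, the datacenter overhead factor (`pue`), and the grid carbon intensity (`kg_co2_per_kwh`) are placeholder values, not measurements, and authors would substitute the figures for their own hardware and region.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    """Hypothetical record of one training run; all defaults are placeholders."""
    num_gpus: int
    hours: float
    avg_gpu_power_watts: float      # assumed average draw per GPU
    pue: float = 1.5                # assumed datacenter overhead factor
    kg_co2_per_kwh: float = 0.4     # assumed grid carbon intensity

    def gpu_hours(self) -> float:
        return self.num_gpus * self.hours

    def energy_kwh(self) -> float:
        # GPU energy only, scaled by the overhead factor.
        return self.gpu_hours() * self.avg_gpu_power_watts / 1000.0 * self.pue

    def co2_kg(self) -> float:
        return self.energy_kwh() * self.kg_co2_per_kwh


def compute_ratio(proposed: TrainingRun, baseline: TrainingRun) -> float:
    """How many times more GPU-hours the proposed run used than the baseline."""
    return proposed.gpu_hours() / baseline.gpu_hours()


# Illustrative numbers only.
baseline = TrainingRun(num_gpus=8, hours=24, avg_gpu_power_watts=250)
proposed = TrainingRun(num_gpus=80, hours=24, avg_gpu_power_watts=250)

print(f"baseline: {baseline.gpu_hours():.0f} GPU-hours, "
      f"{baseline.energy_kwh():.1f} kWh, {baseline.co2_kg():.1f} kg CO2")
print(f"proposed uses {compute_ratio(proposed, baseline):.0f}x the compute")
```

A table of these few numbers next to the accuracy column would let a reviewer weigh a 1% gain against the compute spent to obtain it.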
- Ask if empirical gains are coming from technical novelty or extensive tuning
It is not rare to see a paper reporting a 5% gain over SOTA in which the core idea contributes less than 1% to performance. A simple way to check this is to ask authors to perform ablations.
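As a rough sketch of what such an ablation summary could look like, the Python snippet below attributes the gain over a baseline to individual components by measuring how much the score drops when each one is removed. The function name `ablation_report`, the component names, and all scores are hypothetical and chosen only to mirror the 5% versus <1% scenario above.

```python
def ablation_report(baseline: float, full: float, ablated: dict) -> dict:
    """For each removed component, report the score drop and its share of the total gain.

    `ablated` maps a component name to the score of the full system with that
    component removed. Shares need not sum to 1, since components can interact.
    """
    total_gain = full - baseline
    if total_gain <= 0:
        raise ValueError("full system does not improve on the baseline")
    report = {}
    for name, score in ablated.items():
        drop = full - score
        report[name] = {"drop": drop, "share_of_gain": drop / total_gain}
    return report


# Hypothetical scores, for illustration only.
report = ablation_report(
    baseline=80.0,
    full=85.0,
    ablated={"core_idea": 84.2, "extra_tuning": 81.5},
)
for name, row in report.items():
    print(f"removing {name}: drop of {row['drop']:.1f} points "
          f"({row['share_of_gain']:.0%} of the total gain)")
```

With numbers like these, a reviewer can see at a glance that most of the reported gain comes from tuning rather than from the proposed idea.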
- Evaluate candidates on core research contribution rather than citations, number of papers, etc.
Many AI students feel a lot of pressure from being unable to match the productivity of large, well-funded labs writing 50-100 papers a year (1-2 papers a week). Generally, most of these papers revolve around a few topics and do not contribute fundamentally in all 100 directions (that would be nearly impossible).
Okay, but what does this have to do with compute-intensive work? Firstly, I speculate that if students were not pressured to write many papers, they would rely more on fundamental principles (or on taking a completely new route) than on brute-force compute power to improve their rank on leaderboards. Secondly, if there were no pressure to publish constantly, students might be encouraged to try different research problems and increase the diversity of their research.
Heavily tuned deep neural architectures trained with relatively simple SGD methods have revolutionized machine learning. Similarly, the use of leaderboards has made it possible to perform controlled experiments, which has had a huge impact. These two methods have resulted in a large amount of financial interest in machine learning and AI, which is also benefitting other aspects of AI and ML (and for good). However, we might be starting to see some disadvantages of over-reliance on compute-intensive and leaderboard-driven research.
Those who still feel underwhelmed by the SOTA competition can find solace in knowing that there is still lots of work to do in developing reusable first principles, in understanding existing solutions, and in taking the more rigorous, albeit slower, path towards better solutions. On the way, you may find solutions that have been proposed before by people with great intuition, or found by exhaustion of computational resources. However, there is great benefit in discovering a new, elegant path to the same solution.