Image: Safety Precautions by Yasmin Dwiputri & Data Hazards Project
This year, the Nobel committees awarded two prizes to research advanced by AI systems. The prize in Chemistry was shared by David Baker of the University of Washington, for computational protein design, and by Demis Hassabis and John Jumper of Google DeepMind, for developing an AI model capable of accurately predicting proteins’ complex structures. The Nobel Prize in Physics was awarded to John Hopfield of Princeton University and Geoffrey Hinton of the University of Toronto for their work in the 1980s on artificial neural networks that reconstruct patterns in data, work that gave rise to the machine learning revolution that began around 2010.
While both groups of laureates are recognized for outstanding contributions to their fields, what does this signal? Why did the Nobel Committee recognize AI-driven research in two separate categories this year? Perhaps it is a sign that these models have proven adept enough to perform under the rigorous standards demanded of world-class scholarship. But given the enormous computational power and data these models require, it may also read as an olive branch to the technology companies underpinning their use.
Scientists across the academic world are embracing AI for their research and, with it, the incredible risks of a nascent technology enabled by large corporations. Consumer-grade LLMs like ChatGPT are known to hallucinate, and models built for scientific inquiry are no different. More pressingly, machine learning systems are deeply opaque: whatever outcomes scientists find statistically significant, they know little about the underlying mechanisms that produced them. Even when scientists are able to contribute to a model’s training, unraveling its behavior after training is far from certain. None of this has stopped the progression of AI-enabled research; related publications have nearly tripled since 2010, from roughly 88,000 to about 240,000 in 2024. Data generated by these research models captures the natural world only indirectly, which threatens to compromise the interpretability and reliability of results in the academic disciplines that normalize their use.
A downstream impact of this momentum cuts at something more fundamental to the disciplines themselves: originality. LLMs offer something we cannot do ourselves, which is finding patterns in tremendous complexity. But those patterns are still drawn from data we assemble from the observable universe.
Original inference reaches for nuance beyond what data can reveal on its own. To be groundbreaking, especially at the level of a Nobel Prize, researchers must employ original reasoning rather than rely on raw pattern analysis. Their hypothesis testing must stretch a field’s known boundaries. While generative AI supports creativity, it often discourages true originality, creating a culture that values compelling results but stifles genuine discovery.
In the competitive, funding-scarce environment that is academia, reckless adoption of AI in research is an urgent issue. When powerful models struggle to guarantee even basic causal relationships, to say nothing of the argument that these data-dependent systems stand to make the scientific method obsolete, any scientist ought to tread lightly for the sake of their own credibility.