In March 2016, the Google-owned software company DeepMind put its game-playing algorithm AlphaGo up against Lee Sedol, one of the world’s best players at the ancient strategy game of Go. AlphaGo won. It was a milestone moment for AI – mastering a game of such creative complexity was widely regarded as an impossible task for a machine. At the time, I speculated (not very seriously) about how long it would be until a future ‘AlphaChemistry’ picked up a Nobel prize. In the end, it took about eight years.
Demis Hassabis, who founded DeepMind and led the team behind AlphaGo, went on to build the protein structure prediction program AlphaFold with his Google colleague John Jumper, and for that work the pair picked up one half of this year’s Nobel prize in chemistry. The other half went to David Baker, for his work on designing new proteins from scratch. Both DeepMind and Baker’s lab are today at the forefront of using machine learning algorithms and AI in chemistry, and this year’s award has been seen as a recognition of the immense scientific potential of AI. Indeed, the physics prize was awarded specifically for developments in machine learning. Yet as the science itself shows, the true potential of these discoveries is probably something we haven’t predicted.
David Baker’s first few words in the Nobel prize press conference were: ‘I stood on the shoulders of giants.’ That’s a truism in science that we generally take for granted, but it has seldom been so accurate. Not only did protein design and structure prediction rely on knowledge from across the sciences – computer science, neuroscience, biology, chemistry and more – but they also quite literally built upon a mountain of data generated by those scientists. And not just the giants – anyone who has ever solved a protein structure (even me, many years ago) added a pebble to that pile.
About 70 years ago, there was only one protein structure: myoglobin, solved using x-ray diffraction in 1957. That effort alone took a couple of decades of work, and John Kendrew and Max Perutz shared a Nobel prize for it – one of the earliest awards for the structural biology that has become such a strong theme in the chemistry prize. More structures, new techniques and many more Nobels followed, and by about 20 years ago, when I was making my tiny contribution, the Protein Data Bank contained around 24,000 protein structures. Now there are over 200,000. And it’s thanks to those decades of experimental work that AI models such as AlphaFold and Baker’s Rosetta became possible. In AlphaFold’s case, that has resulted in predicted structures for all 200 million known proteins.
That’s an application very few of those previous generations could have imagined. Nor would many have predicted that these achievements would rely on computer science more than on our understanding of biomolecular chemistry. So this work democratises science not only by making a huge volume of knowledge publicly available, but also by showing how vital scientific questions can be framed in new ways to be tackled by other fields. Interesting problems attract curious minds and talented people, and the great news for science is that those people don’t necessarily have to be experts in polypeptide chemistry or biophysics (although it does still help: Jumper and Baker are also protein chemistry experts).
Exactly what impact all this will have is impossible to tell, and it will likely be both more and less than we would predict. As Derek Lowe has discussed, the idea that drug discovery will be revolutionised is probably overselling things at the moment. Knowing protein structure is a good start, but it doesn’t tackle many of the really difficult aspects of that process. Similarly, there are many other questions that we still don’t have answers for. While we can predict protein folds, or design entirely new proteins, we don’t know how that folding process works, for example.
At each stage, we know a little more, and we see how much more is left to know. Laureates such as Kendrew, Perutz and Dorothy Hodgkin took the first steps in showing how molecular structure is implicated in biological function. As the volume of structures has grown, to the point where we can predict the structure of every known protein, so has our understanding that the relationship is not so easily explained. The atomistic idea that we can (and should) build our understanding of biology stepwise upwards from its molecular basis is too limited.
The lesson of this year’s prize is that we seldom know how useful a discovery or invention is going to be. Nor can we say what questions will become important in future, and where the answers to those next questions will come from. But if we keep on adding pebbles to the pile, then tomorrow someone else can climb a little higher.