You think it’s a long way to the back of your screening libraries? That’s peanuts to chemical space, says Derek Lowe
I want to talk about chemical space – the universe of possible compounds – and I find that there is absolutely no way to do that without quoting Douglas Adams at the first opportunity. In The hitchhiker’s guide to the galaxy, Adams famously asserted, without fear of contradiction, that ‘Space is big. Really big. You just won’t believe how vastly, hugely, mind-bogglingly big it is.’ Drop the word ‘chemical’ in there, and the same description applies.
Jorge Luis Borges struck a different note in his short story The library of Babel. Like many of his famous works, it is an otherworldly thought experiment. It’s set in a seemingly endless library that contains every possible permutation of identically sized books in a given alphabet. Every bit of knowledge that can be written down is there. And every variation of it is in there, too: all the misspellings, all the slightly wrong versions, all the flat-out mistakes and all the gibberish. The description convincingly makes the case that the gibberish is by far the most prominent part of the collection. Anyone who’s ever worked with compound screening sets should read this, but they may find themselves shivering as they do.
So how big is our own library of Babel? Our alphabet is the periodic table, and our spelling and grammar are the rules of chemical bond formation and compound stability. There have been some computational forays into this question, most notably from Jean-Louis Reymond and his group at the University of Bern in Switzerland. They’ve enumerated a set of plausible molecules (GDB17) with up to 17 heavy atoms (C, N, O, S and a halogen), and have a database of over 166 billion of them. But along the way, they had to do a great deal of paring down just to hold things to that number.
A startling percentage of their compound backbones consisted of three- and four-membered rings concatenated into structures that no one can be sure are even stable, and are certainly beyond the current powers of organic synthesis. Adding those in would make the compound set many orders of magnitude larger, since only 0.005% of the original compound graphs were even considered as starting points. Their molecule generation process eliminated enamines, imines, most acetals, most isolated alkene compounds, most thioethers, and many other classes to make the final set more relevant (your screening library will definitely contain compounds that aren’t in GDB17). It’s still just a fragment-space set, though, with molecular weights well under 300.
Even after all that, it’s 166 billion carefully curated small compounds, which needless to say dwarfs any possible fragment screening collection. What about larger compounds? The best guess for the number of plausible compounds up to molecular weight 500, letting many of those functional groups back in, is around 10⁶⁰. That is a number that the human mind is not well equipped to handle. That collection, assembled into compound vials at, say, 10mg per vial, would exceed the amount of ordinary matter in the entire universe. Actually, it might exceed the amount of ordinary matter in roughly 10,000 universes – depending on how big those vials are. I am leaving out such trivial details as the weight of the associated rack storage systems, the influences of dark matter and dark energy, and the likelihood of the compound collection itself undergoing gravitational collapse to form a black hole.
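For anyone who wants to check the vial arithmetic, here is a rough back-of-the-envelope sketch. Only the compound count (ten to the power 60) and the 10mg fill come from the estimate above. The mass of ordinary matter in the observable universe (taken here as about 1.5 × 10⁵³kg, a commonly quoted order-of-magnitude figure) and the 1g empty vial are my own assumptions, and the function name is purely illustrative.

```python
# Back-of-the-envelope estimate, not a precise calculation.
# From the text: compound count and 10 mg of material per vial.
N_COMPOUNDS = 10**60           # plausible compounds up to molecular weight 500
SAMPLE_MASS_KG = 10e-6         # 10 mg of compound, expressed in kg

# Assumptions (not from the text):
ORDINARY_MATTER_KG = 1.5e53    # assumed mass of ordinary matter in the observable universe
VIAL_MASS_KG = 1e-3            # hypothetical 1 g empty glass vial


def universes_needed(n_items, mass_per_item_kg, universe_kg=ORDINARY_MATTER_KG):
    """Return how many universes' worth of ordinary matter the collection weighs."""
    return n_items * mass_per_item_kg / universe_kg


# Compound alone, then compound plus the (hypothetical) vial holding it.
compound_only = universes_needed(N_COMPOUNDS, SAMPLE_MASS_KG)
with_vials = universes_needed(N_COMPOUNDS, SAMPLE_MASS_KG + VIAL_MASS_KG)

print(f"compound only: about {compound_only:.0f} universes")
print(f"with vials:    about {with_vials:.0f} universes")
```

Under these assumptions the compounds alone come to tens of universes, and adding a gram of glass per sample pushes the total into the thousands, so the answer really does hinge on how big those vials are.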
So against these terrifying numbers, what do we have to offer? Every compound collection that has ever been made by man is trivial against that background. Cutting the sub-500 molecular weight set down brutally, GDB17-style, might manage to take things out of the realm of cosmology, but would still bring it nowhere near anything that’s manageable by humans. There are, I think, two reactions to this. One is despair, of course, which is always an option in research, but not a very useful one. The other is to see this as an opportunity. We are not going to run out of interesting and useful structures, and the uses that they could be put to are probably also beyond our imagining. In chemical space, we really do have an effectively endless frontier. And it’s right there in front of every chemist; you can be there this afternoon. Go and have a look!
Derek Lowe (@Dereklowe) is a medicinal chemist working on preclinical drug discovery in the US and blogs at In the pipeline
References
L Ruddigkeit et al, J. Chem. Inf. Model., 2012, 52, 2864 (DOI: 10.1021/ci300415d)