Scientists in Denmark have used artificial intelligence (AI) to resolve a key challenge in crystal-structure determination for the first time, and have made the system freely available for other researchers to use. The University of Copenhagen’s Anders Madsen and his colleagues have developed a deep-learning system called PhAI that works out hard-to-determine information about the phase of x-rays that crystals have diffracted.
Madsen tells Chemistry World that PhAI can solve the so-called ‘phase problem’ with lower-quality data than other methods need, and that it quickly arrives at high-quality solutions. ‘We really get an almost perfect prediction,’ Madsen says.
Crystallography involves firing x-rays at solid chemical substances and collecting information on how they bounce off the clouds of electrons around each atom. Working backwards mathematically from these diffraction patterns creates a structural map of the substance in a crystal. Experiments provide information on one key property of a scattered x-ray’s waveform, its amplitude. However, scientists must puzzle out another key property they need to build the map: its phase.
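The phase problem can be illustrated with a toy one-dimensional ‘crystal’. The sketch below is not taken from the study: the grid size, the atom positions and the weights are made-up values, and a plain Fourier transform stands in for a real diffraction experiment. It shows that the density map comes back exactly when both amplitude and phase are known, but not when only the amplitudes are available.

```python
import numpy as np

# Toy 1D "crystal": electron density sampled on a grid across one unit cell.
n = 64
x = np.arange(n)
density = np.zeros(n)
for centre, weight in [(10, 6.0), (25, 8.0), (47, 6.0)]:  # hypothetical atoms
    # Periodic distance, so each Gaussian "atom" wraps around the unit cell.
    dist = np.minimum(np.abs(x - centre), n - np.abs(x - centre))
    density += weight * np.exp(-0.5 * (dist / 1.5) ** 2)

# In this toy model, diffraction is a Fourier transform of the density.
structure_factors = np.fft.fft(density)
amplitudes = np.abs(structure_factors)  # what the experiment measures
phases = np.angle(structure_factors)    # what the experiment does not record

# With both amplitude and phase, the density map is recovered.
with_phases = np.fft.ifft(amplitudes * np.exp(1j * phases)).real

# With amplitudes only (every phase set to zero), the map is wrong.
without_phases = np.fft.ifft(amplitudes).real

print(np.allclose(with_phases, density))
print(np.allclose(without_phases, density))
```

The first check passes and the second fails, which is the gap that any phasing method, AI-based or not, has to close.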
Previous non-AI approaches to solving that problem use large amounts of measured x-ray diffraction data to infer what phase a wave is most likely to take. Madsen and his colleagues realised that deep learning is perfectly suited to building systems on large amounts of data like this. That’s because such systems are based on neural networks trained on massive datasets that then make inferences based on the patterns they’ve learned from those datasets.
PhAI is pronounced like the Greek letter phi that represents a wave’s phase in mathematical equations. The Copenhagen researchers trained it with an artificial dataset by simulating millions of crystal structures and their x-ray patterns. This is possible because it’s straightforward to accurately compute x-ray patterns of known structures, but hard to determine unknown structures from measured x-ray patterns. The inputs to and outputs from the AI are matrices of numbers including the amplitude and phase of each diffracted x-ray.
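The asymmetry between the easy forward calculation and the hard inverse one is what makes simulated training data practical. As a rough illustration only, the sketch below builds amplitude and phase pairs for random one-dimensional toy structures. The function name `simulate_example`, the number of atoms, the range of reflection indices and the dataset size are all assumptions made for this example, not details of how PhAI was trained.

```python
import numpy as np

def simulate_example(rng, n_atoms=5, h_max=10):
    """Return (amplitudes, phases) for one random toy 1D structure."""
    positions = rng.uniform(0.0, 1.0, size=n_atoms)  # fractional coordinates
    # Electron counts of C, N and O, used here as crude scattering weights.
    scattering = rng.choice([6.0, 7.0, 8.0], size=n_atoms)
    h = np.arange(1, h_max + 1)  # reflection indices
    # Structure factor: F(h) = sum_j f_j * exp(2*pi*i*h*x_j)
    F = (scattering * np.exp(2j * np.pi * np.outer(h, positions))).sum(axis=1)
    return np.abs(F), np.angle(F)

rng = np.random.default_rng(0)
dataset = [simulate_example(rng) for _ in range(1000)]
amplitudes = np.stack([a for a, _ in dataset])  # would serve as network input
phases = np.stack([p for _, p in dataset])      # would serve as training target
print(amplitudes.shape, phases.shape)
```

Because the structures are generated rather than measured, the correct phase for every reflection is known in advance, so an arbitrarily large labelled dataset can be produced.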
One more step
After training, the researchers could feed real data into PhAI, using a random number in place of the unknown x-ray phase. The system is designed to feed its preceding output back in and reprocess it until it reaches a stable estimate. However, the first output is usually ‘very close to being a final set of phases’, says Madsen.
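The recycling idea can be sketched as a simple loop. This is only an outline under stated assumptions: `recycle`, `predict_phases`, the stopping tolerance and the cycle limit are hypothetical names and values, and the stand-in model used in the demonstration simply returns a fixed answer rather than running a neural network.

```python
import numpy as np

def recycle(amplitudes, predict_phases, rng, max_cycles=10, tol=1e-3):
    """Start from random phases and re-feed the output until it stops changing."""
    phases = rng.uniform(-np.pi, np.pi, size=amplitudes.shape)
    for cycle in range(1, max_cycles + 1):
        new_phases = predict_phases(amplitudes, phases)
        # Compare on the unit circle so that -pi and +pi count as equal.
        change = np.abs(np.exp(1j * new_phases) - np.exp(1j * phases)).max()
        phases = new_phases
        if change < tol:
            break
    return phases, cycle

# Stand-in for a trained model (an assumption for this demo): it ignores its
# inputs and always returns the same "known" phases.
true_phases = np.array([0.0, np.pi, 0.0, 0.0, np.pi])

def toy_model(amplitudes, phases):
    return true_phases.copy()

rng = np.random.default_rng(1)
amplitudes = np.array([12.0, 3.5, 7.1, 0.9, 4.4])  # made-up values
final_phases, cycles = recycle(amplitudes, toy_model, rng)
print(cycles, np.allclose(final_phases, true_phases))
```

With this stand-in the first pass already gives the final phases and the second pass only confirms that nothing changes, which loosely mirrors Madsen's remark that the first output is usually close to final.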
Existing phase determination methods require diffraction data with a resolution of 1.2Å, equivalent to detecting individual atoms. PhAI could solve the phase problem in about 80% of cases with lower-resolution data of 2Å. That resolution can be reached with much less data, just 10 to 20% of that needed for higher-resolution methods.
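The size of that saving follows from simple geometry: the number of reflections available out to a resolution d grows roughly as 1/d³, so moving from 1.2Å to 2Å leaves about (1.2/2)³, or roughly a fifth, of the reflections. The sketch below checks this by brute-force counting for a cubic unit cell. The cubic shape and the 10Å cell edge are assumptions chosen for simplicity, and symmetry-equivalent reflections are not merged.

```python
import itertools

def count_reflections(cell_length, d_min):
    """Count reflections (h, k, l) of a cubic cell with d-spacing >= d_min."""
    radius = cell_length / d_min
    h_max = int(radius)
    count = 0
    for h, k, l in itertools.product(range(-h_max, h_max + 1), repeat=3):
        if (h, k, l) == (0, 0, 0):
            continue  # the origin is not a measurable reflection
        # Cubic cell: d = a / sqrt(h^2 + k^2 + l^2), so d >= d_min means
        # h^2 + k^2 + l^2 <= (a / d_min)^2.
        if h * h + k * k + l * l <= radius ** 2:
            count += 1
    return count

high = count_reflections(10.0, 1.2)  # reflections out to 1.2 Å
low = count_reflections(10.0, 2.0)   # reflections out to 2 Å
print(low, high, low / high)
```

The ratio comes out at about one fifth, in line with the upper end of the range quoted above.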
Developing PhAI involved exploring up to 100 approaches, according to Madsen, and training it required four graphics processing chips running for a week. To do this economically, the researchers limited its applicability to the most common crystallographic space groups, which cover over one third of organic substances. They also limited the training data to examples of crystals with a unit-cell length below 10Å. Despite this, the PhAI system could still accurately solve around 80% of structures with unit cells up to 20Å when tested against examples from the Cambridge Structural Database.
Using PhAI on such structures is already simple enough for people to do without specialist knowledge or needing to adjust parameters, says Madsen. ‘It can give a solution on a normal laptop within a minute or so,’ adds team member Anders Larsen. But because PhAI is not yet applicable to all crystal structures ‘there is a step before we are really in production mode,’ Madsen advises.
Lukáš Palatinus from the Institute of Physics of the Czech Academy of Sciences in Prague calls the study ‘a significant breakthrough in the field of structure determination, the biggest in at least two decades’. Palatinus has known about the work for a while, having seen initial results in conference presentations and a preprint paper. ‘From the first moment I saw the results, my impression was a mix of “This is really amazing!” and “It is so obvious this would work – why did we have to wait so long for someone to do it?”’ he says. ‘I am really excited to see the work officially published now.’
References
A S Larsen, T Rekis and A Ø Madsen, Science, 2024, DOI: 10.1126/science.adn2777