‘This is the first time, to the best of my knowledge, that a computer program predicts a synthesis, you go to the lab and – boom! – it works.’ This is Bartosz Grzybowski from the Ulsan National Institute of Science and Technology, South Korea and the Polish Academy of Sciences talking about his retrosynthetic tool Chematica. The software, which has been laboriously programmed with 50,000 chemical rules and can evaluate thousands of potential routes, has proven that it can design practical syntheses for eight compounds that pose tough synthetic challenges.
‘The first time I heard about Chematica, I was a little bit in disbelief, but also really excited, because [it could] change all of chemistry,’ says Sarah Trice, who works in cheminformatics technologies at MilliporeSigma, US, the company that recently bought Chematica. Trice and Grzybowski led the study together with Milan Mrksich from Northwestern University, US. ‘For me it represented a huge shift in how organic chemistry would be approached,’ adds Trice.
Usually, chemists rely on their experience and knowledge – their chemical intuition – to come up with a reaction sequence that builds a large molecule out of smaller building blocks. They must also take into account restrictions such as functional groups incompatible with certain reactions.
Doing retrosynthesis, Grzybowski explains, is like playing chess: there are a number of basic moves. During a game, each move opens up a new branch to a different outcome. But in organic synthesis ‘the number of basic moves – basic reaction types – is just ginormous, in the tens of thousands’, he says. After each synthetic step around 100 possible next steps become available, meaning the longer a route is the more dazzling the number of possibilities becomes.
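To get a feel for that growth, here is a minimal back-of-the-envelope sketch. It assumes, as a simplification, that every step offers exactly 100 possible next steps; the function name `count_routes` is hypothetical, and none of this reflects how Chematica itself searches.

```python
def count_routes(branching: int, steps: int) -> int:
    """Count step sequences when every step offers `branching` choices.

    Assumes a uniform branching factor, which is a simplification:
    real reaction networks branch unevenly and many branches are dead ends.
    """
    return branching ** steps


# Illustrative route lengths; 100 is the rough figure quoted in the article.
for steps in (1, 3, 5, 10):
    total = count_routes(100, steps)
    print(f"{steps:>2} steps: about {total:.0e} candidate sequences")
```

Under this assumption a 10-step route already has 100^10, or 10^20, candidate sequences, which is why checking every one of them is not an option and the search has to be guided.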
But chemists are biased, explains Trice. ‘What’s been successful in the past tends to be what they go for,’ she says. Chematica is intended to eliminate this bias, Grzybowski stresses. The algorithm has been taught more than 50,000 rules over the last 15 years, as well as the means to navigate this enormous chemical space and devise meaningful reaction sequences.
Proving ground
Although this concept looked great, says Trice, ‘there was no data to support that it actually worked’. Now Chematica has proved its worth in the lab for the first time. The algorithm found viable routes to eight molecules: six small biologically active compounds, one blockbuster drug and one natural product.
Earlier routes to most of the six medicinal compounds had low yields, and some of the compounds had evaded synthesis attempts altogether. Chematica devised routes with fewer than 10 reactions using only common reagents. Following these routes, the team made their target compounds in higher yields – in some cases raising the yield from 1% to 60% – while spending less time and money in the lab than during previous attempts.
Some of the retrosynthetic disconnections the software found were unusual – such as a three-component aza-Henry reaction en route to a single-enantiomer quinolone-fused lactam. ‘The chemists were quite uncomfortable in some cases executing the pathways because they did not think that they would be successful based upon their gut instinct,’ Trice laughs. But Grzybowski says that ‘the rules of the game were these: you can’t change any retrosynthetic disconnections and you have to follow the general methodology’.
Conditional constraints
As Chematica doesn’t give precise conditions for each reaction, there is still some trial and error when it comes to optimisation. However, to reflect the time and financial constraints of industry, the team limited itself to five attempts on each reaction and a maximum of 70 hours to complete each route.
The blockbuster anti-arrhythmia drug dronedarone offered a different challenge: its synthesis is protected by 46 patents. Chematica figured out a way to circumvent the protected reactions with the same overall yield as the patented routes.
Richmond Sarpong and his group from the University of California, Berkeley, had the chance to test Chematica. ‘[We] were very impressed with [Chematica’s] capabilities,’ he says. The algorithm is particularly effective in finding sequential pericyclic reactions to address a particular structural motif, he adds. ‘I have found the program to be more capable in this regard as compared to human recognition.’
However, Sarpong points out, ‘like humans, [the program] is challenged in predictions involving architecturally complex molecules where stereoelectronic subtleties arise’. He tells Chemistry World that his team is currently working on a route Chematica suggested.
Chematica’s next big frontier is complex natural products, Grzybowski says. His team is just putting the finishing touches to a 15-step synthesis of a newly isolated plant alkaloid. ‘I never considered myself a very competent organic chemist. This is the first total synthesis of my life,’ he enthuses. ‘I think it has the potential to really help people who are not classically trained synthetic chemists,’ Trice adds.
Grzybowski is eager to point out that Chematica should not be seen as a threat to chemists’ expertise. As the algorithm can’t teach itself new reactions, ‘we will still need bold organic chemists ready to defy programs like Chematica [and] find unconventional ways to make molecules,’ agrees Mariola Tortosa, who synthesises natural products at the Autonomous University of Madrid, Spain. ‘However, the program has the ability to significantly accelerate the process [of retrosynthesis],’ she says, adding that she would definitely use it in her own lab.
MilliporeSigma is already working with industrial and academic partners to test the software, and hopes to release a commercial version later this year.
References
T Klucznik et al, Chem, 2018, 4, 1 (DOI: 10.1016/j.chempr.2018.02.002)