AI model finds medicinal compounds 1,000x faster
A geometric deep-learning model is faster and more accurate than state-of-the-art computational models.
Molecules are everywhere in the known universe, in effectively infinite number. But how many of these molecules might have drug-like properties that could be used to make life-saving drugs? Millions? Billions? Trillions? The answer is a novemdecillion, or 10^60. This huge number slows the development of drugs for diseases that spread quickly, like Covid-19, because it is far larger than what existing drug design models can compute. To give you an idea, the Milky Way has about 100 million stars, or 10^8.
In a paper that will be presented at the International Conference on Machine Learning (ICML), MIT researchers made a geometric deep-learning model called EquiBind that binds drug-like molecules to proteins 1,200 times faster than QuickVina2-W, which is one of the fastest computational molecular docking models already available. EquiBind is based on its predecessor, EquiDock, which binds two proteins together using a method developed by the late Octavian-Eugen Ganea, who was a postdoc at the MIT Computer Science and Artificial Intelligence Laboratory and Abdul Latif Jameel Clinic for Machine Learning in Health. Ganea also co-wrote the EquiBind paper.
Before drug development can begin, researchers must find promising drug-like molecules that can bind or "dock" properly onto certain protein targets, a process known as drug discovery. Once it has docked to the protein, the binding drug, also called the ligand, can stop the protein from functioning. If this happens to an essential protein of a bacterium, it can kill the bacterium and keep the human body safe.
But the process of finding new drugs can be expensive, both in terms of money and time. It can take billions of dollars and over a decade of development and testing before a drug gets final approval from the Food and Drug Administration. Also, when drugs are tested on people, 90% of them fail because they don't work or have too many side effects. One way drug companies make up for the costs of drugs that don't work is by raising the prices of the ones that do.
At the moment, computers are used to find molecules that might be good drug candidates. Most of the best computational models use a lot of candidate samples and methods like scoring, ranking, and fine-tuning to find the best "fit" between the ligand and the protein.
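To make the sample, score, and rank idea concrete, here is a minimal toy sketch in Python. It is not the algorithm of QuickVina2-W or any real docking tool: the pocket centre, the distance-based `score`, and the helper names are all hypothetical, and a ligand "pose" is reduced to a single 3D point.

```python
import math
import random

def score(ligand_pos, pocket_center):
    """Toy score (an assumption): negative distance to a hypothetical
    pocket centre, so a higher score means a better 'fit'."""
    return -math.dist(ligand_pos, pocket_center)

def sample_score_rank(pocket_center, n_samples=1000, box=10.0, seed=0):
    """Sample random candidate positions, score each, and rank best-first."""
    rng = random.Random(seed)
    candidates = [
        tuple(rng.uniform(-box, box) for _ in range(3))
        for _ in range(n_samples)
    ]
    return sorted(candidates, key=lambda p: score(p, pocket_center), reverse=True)

def refine(pos, pocket_center, steps=50, step_size=0.5, seed=0):
    """Fine-tune one candidate by keeping random nudges that improve its score."""
    rng = random.Random(seed)
    best = pos
    for _ in range(steps):
        trial = tuple(c + rng.uniform(-step_size, step_size) for c in best)
        if score(trial, pocket_center) > score(best, pocket_center):
            best = trial
    return best

pocket = (1.0, 2.0, 3.0)            # hypothetical pocket centre
ranked = sample_score_rank(pocket)  # many candidates, each scored and ranked
best = refine(ranked[0], pocket)    # local fine-tuning of the top candidate
```

Even in this toy version, every candidate has to be scored before the best one can be chosen, which is where the time goes. A real docking tool would also search over ligand orientation and conformation and use a far richer scoring function.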
Hannes Stark, the lead author of the paper and a first-year graduate student at MIT's Department of Electrical Engineering and Computer Science, compares typical methods for binding ligands to proteins to "trying to fit a key into a lock with a lot of keyholes." Most models score each "fit" before choosing the best one, which takes a lot of time. EquiBind, on the other hand, predicts the exact key location in a single step without knowing anything about the target pocket of the protein. This is called "blind docking."
EquiBind is different from most models because it doesn't need multiple tries to find a good place for the ligand in the protein. Instead, the model already has built-in geometric reasoning that helps it learn the physics of molecules and generalise to make better predictions when it meets new, unseen data.
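One simple example of the kind of geometric fact such a model can rely on is that rotating or shifting a molecule in space does not change the distances between its atoms. The short Python sketch below illustrates only that fact; it is not taken from the EquiBind paper, and the coordinates and helper names are made up for illustration.

```python
import math

def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def translate(point, shift):
    """Shift a 3D point by the vector `shift`."""
    return tuple(p + d for p, d in zip(point, shift))

def pairwise_distances(points):
    """Distances between every unordered pair of points."""
    return [
        math.dist(a, b)
        for i, a in enumerate(points)
        for b in points[i + 1:]
    ]

# Hypothetical three-atom 'ligand' (coordinates are illustrative only).
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3)]

# The same ligand after a rigid rotation and translation.
moved = [translate(rotate_z(p, 0.7), (4.0, -2.0, 1.0)) for p in ligand]

# Interatomic distances agree up to floating-point error.
same_shape = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(ligand), pairwise_distances(moved))
)
```

A model that respects this symmetry by construction does not have to relearn the same molecule in every possible orientation, which is one plausible reason such models can generalise from limited data.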
When these results were made public, industry experts like Pat Walters, the chief data officer at Relay Therapeutics, took notice right away. Walters suggested that the team test their model on an existing drug and protein used for lung cancer, leukaemia, and gastrointestinal tumours. Most traditional docking methods failed to bind the ligands that worked on those proteins, but EquiBind succeeded.
Walters says that EquiBind is a unique way to solve the docking problem because it uses both pose prediction and binding site identification. "This approach could change the field in new ways because it uses information from thousands of publicly available crystal structures."
"We were amazed that EquiBind was able to put it in the right pocket when all the other methods either got it wrong or only got one right," Stark says. "We were very happy with the results for this."
EquiBind has received a lot of feedback from industry professionals, which has helped the team think about how the computational model could be used in the real world. Stark hopes to find different perspectives at ICML in July.
"I'm most interested in hearing ideas about how to make the model even better," he says. "I want to talk to those researchers about... to let them know what I think the next steps should be and to encourage them to use the model for their own papers and methods... Many researchers have already asked us if we think the model could help them with their problems."
This work was partly paid for by the Pharmaceutical Discovery and Synthesis consortium, the Jameel Clinic, the DTRA Discovery of Medical Countermeasures Against New and Emerging Threats programme, the DARPA Accelerated Molecular Discovery programme, the MIT-Takeda Fellowship, and the NSF Expeditions grant Collaborative Research: Understanding the World Through Code.
This work is dedicated to the memory of Octavian-Eugen Ganea, a brilliant scholar with a humble spirit who made important contributions to the study of geometric machine learning and helped a lot of students.