Source: GEN

AI-Driven Drug Discovery Straddles the Virtual and the Real

The AI approach—iteratively churning through masses of experimental and synthetic data—blurs the distinction between computation and biology

Empress Therapeutics uses its Chemilogics platform to understand how DNA encodes enzymes that make or modify metabolites, and to develop metabolites that show promise as small-molecule drugs. The company’s initial focus is on the metabolites produced by commensal bacteria. Such metabolites may lead to the discovery of what Empress calls Co-Evolved Medicines.

Patterns and commonalities that escape human notice simply because they are beyond human comprehension needn’t remain hidden. They can be revealed with the help of artificial intelligence (AI) and machine learning (ML), as has been demonstrated in many disciplines.

However, in the world of drug discovery, it seems that AI and ML have taken off only recently. Hardly a year has passed since Insilico Medicine’s INS018_055 became the first AI-generated drug to enter Phase II trials. Nonetheless, through the work of pioneering scientists, drug discovery is gaining on other disciplines in realizing the benefits of AI and ML. Indeed, as this article relates, there are several outstanding examples of scientists who are generating creative ideas and producing innovative solutions in AI-enabled drug discovery.

Creating drugs in computational space

[Photo: Nicolas Tilmans, PhD, Founder and CEO, Anagenex]

For many biological problems, the size and quality of the available data sets fundamentally limit how well state-of-the-art AI models can be applied.

“When you look at the kinds of datasets that exist in pharma, the biggest screening decks tend to be on the order of a single-digit million compounds,” Nicolas Tilmans, PhD, CEO and founder of Anagenex, told GEN. “If you look at what people tend to do with ChatGPT, you’re training on the entire internet.”

[Photo: Jen Nwankwo, PhD, Co-founder and CEO, 1910 Genetics]

At the very front end of Anagenex's drug discovery process is a custom-built library of several billion compounds, tested in the lab under dozens of conditions. Anagenex scientists are generating huge amounts of data to feed an AI engine capable of designing small-molecule oncology medicines, specifically in the context of synthetic lethality.

At 1910 Genetics, the starting point is synthetic data. Jen Nwankwo, PhD, founder and CEO of 1910 Genetics, says, “In the biological context, when tech people talk about synthetic data, they are talking about data that AI can create. You can find new ways to increase your corpus of data so that your ML models can have greater depth and breadth on which to perform.”

But designing molecules with AI comes with the major hurdle of making relevant molecules that can actually be synthesized and follow the rules of nature.

AI predictions often generate many false positives. For example, when performing molecular docking predictions, if the parameters aren’t right, the software will fit a ligand into a binding site regardless of whether it will actually bind experimentally and have the predicted function. Even under the best of circumstances, purely computational approaches frequently miss the mark.

“We’ve seen models that create these crazy-looking chemical molecules that have carbon connected to five bonds—it’s organic chemistry 101,” Nwankwo remarks. “Or sometimes you have AI create molecules that look like they could exist in nature, but when you try to make them, you find that you have a 25% yield, or that you can’t even make them at all.”
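The five-bonded carbon that Nwankwo describes is the kind of error a simple valence check can catch before a generated structure goes any further. The sketch below is illustrative only and is not drawn from either company's pipeline: the molecule representation (a list of element symbols plus a list of bonds), the MAX_VALENCE table, and the valence_violations function are all hypothetical, and the check covers only common neutral atoms and ignores formal charges.

```python
# Hypothetical toy representation: a molecule is a list of element symbols
# plus a list of bonds given as (atom index, atom index, bond order).
# Typical maximum valences for neutral atoms only; formal charges are ignored.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


def valence_violations(atoms, bonds):
    """Return (atom_index, element, total_bond_order) for over-bonded atoms."""
    totals = [0] * len(atoms)
    for a, b, order in bonds:
        # Each bond contributes its order to both atoms it connects.
        totals[a] += order
        totals[b] += order
    return [
        (i, element, totals[i])
        for i, element in enumerate(atoms)
        if element in MAX_VALENCE and totals[i] > MAX_VALENCE[element]
    ]


# Methane: one carbon with four single bonds, which is chemically sensible.
methane = (["C", "H", "H", "H", "H"],
           [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)])

# A made-up "generated" structure in which carbon carries five single bonds.
five_bonded = (["C", "H", "H", "H", "H", "H"],
               [(0, i, 1) for i in range(1, 6)])

print(valence_violations(*methane))      # []
print(valence_violations(*five_bonded))  # [(0, 'C', 5)]
```

A check like this only rules out structures that break basic bonding rules. It says nothing about whether a valid-looking molecule can be made with a usable yield, which is the second problem Nwankwo raises.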

Anchoring compounds in reality

The validation of AI-generated compounds cannot be done entirely in the computational ether. These compounds must also be validated experimentally. However, experimental validation is not purely for the sake of giving the green light to whatever candidates were originally predicted and moving them forward to clinical investigation. Experimental validation is essential for building a loop in which experimental data serves as new data for the AI tools and drives iterative improvements in successive predictions.

According to Tilmans, “ChatGPT applications work not only because they trained on the whole internet, but because they have a bunch of people who are doing this thing called reinforced learning with human feedback to make it so it doesn’t go nuts—well, ChatGPT still says crazy things, but less than it would otherwise as a result of just a lot of curated data and very large volumes of it.”

That’s why Tilmans believes that the only way to tackle these drug discovery problems is by having a really strong laboratory in addition to AI, and by having the laboratory and computational facilities work together like synergistic gears in a transmission.

Anagenex’s scientists experimentally test physical, real-world compounds using a mix of technologies. For example, the scientists feed DNA-encoded library (DEL) entries and affinity selection mass spectrometry (AS-MS) measurements into proprietary ML algorithms to design the next “evolved” generation of compounds. Then the scientists synthesize and test the compounds at the company’s Massachusetts laboratory. Tilmans asserts that in the span of a month, Anagenex can synthesize 100 million small molecules through AI-designed DELs.
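The cycle of predicting, synthesizing, testing, and feeding results back into the model can be sketched in a few lines. This is a toy illustration under stated assumptions, not Anagenex's proprietary algorithm: each "compound" is reduced to a single number, hidden_assay is a hypothetical stand-in for a laboratory measurement, and predict is a deliberately simple nearest-neighbor model.

```python
import random


def hidden_assay(x):
    """Hypothetical stand-in for a wet-lab measurement; activity peaks at x = 0.7."""
    return 1.0 - abs(x - 0.7)


def predict(x, measured):
    """Toy model: reuse the activity of the nearest already-measured compound."""
    nearest = min(measured, key=lambda m: abs(m - x))
    return measured[nearest]


def design_test_learn(candidates, rounds=5, batch=3, seed=0):
    """Run a greedy design-test-learn loop over a pool of candidate compounds."""
    rng = random.Random(seed)
    # Initial screen: measure a small random batch to seed the model's data.
    measured = {x: hidden_assay(x) for x in rng.sample(candidates, batch)}
    best_per_round = [max(measured.values())]
    for _ in range(rounds):
        untested = [x for x in candidates if x not in measured]
        if not untested:
            break
        # Design: rank untested candidates by predicted activity, keep the top batch.
        picks = sorted(untested, key=lambda x: predict(x, measured), reverse=True)[:batch]
        # Test: "synthesize and assay" the picks; results become new training data.
        for x in picks:
            measured[x] = hidden_assay(x)
        best_per_round.append(max(measured.values()))
    return measured, best_per_round


candidates = [i / 100 for i in range(101)]
measured, best_per_round = design_test_learn(candidates)
print(len(measured), best_per_round)
```

Because every round only adds measurements, the best activity found so far can never decrease. How quickly it improves depends on the model and the selection rule, and a real system would balance exploiting promising regions against exploring untested ones.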

“You can’t just be a computational company,” Tilmans insists. “About two-thirds of our employees are working in the laboratory. We have this initial set of two billion compounds that we have built ourselves that allows us to get a first idea of a big dataset, and then we can build it. We can refine at the order of millions of data points on the back end.”

At 1910 Genetics, scientists use a similar approach, one that involves what Nwankwo calls “wet lab proxy biological data.” These are experiments that can be run at scale as surrogates for ground-truth assays. For example, the most fundamental assay for measuring protein expression is a western blot, but no one can run enough western blots to generate the scale of data needed to train ML models.

In such a situation, scientists at 1910 Genetics consider how they could design a proxy assay that would, as Nwankwo puts it, get the company “about 70–80% of the way” to knowing whether the protein is expressed. “Ideally, it should be an assay that we can scale up using things like next-generation sequencing,” she continues. “We do that and generate even more data. We are talking about millions of data points per day using the proxy assay.”

To test these massive data sets and create an iterative loop between experimental data and synthetic data, 1910 Genetics built an automated laboratory in the Seaport District of Boston.

(Text omitted)
