FH23: MULTIMODAL GENERATIVE AI AND ROBOTICS FOR DRUG DISCOVERY AND AGEING RESEARCH

Frontiers Health returns on Wednesday 8th November, for a three-day meeting of minds in the Italian capital of Rome. On Thursday 9th November, the thematic session ‘Science and biomedical research innovation’ will take place, examining the cutting-edge ways in which technology drives the life sciences industry forwards – from accelerating R&D with digital biomarkers and early disease progression or decentralised trials, to informing earlier diagnoses.

To find out more, pharmaphorum spoke with Alex Zhavoronkov, founder and CEO of Insilico Medicine, who will be giving the keynote address, ‘Multi-modal generative AI and robotics for drug discovery and development and ageing research’. Zhavoronkov provided guidance on the applications of AI to therapeutic drug discovery, small molecule generation, prediction of clinical trials, and ageing research, as well as insight into where his part in this fascinating field all began.

Age and training deep neural networks

With a background in computer science and management of information systems, Zhavoronkov began his career with semiconductor companies, working with ATI Technologies.

“After my career at ATI, I decided to dedicate the rest of my life to the study of ageing and discover new therapeutics that help us treat age-related diseases,” Zhavoronkov said. “I did my grad work at Johns Hopkins, worked for a number of biotechs, [and] in 2013, at the cusp of the deep learning revolution, I decided to form Insilico and focus entirely on deep learning for every step of drug discovery and development. We decided not to go small, but to go after a grand goal with a very limited amount of funding.”

It was in this manner that the company started to provide deep learning solutions to the pharmaceutical industry and focus on algorithm design, originally to identify novel targets.

“From the very beginning, we were an expert target hunter, which used a very special approach to target discovery,” he explained. “We actually started training deep neural networks on age. Age is a universal feature that everybody has and is very correlated with when you're going to die and develop diseases. Pretty much every data set that you can find in the public domain or private domain will have age.”

“You have age, I have age, everybody has age,” he continued. “If you train deep neural networks on very, very large numbers of samples – millions to predict age – deep neural networks learn the fundamental human biology, the changes that transpire in time. We started retraining those global networks that are trained on age on diseases.”

“Disease data sets are much rarer, especially where very well annotated,” Zhavoronkov noted. “Usually, the data looks like Swiss cheese, but when you use a pre-trained network that is trained on ageing, you can transfer some weights onto the network that is trained on diseases, and vice versa, and give a deep neural network a much more sophisticated view of human biology and also animal biology, and link previously disconnected data types with deep learning.”
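The weight-transfer idea Zhavoronkov describes can be illustrated with a minimal sketch. This is not Insilico's method or code: the data is synthetic, the network is a toy one-hidden-layer model, and every name in it (`init_params`, `train`, `age_net`, `disease_net`, the `signal` vector) is a hypothetical chosen for illustration. It assumes only that a large "age" data set and a small "disease" data set share some underlying biological signal.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(n_in, n_hidden):
    """Toy network: a shared encoder (W1, b1) plus a task-specific head (w2, b2)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, 0.1, size=n_hidden),
        "b2": 0.0,
    }

def forward(params, X):
    H = np.tanh(X @ params["W1"] + params["b1"])   # hidden representation
    return H, H @ params["w2"] + params["b2"]      # scalar prediction per sample

def mse(params, X, y):
    _, pred = forward(params, X)
    return float(np.mean((pred - y) ** 2))

def train(params, X, y, lr=0.05, steps=500, freeze_encoder=False):
    """Full-batch gradient descent on squared error (constant factor 2 folded into lr)."""
    n = len(y)
    for _ in range(steps):
        H, pred = forward(params, X)
        err = (pred - y) / n
        grad_w2 = H.T @ err
        grad_b2 = err.sum()
        if not freeze_encoder:
            # Backpropagate through tanh into the encoder weights.
            dH = np.outer(err, params["w2"]) * (1.0 - H ** 2)
            params["W1"] -= lr * (X.T @ dH)
            params["b1"] -= lr * dH.sum(axis=0)
        params["w2"] -= lr * grad_w2
        params["b2"] -= lr * grad_b2
    return params

# Synthetic stand-ins (assumption): both tasks depend on the same hidden signal.
n_features, n_hidden = 20, 8
signal = rng.normal(size=n_features) / np.sqrt(n_features)

X_age = rng.normal(size=(2000, n_features))            # large "age" data set
y_age = X_age @ signal + 0.1 * rng.normal(size=2000)

X_dis = rng.normal(size=(40, n_features))              # small "disease" data set
y_dis = 0.5 * (X_dis @ signal) + 0.1 * rng.normal(size=40)

# Step 1: pre-train on age.
age_net = init_params(n_features, n_hidden)
age_loss_before = mse(age_net, X_age, y_age)
train(age_net, X_age, y_age)
age_loss_after = mse(age_net, X_age, y_age)

# Step 2: copy the pre-trained encoder into a new network and train only its head
# on the small disease data set.
disease_net = init_params(n_features, n_hidden)
disease_net["W1"] = age_net["W1"].copy()
disease_net["b1"] = age_net["b1"].copy()
dis_loss_before = mse(disease_net, X_dis, y_dis)
train(disease_net, X_dis, y_dis, freeze_encoder=True)
dis_loss_after = mse(disease_net, X_dis, y_dis)

print("age loss:", age_loss_before, "->", age_loss_after)
print("disease loss:", dis_loss_before, "->", dis_loss_after)
```

Freezing the encoder is only one option; in practice the transferred weights could also be fine-tuned at a lower learning rate, which is closer to the "retraining" Zhavoronkov mentions.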

Insilico closed its first small series A at the start of 2018 with $6 million. By 2019, the company decided to make its software available to others.

“In a generative space, we realised that, if you don't have a community of users, you would not be able to continuously improve the products, and also you would not be able to validate at scale, [so] we decided to launch software,” said Zhavoronkov. “If you are not selling software to others, and you are not covering the entire end-to-end drug discovery, you should not be calling yourself an AI drug discovery company […] If you don't have software on the market, you are just a regular biotech.”

“Once we released the software on the market, […] many of the big pharmaceutical companies started adopting it and also using our own models as the benchmark for their own AI,” he continued. “There was a huge gap in the industry where there were a couple of companies allowing you to use machine learning for target discovery, but nobody had the same scale and nobody had the same number of models implemented in software [as us].”

Also in 2019, Insilico decided to use its own software to develop a handful of therapeutic programs from scratch, as a demo.

“It turned out that those programs were of such high quality, [that we] started demonstrating unprecedented progress through the different stages of preclinical research,” said Zhavoronkov. “Our investors were so impressed that they basically told us that we should start doing our own drug discovery as well, and also demonstrate to everybody that AI-generated molecules for absolutely novel targets could go to the clinic.”

“The community helps you validate every model, [it] helps make software better,” he explained. “If you have your own drug discovery program – one program, if you sell it – it can eclipse the entire total available market for software after one sale in a year. We started increasing the number of those programs, increasing the novelty of those programs [and], in 2021, we raised a pretty substantial round, over $250 million from expert biotechnology investors, financial investors who have deep understanding in the field and decided to deploy this capital to push those programs forward.”

“Our unique advantage at that time was that we were algorithm first and generative AI first,” Zhavoronkov posited. “Generative AI is not new, it just got consumerised after ChatGPT made it available to everybody. The fundamental research goes back to 2013-14, if you're talking about deep neural network-based generative approaches. Our publication history, our ability to demonstrate academic performance, played in our favour because investors knew us, customers knew us, and contract research organisations knew us.”

When it comes to definitions, Insilico Medicine doesn’t refer to itself as a biotech.

“We like to refer to [ourselves] not as biotech, but as AI biotech or AI tech bio because you actually have AI running the thing, running the show, but at the same time, you are a biotech with a very rapidly moving pipeline,” Zhavoronkov explained.

In 2022, the company nominated nine preclinical candidates in small molecules.

“Nine preclinical candidates: it's a very large number,” he commented. “Many pharmaceutical companies nominate four to five every year in small molecules using internal R&D. We basically doubled the R&D productivity of an average pharma within one company within one year. We still faced a lot of scepticism at that time because companies wanted to see that we could get a regulatory clearance to enter human clinical trials. [Well,] we managed to get four drugs into human clinical trials, and one passed phase one, so we got into phase two with a potentially blockbuster drug targeting a broad disease, targeting a chronic disease, and that requires oral administration and high comparative potency in order to go into phase two. We managed to do that, and this became a very famous program.”

“It was covered by the Financial Times; it was on the front cover. The former FDA commissioner went on CNBC and talked about it,” Zhavoronkov continued. “Right now, the devil is always in phase two, but it's ongoing. We have two parallel phase twos […] We decided to also ensure that we do it in two different markets as two separate trials. Then, some of those AI-generated molecules that we have started entering phase one human clinical trials.”

Recently, Insilico Medicine and Exelixis entered into an exclusive global license agreement for ISM3091, a potentially best-in-class USP1 inhibitor, for $80 million upfront plus milestones.

“That's, I think, one of the biggest deals in our industry to date, where you actually sold an AI-generated asset and that shows you it's next level from partnering with pharmaceutical companies on faith,” Zhavoronkov said. “Many pharmaceutical companies, they partner with AI companies because of a previous relationship between CEOs or board members, or because it's an investor-incubated company, and they just had a good relationship with some of the senior management […] When you actually sell an AI-generated asset for a substantial upfront, it shows you that the level of quality of those molecules is very high. They can do best in class.”

Insilico trains its AI on the breadth of available data from both public-domain and private sources, and uses the pre-trained AI to drive a fully automated lab.

“In 2021, we decided to venture into AI-powered laboratory robotics,” Zhavoronkov expanded. “I committed myself and a very substantial part of the team to building a self-driving lab, which can automatically perform experiments with minimal human intervention. Humans would come in just to change the reagents and maybe transport the trash out of the lab, but most of the work is done directly by robots without human intervention. Also, [the] target discovery step would be automatic.”

In addition to reducing and possibly removing human error in experimentation and ensuring consistency, this also allows for a reduction of human bias.

“Humans are genuinely biased when it comes to target selection,” he explained. “Most of the heads of the therapeutic areas already know the targets, and they usually try to fit the data to their hypothesis that they already have in their mind. Because one bet on a target can cost them $100 million, they usually don't want to have many bets. They try to bet on something that they already know and say it propagates into the laboratory experiments. We actually allow AI to do genuine scientific exploration of the biomedical samples that you throw into this lab and automatically identify targets and automatically test them. It generates massive amounts of data, but also reduces the biases that humans have.”

Insilico’s AI-automated lab was completed and launched at the end of 2022.

“Now, we generate massive amounts of proprietary data,” Zhavoronkov said. “We also generate massive amounts of proprietary targets and target hypotheses for a variety of diseases. Predominantly it's cancer, but also metabolism and ageing, [and] many age-related processes [and] chronic diseases.”

Zhavoronkov foresees two trends in particular that he believes will change the industry.

“One is multimodal transformers where you can train AI systems on very different data types at the same time to perform multiple different tasks at the same time and to generalise across multiple different data types and multiple different tasks,” he predicted. “For example, we recently published a paper called Precious 1-GPT, where we trained a multimodal transformer on very distinct data types – so, gene expression and methylation – to perform multiple tasks: to predict age, for example, to help identify targets using different philosophies, and to provide basic interpretation [of] why certain targets are implicated in diseases and in ageing at the same time.”

“Now, we're continuing on this trend,” he said. “[Following,] Precious 2-GPT, a very substantial generative multimodal transformer that allows you to perform multiple tasks and also generate synthetic data with the desired properties, we're working on the next generation of multimodal generative for target discovery and chemistry, at the same time, Precious 3-GPT: that is, a multi-species, multi-omics, multi-tissue, multitasking transformer that can generalise across many different functions. Think of it as a medicinal chemist who also knows biology and at the same time knows how to work with medicine.”

“As the systems become better and as we spend a little bit more time training them and also reinforcing them for quality and for accuracy, the systems are expected to outperform everything that we've seen before,” Zhavoronkov explained. “We also started utilising a technology called reinforcement learning from expert feedback. We use expert teachers to teach those models to perform better and to ensure that we reward or punish those models for producing good or bad output.”

“You've seen that OpenAI, to develop ChatGPT [and] GPT-3.5, used the technology called reinforcement learning from human feedback, where they paid [non-expert] humans,” he said. “We have the same approach, but in our industry, you cannot pay somebody $2 an hour to quickly look at the output and see if it's correct or not. In our industry, those people get hundreds of thousands of dollars to actually run companies.”

In short, Zhavoronkov believes “multimodal is the name of the game”.

“If you're not a multimodal, you are going to lose, and also multi-species,” he commented. “One of the abilities of multimodal transformers is to translate from species to species, and that's my big thing. We actually try to train on large amounts of data from mice, from monkeys, from dogs, and humans at the same time, so that in the future, we can reduce the need for animal experimentation to the very minimum.”

“You still need to do the animal experiments, but we want to ensure that it's translatable to humans,” he continued. “Actually, when you are looking at the basic mechanisms of ageing or age-related diseases in animal models and on humans at the same time, age is a great universal feature […but] their clocks just tick differently and faster at different levels. Many animals, they actually tick slower than ours. There’s different metabolism, but we have something to learn from those long-lived species.”

“We think that ageing research is another big trend: people realise more and more that biology doesn’t stand still,” he added. “If you are just looking at biology at this current time, it’s like working with pictures, instead of working with videos, because your life is a video; it’s a continuous process. It’s not just a snapshot of you today. If you want to understand human biology in time, you need to work with ageing because all of us degrade and die, and many of us die of terrible diseases.”

Zhavoronkov also foresees quantum computing becoming a trend.

“In [terms of] generative chemistry, I think the first frontiers, the lowest-hanging fruit [is] quantum computing because some of the steps in generative [chemistry] can be accelerated on a real quantum [computer],” he said. “I'm not talking about quantum chemistry simulations; I'm talking about real qubits. We've already put a couple of papers out explaining this concept and even showing some experimental work. I think that quantum systems are becoming much more scalable, more accessible, and useful for generative chemistry.”

Lastly, automation.

“Anything that can be done by a human and it's a repetitive and tedious task should be roboticised,” Zhavoronkov stated.

In 2014, however, deep learning was still considered “a very exotic field” with a limited talent pool. So, how did Insilico manage?

“We were blessed to have a collective of scientists who understood biology and, at the same time, deep learning,” Zhavoronkov admitted. “They comprised the core of the company, [including] my scientific co-founder, Alex Aliper. I co-supervised his PhD as well. Many other young scientists who joined us at that time were pretty much fresh from PhD, from postdocs, and they had more experience in deep learning than they had in drug discovery. We had to fill that gap. In the first two years of our existence, we were focusing on algorithms that allow us to solve certain problems within big pharma and also identify a bunch of targets.”

Then, at the end of 2015, Insilico Medicine realised that true value in drug discovery comes from “owning your own chemistry”.

“We decided to become an end-to-end company, going after biology and chemistry at the same time,” Zhavoronkov explained. “In 2014, generative adversarial networks were first published by Ian Goodfellow and Yoshua Bengio out of the University of Montreal. Then, Ian went to Google, and that set the generative AI revolution on fire. You probably remember deep fakes and very photorealistic images generated from text in 2015-16: we decided to try that approach for generation of novel molecular structures, novel molecules for the targets that we identify using biological AI.”

And so it was that they managed to publish their first paper on generative adversarial networks, applied to the generation of novel molecules with the desired biochemical properties, at the end of 2016 in a peer-reviewed journal. Following that, 2017 was “a streak of many theoretical papers”.

“At that time, all the pharmaceutical companies and pretty much all our peers were saying that deep learning is hype, generative AI is hype. It's overhyped. There is no experimental proof,” he said. “We decided to actually validate for the first time experimentally […] I had, at that time, probably about two rounds of synthesis in the bank. Synthesis is expensive. You can basically have two experimental runs. My first one resulted in some hits, but not exactly great. The second run, we achieved pretty substantial results. The companies that were doing our validation […] they actually saw the results and decided to invest.”