
AI in Early Drug Discovery 2019: Conference Trip Review from a Mark III Data Scientist’s Perspective

First, I wouldn’t consider myself an expert in drug discovery by any means. However, I am extremely interested in how AI is being applied and will be applied to this field, as Mark III supports some of the largest and most innovative brands in healthcare, life sciences, biomedical engineering, and pharmaceuticals today. Any technology that improves drug discovery can mean lower mortality, better quality of life, and more effective treatment of disease. This is a short review of what I learned at the conference.

- The main goal of all of the talks was to show novel methods or provide case studies of how AI is being used to direct and target drug discovery. Creating new drugs is incredibly expensive, so directing the process can save massive amounts of time, effort, and money. As one speaker, Ron Alfa, MD, PhD, VP of Discovery and Product at Recursion Pharmaceuticals, put it, the aim is "to drive the discovery and development of novel compounds more quickly, more efficiently, and with greater degree of downstream clinical success."

- Recurrent Neural Networks, particularly Long Short-Term Memory networks (LSTMs), were very much front and center (a rough sketch of an LSTM applied to molecule generation appears at the end of this post).

- Neural networks in general were a hot topic, particularly Generative Adversarial Networks (GANs). Convolutional Neural Networks (CNNs) made their way in, too.

- GANs are being used to generate previously unknown molecules. Many of these novel molecules were then shown to antagonize known molecules involved in disease (a bare-bones GAN skeleton also appears at the end of this post).

- As in many fields, lack of data is a significant problem, and data augmentation is critical for applying AI to drug discovery. Generating the augmented data is challenging, though: altering a molecule's conformation in silico to create augmented examples can significantly change the molecule's potential interactions with other molecules, confounding results (see the conformer-generation sketch at the end of this post).

- One of the presenters showed an intriguing slide with two scatterplots of experimental data. The first plot used data taken from published experiments run by different groups. Within each paper, the values clustered together nicely, but when values from different papers were combined on the same plot, the clusters were often quite disparate even though the underlying experiments were nominally the same; there was a correlation, but it was fragmented. In contrast, the plot of data generated within the presenter's own company showed a clean linear correlation. The takeaway: be careful when drawing conclusions from combined data produced by different groups running the "same" experiment, because conditions and quality control may have differed. There is no substitute for verifying others' work in your own experimental setup. This differs from strictly computational fields, where results can be reproduced exactly. (A small simulation demonstrating this effect appears at the end of this post.)

- On a personal level, I heard words I had not heard in a while: putative and moiety. Both brought back good memories. I also learned a new term: "fractional dimension."
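
None of the speakers shared code, but to make a few of the bullets above concrete, here are some minimal sketches. First, the LSTM point: one common application of recurrent networks in this space is character-level generation of molecules as SMILES strings. The toy training strings, vocabulary handling, and hyperparameters below are all invented for illustration, and PyTorch is assumed.

```python
# Minimal character-level LSTM over SMILES strings (PyTorch).
# All data and hyperparameters here are illustrative, not from any talk.
import torch
import torch.nn as nn

# Toy "dataset": a few real SMILES strings; real work uses millions.
smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
chars = sorted(set("".join(smiles)) | {"^", "$"})   # ^ = start, $ = end
stoi = {c: i for i, c in enumerate(chars)}

class SmilesLSTM(nn.Module):
    def __init__(self, vocab, embed=32, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.head(out), state    # logits over the next character

model = SmilesLSTM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training pass: predict each next character.
for s in smiles:
    seq = torch.tensor([[stoi[c] for c in "^" + s + "$"]])
    logits, _ = model(seq[:, :-1])
    loss = loss_fn(logits.squeeze(0), seq[0, 1:])
    opt.zero_grad(); loss.backward(); opt.step()
```

After training on a large SMILES corpus, you would generate candidate molecules by feeding the start token, sampling characters until the end token appears, and then filtering for chemical validity.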
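Second, the GAN point. Real molecule-generating GANs work on SMILES strings or molecular graphs and are considerably more involved; this skeleton only shows the adversarial loop itself, with random bits standing in for real molecular fingerprints.

```python
# Skeleton of a GAN over fixed-length fingerprint-style vectors (PyTorch).
# Random bits stand in for real molecular fingerprints here.
import torch
import torch.nn as nn

N_BITS, Z = 2048, 100                      # fingerprint length, noise dim

G = nn.Sequential(nn.Linear(Z, 512), nn.ReLU(),
                  nn.Linear(512, N_BITS), nn.Sigmoid())   # fake fingerprints
D = nn.Sequential(nn.Linear(N_BITS, 512), nn.LeakyReLU(0.2),
                  nn.Linear(512, 1), nn.Sigmoid())        # real-vs-fake score

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = (torch.rand(64, N_BITS) > 0.9).float()   # placeholder "real" data

for step in range(100):
    # Discriminator: label real fingerprints 1, generated ones 0.
    fake = G(torch.randn(64, Z))
    d_loss = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to fool the discriminator into scoring fakes as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```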
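Third, the augmentation caveat. One common tactic for augmenting 3D molecular data is to embed multiple conformers per molecule. The sketch below uses RDKit (my choice of toolkit, not one named at the conference), and it is exactly the kind of in silico conformational change the speakers warned can alter a molecule's interaction profile.

```python
# Conformer-based augmentation sketch using RDKit (pip install rdkit).
# Caveat from the talks: different conformers of the same molecule can
# interact very differently with a target, so augmenting this way can
# confound results if done naively.
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin

# Embed several 3D conformers; each becomes one "augmented" example.
conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=10, randomSeed=42)
AllChem.MMFFOptimizeMoleculeConfs(mol)   # relax each conformer

for cid in conf_ids:
    conf = mol.GetConformer(cid)
    # GetPositions() yields an (n_atoms, 3) coordinate array that could
    # feed a 3D model as one augmented sample.
    print(cid, conf.GetPositions().shape)
```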
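Finally, the fragmented-correlation slide is easy to recreate in simulation: several hypothetical labs measure the same underlying linear relationship, each with its own systematic offset. Within each lab the correlation is nearly perfect, but pooled naively it degrades. All numbers are invented.

```python
# Simulating the "fragmented correlation" slide with NumPy: several labs
# measure the same relationship, each with a lab-specific systematic bias.
import numpy as np

rng = np.random.default_rng(0)
per_lab_r, xs, ys = [], [], []

for offset in [0.0, 6.0, -5.0, 9.0]:          # one systematic bias per lab
    x = rng.uniform(0, 10, 30)
    y = 2.0 * x + offset + rng.normal(0, 0.5, 30)
    per_lab_r.append(np.corrcoef(x, y)[0, 1])
    xs.append(x); ys.append(y)

pooled_r = np.corrcoef(np.concatenate(xs), np.concatenate(ys))[0, 1]
print("within-lab r:", [round(r, 3) for r in per_lab_r])  # each near 1.0
print("pooled r:    ", round(pooled_r, 3))                # substantially lower
```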