Integrating uncertainty into drug-target deep learning
Many drugs modulate more than one molecular target. Multitask deep neural networks learn to predict drug-target binding by example, yet public pharmacological training datasets are sparse, imbalanced, and approximate. Most annotated drug-like molecules carry a single label (a protein binding score) out of thousands of possible targets. One protein may have ten thousand annotated molecules, another only ten. Fewer than 7% of possible drug-protein pairs interact, yet datasets can contain as much as 50% positives. The same drug-protein binding experiment, repeated across labs, can yield scores that vary by orders of magnitude. To address this, we construct evaluation benchmarks reflecting drug discovery and screening scenarios, compare them to standard metrics, and develop training methods that incorporate uncertainty. We also prototype classifiers that operate on calculated small-molecule 3D conformational data instead of on conventional 2D feature vectors. The results highlight where data and feature uncertainty are a problem, but also how we can leverage uncertainty within training to improve predictions of novel drug-target relationships.
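To make the sparsity problem concrete, here is a minimal sketch of one common way to train a multitask classifier when most drug-protein labels are missing: compute the loss only over measured pairs, and optionally down-weight labels that are trusted less. This is an illustration under stated assumptions, not the method presented in the talk. The function name `masked_weighted_bce`, the `mask` and `weights` arrays, and the choice of binary cross-entropy are all hypothetical.

```python
import numpy as np

def masked_weighted_bce(logits, labels, mask, weights=None):
    """Mean binary cross-entropy over observed (molecule, target) pairs only.

    logits, labels, mask: arrays of shape (n_molecules, n_targets).
    mask is 1 where a label was measured and 0 where it is missing.
    weights (optional, hypothetical) down-weights less trusted labels,
    for example measurements that disagree across labs.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=float)
    mask = np.asarray(mask, dtype=float)
    if weights is None:
        weights = np.ones_like(logits)
    w = mask * np.asarray(weights, dtype=float)

    # Numerically stable BCE on logits: max(x, 0) - x*y + log(1 + exp(-|x|))
    per_pair = (np.maximum(logits, 0.0)
                - logits * labels
                + np.log1p(np.exp(-np.abs(logits))))

    total = w.sum()
    if total == 0:
        return 0.0  # no observed labels in this batch
    return float((w * per_pair).sum() / total)

# Two molecules, three targets, one measured label per molecule.
logits = np.zeros((2, 3))
labels = np.array([[1, 0, 0], [0, 0, 1]])
mask = np.array([[1, 0, 0], [0, 0, 1]])
loss = masked_weighted_bce(logits, labels, mask)  # ln 2, about 0.693
```

Because unmeasured pairs have zero weight, they contribute nothing to the loss, so a missing label is never treated as evidence of non-binding.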
The Keiser lab combines machine learning and chemical biology methods to investigate how small molecules perturb protein networks to achieve their therapeutic effects. Michael Keiser joined the UCSF faculty in the Dept. of Pharmaceutical Chemistry and the Institute for Neurodegenerative Diseases as an Assistant Professor in 2014, with joint appointments in the Dept. of Bioengineering & Therapeutic Sciences and the Institute for Computational Health Sciences. Before this, he co-founded a startup bringing systems pharmacology methods to pharma and the US FDA. During his bioinformatics Ph.D. at UCSF as an NSF Fellow, Michael developed techniques to relate drugs and proteins from the statistical similarity of their ligands, such as the Similarity Ensemble Approach (SEA). He also holds B.Sc., B.A., and M.A. degrees from Stanford University.