AI for Drug Discovery
Energy-driven, decision-oriented computational methods integrating machine learning for pharmaceutical applications
Research Focus
Energy-Based Modeling
The AIMNet2 framework provides quantum-chemical accuracy for drug-like molecules, ions, and reactive intermediates, which is essential for accurate property prediction in pharmaceutical contexts. The AIMNet2-NSE extension handles open-shell systems such as quinones and covalent inhibitor intermediates, which are common in medicinal chemistry but challenging for traditional computational methods.
Free Energy Simulations
Hybrid ML/molecular mechanics (ML/MM) approaches improve binding affinity predictions. Our methods reduce the error in absolute binding free energies from 0.97 to 0.47 kcal/mol, a critical improvement for rank-ordering compounds in lead optimization campaigns. Active learning-guided lead optimization has achieved 20-fold efficiency gains over brute-force screening.
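To put these error values in context, a free energy error maps to a multiplicative error in the predicted dissociation constant through ΔG = RT ln K_d. The following is a minimal illustrative sketch, not part of our released tools: the function name `fold_error_in_kd` and the temperature of 298.15 K are assumptions made for the example.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_error_in_kd(dg_error_kcal: float, temperature_k: float = 298.15) -> float:
    """Multiplicative error in K_d implied by an error in binding free energy.

    Follows from dG = RT ln(K_d): an additive error in dG becomes a
    multiplicative factor exp(error / RT) in K_d.
    """
    return math.exp(dg_error_kcal / (R_KCAL_PER_MOL_K * temperature_k))


if __name__ == "__main__":
    # Error values quoted in the text above; 298.15 K is an assumed temperature.
    for err in (0.97, 0.47):
        print(f"{err:.2f} kcal/mol -> {fold_error_in_kd(err):.1f}-fold error in K_d")
```

Under these assumptions, an error of 0.97 kcal/mol corresponds to roughly a 5-fold uncertainty in K_d, while 0.47 kcal/mol corresponds to roughly 2-fold, which is why the reduction matters when ranking closely related compounds.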
Reaction Awareness
The AIMNet2-rxn framework evaluates millions of reaction pathways, supporting synthesis-aware discovery by assessing:
- Synthetic accessibility and retrosynthetic planning
- Strain effects in proposed molecules
- Metabolic stability predictions
- Covalent modification mechanisms
Active Learning Integration
Our models identify high-uncertainty regions so that experimental validation can be directed where it is most informative, aligning computational predictions with the real-world resource constraints of drug development. This approach reduces the number of compounds that must be synthesized and tested.
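The selection step can be sketched as follows. This is a generic illustration rather than our production workflow: it assumes that uncertainty is estimated from the disagreement among an ensemble of models, which is one common proxy, and the function name `select_most_uncertain` and the compound IDs and values are hypothetical.

```python
import statistics
from typing import Dict, List


def select_most_uncertain(
    ensemble_predictions: Dict[str, List[float]], batch_size: int
) -> List[str]:
    """Return the IDs of the batch_size compounds with the largest ensemble spread."""
    # Uncertainty proxy (an assumption): population standard deviation
    # of the per-model predictions for each compound.
    uncertainty = {
        compound_id: statistics.pstdev(preds)
        for compound_id, preds in ensemble_predictions.items()
    }
    # Sort by descending uncertainty; break ties by ID for reproducibility.
    ranked = sorted(uncertainty, key=lambda cid: (-uncertainty[cid], cid))
    return ranked[:batch_size]


if __name__ == "__main__":
    # Hypothetical predicted binding free energies (kcal/mol) from four models.
    predictions = {
        "cmpd_A": [-8.1, -8.0, -8.2, -8.1],
        "cmpd_B": [-6.0, -9.5, -7.2, -8.8],
        "cmpd_C": [-7.4, -7.9, -7.1, -7.6],
    }
    print(select_most_uncertain(predictions, batch_size=2))
```

In a closed loop, the selected compounds would be sent for measurement, the results added to the training set, and the models retrained before the next round of selection.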
Property Prediction
We develop machine learning models to predict key pharmaceutical properties including:
- ADMET properties: Absorption, Distribution, Metabolism, Excretion, and Toxicity
- Binding affinity: Predicting how strongly a molecule binds to its target protein
- Selectivity: Ensuring drugs bind to the intended target and not off-targets
- Synthetic accessibility: Estimating how difficult a molecule is to synthesize
- pKa prediction: Determining protein ionization states, which affect binding and solubility
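As a small illustration of why pKa matters for the last item, the Henderson-Hasselbalch relation converts a pKa and a pH into the fraction of a titratable site that is deprotonated. This is a textbook relation for a single, non-interacting site, not a description of our prediction models; the function name `fraction_deprotonated` and the numerical values are illustrative assumptions.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of a titratable site in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    # Illustrative values only: a single site with pKa 6.5 at two pH values.
    for ph in (5.5, 7.4):
        print(f"pH {ph}: {fraction_deprotonated(6.5, ph):.2f} deprotonated")
```

Near pH = pKa the ionization state is very sensitive to the pKa value: a shift of 0.5 pKa units moves the deprotonated fraction from 0.50 to about 0.76.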
Notable Achievements
CACHE Challenge Success
In the CACHE Challenge (Critical Assessment of Computational Hit-finding Experiments) targeting the LRRK2 WD40 domain for Parkinson’s disease, our team tied for first place with an 8.5% experimental hit rate, demonstrating the practical impact of our methods on real pharmaceutical targets.
Validated Computational Workflows
Our closed-loop workflows coupling predictive models with experimental validation have successfully identified novel inhibitors across multiple therapeutic targets, with compounds advancing to experimental testing at partner organizations.
Software Tools
Publicly available tools from our drug discovery research:
- Auto3D: Automatic 3D structure generation from SMILES notation
- pKa-ANI: Protein pKa prediction with a mean absolute error under 0.5 pKa units
- AIMNet2: Neural network potential for accurate energy and property predictions
Collaborative Network
We work closely with:
- CMU Drug Discovery Platform
- UPMC Hillman Cancer Center
- Pharmaceutical industry partners (GSK, Pfizer, Genentech, and others)
- Academic collaborators worldwide
Impact
Our methods enable:
- Screening billions of molecules for drug-like properties
- Identifying promising candidates for experimental validation
- Reducing lead optimization time from months to weeks
- Exploring previously inaccessible chemical space
- Integrating with automated synthesis platforms
Open Science
All of our models and key datasets are released as open source to accelerate research globally. Visit our Software page for available tools and documentation.
Funding
National Institutes of Health
Grant: R01GM140467
Machine Learning for Drug Design and Optimization
2022-2027