Predicting drug affinity with genetic algorithms

What is Drug Discovery and Development?
How does medicine work?
A true story from pharmacology
| Drug | Target | Structure | Complex |
|---|---|---|---|
| Nevirapine | RT | ![]() |
|
| Saquinavir | Protease | ![]() |
Drug Discovery and Development
The process of bringing a drug from research to market.
Target to Hit
Target to Hit - Screening
![]() |
![]() |
![]() |
![]() |
| + | + | + | + |
![]() |
![]() |
![]() |
![]() |
| = | = | = | = |
| -10.24 kCal/mol | -32.90 kCal/mol | -3.03 kCal/mol | -80.21 kCal/mol |
Search a set of compounds (ligands) for those most likely to bind to the receptor at a given binding site.
Solving the search problem with HTS
Using Robots for HTS at NHS
https://www.biotek.com/resources/docs/Miniaturization_Bioscience_Tech.pdf
Solving the search problem with ML
Receptor / binding site data from PDB @ rcsb.org
X-ray Crystallography
Laue diffraction pattern collected at 14 ID using a tetrameric hemoglobin crystal from Scapharca Inequivalvis (courtesy of W. E. Royer, University of Massachusetts Medical School).
3V81.pdb
... ATOM 15495 CG2 ILE D 393 25.671 -30.671 33.000 1.00 35.23 C ATOM 15496 CD1 ILE D 393 26.353 -27.753 33.542 1.00 31.16 C ATOM 15497 N GLN D 394 22.074 -29.425 33.020 1.00 43.01 N ATOM 15498 CA GLN D 394 21.126 -29.496 31.914 1.00 44.92 C ATOM 15499 C GLN D 394 21.924 -29.442 30.622 1.00 45.70 C ATOM 15500 O GLN D 394 22.885 -28.681 30.525 1.00 42.47 O ATOM 15501 CB GLN D 394 20.068 -28.385 31.997 1.00 47.62 C ATOM 15502 CG GLN D 394 18.615 -28.837 31.693 1.00 58.82 C ATOM 15503 CD GLN D 394 18.037 -29.911 32.674 1.00 67.94 C ATOM 15504 OE1 GLN D 394 18.086 -29.763 33.900 1.00 64.07 O ATOM 15505 NE2 GLN D 394 17.466 -30.983 32.112 1.00 78.85 N ATOM 15506 N LYS D 395 21.536 -30.264 29.644 1.00 46.90 N ATOM 15507 CA LYS D 395 22.294 -30.406 28.394 1.00 44.85 C ATOM 15508 C LYS D 395 22.666 -29.059 27.782 1.00 43.49 C ATOM 15509 O LYS D 395 23.785 -28.849 27.323 1.00 41.30 O ...
Ligand data from PubChem Compound @ NCBI
saquinavir.sdf
...
2.9061 -4.9077 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0000 -3.3522 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0000 -4.3938 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
5.5386 3.4770 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
5.5386 0.7770 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
6.7966 3.6019 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
5.9996 3.6019 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1 22 2 0 0 0 0
23 2 1 6 0 0 0
2 72 1 0 0 0 0
3 31 2 0 0 0 0
4 39 2 0 0 0 0
5 40 2 0 0 0 0
6 15 1 0 0 0 0
6 18 1 0 0 0 0
...
The revised search problem
Search a set of PubChem compounds for those most likely to bind to the receptor model at a given binding site.
Molecular Docking
How do we find the affinity for a complex?
+
=
? kCal/mol
Conformations
Representing the conformation
$$C_L = (x, y, z, q_w,q_x,q_y,q_z, t_1,t_2,\ldots,t_n)$$
$$energy(R, L, C_L) = E_{inter} + E_{intra}$$
Intermolecular Energy
$$E_{inter} = \sum_{i \in L} \sum_{j \in R}\Bigg[E_{PLP}(r_{ij}) + 332.0\frac{q_i q_j}{{4r_{ij}}^2}\Bigg]$$
Bond energy
Piecewise Linear Potential
$$E_{PLP}(r_{ij})$$
Electrostatic Coulomb Interaction
$$332.0\frac{q_i q_j}{{4r_{ij}}^2}$$
Intramolecular energy
$$E_{intra} = \sum_{i < j \in L} E_{PLP}(r_{ij}) + \sum_{b}A[1-cos(m\theta - \theta_0)] + E_{clash}$$
Torsional Energy
$$\sum_{b}A[1-cos(m\theta - \theta_0)]$$
| $$\theta_0$$ | $$m$$ | $$A$$ | |
| $$sp^2-sp^3$$ | $$0.0$$ | $$6$$ | $$1.5$$ |
| $$sp^3-sp^3$$ | $$\pi$$ | $$3$$ | $$3.0$$ |
Hybridization
Hybridization
Clash Penalty
$$E_{clash} = \left\{\begin{array}{11} 1000 & : r_{ij} < 2\\ 0 & : r_{ij} \ge 2 \end{array} \right.$$
$$energy\big(R, L, (x, y, z, q_w,q_x,q_y,q_z, t_1,t_2,\ldots,t_n)\big)\\ = \sum_{i \in L} \sum_{j \in R}\Bigg[E_{PLP}(r_{ij}) + 332.0\frac{q_i q_j}{{4r_{ij}}^2}\Bigg] + \\\sum_{i < j \in L} E_{PLP}(r_{ij}) + \sum_{b}A[1-cos(m\theta - \theta_0)]\\ + E_{clash}$$
$$\min_{C \in S_C} energy(R, L, C)$$
Minimizing a multidimensional function
$$C_L = (x, y, z, q_w,q_x,q_y,q_z, t_1,t_2,\ldots,t_n)$$
Saquinavir has 13 rotatable bonds
Using Genetic Algorithms for optimization
GAs mimic biological natural selection by repeatedly modifying a population of individual solutions to produce a final generation of optimized solutions.
Initialize Population
$$C = (x, y, z, q_w,q_x,q_y,q_z, t_1,\\t_2,\ldots,t_n)$$
$$P = \{C_1, C_2, \ldots, C_n\}$$
Evaluate Energy
$$E_n = energy(R, L, P_n)$$
Generate Offspring
For each member of population:
Let $P_\alpha$, $P_\beta$, $P_\delta$ be random members of the population
Let $c$ be the genetic crossover
Let $s$ be the scaling factor
Let $r \in [0, 1]$ be a randomly generated value
Generate Offspring
$c=0.9, s=0.5$
| $x$ | $y$ | $z$ | $q_w$ | $q_x$ | $q_y$ | $q_z$ | $t_1$ | |
| $r$ | 0.51 | 0.96 | 0.32 | 0.36 | 0.25 | 0.50 | 0.38 | 0.34 |
| $P_i$ | 0.01 | 0.97 | 0.41 | 0.94 | 0.48 | 0.57 | 0.66 | 0.09 |
| $P_{\alpha}$ | 0.71 | 0.48 | 0.35 | 0.52 | 0.01 | 0.96 | 0.22 | 0.03 |
| $P_{\beta}$ | 0.52 | 0.01 | 0.96 | 0.13 | 0.99 | 0.66 | 0.16 | 0.79 |
| $P_{\delta}$ | 0.42 | 0.81 | 0.05 | 0.32 | 0.37 | 0.00 | 0.73 | 0.37 |
| $O_i$ | 0.61 | 0.97 | 0.78 | 0.26 | 0.14 | 1.29 | -0.43 | 0.06 |
Determine Fitness
$$E^o_i = energy(R, L, O_i)$$
$$ P_i = \left\{\begin{array}{11} O_i & : E^o_i < E_i\\ P_i & : E^o_i \ge E_i \end{array} \right. $$
Terminate?
Let $v$ be variance
$$\overline{E} - \min(E) < v$$
or if we have reached the iteration limit.
Avoiding local minima
score: -199.62 kcal/mol score: -194.79 kcal/mol score: -192.91 kcal/mol score: -152.98 kcal/mol score: -146.35 kcal/mol score: -143.57 kcal/mol score: -139.40 kcal/mol score: -127.21 kcal/mol score: -74.22 kcal/mol score: -61.66 kcal/mol
Verifying results - RMSD
$$RMSD = \sqrt{\frac{1}{N}\sum_{i}^{i = N}\delta_i^2}$$
Verifying results
Verifying results
80% accuracy
Optimizing the optimizer
Limiting complexity
$$O\Bigg[n\Big(RL + \frac{L(L - 1)}{2} + b\Big)\Bigg]$$
$$O(nRL)$$
Gridmaps
$$O(nRL) \to O(RE + nL)$$
Performing a full screen
Supercomputing with AWS
Simple web tool
Results
Protein Flexibility
Quantum Mechanical Effects
Water
Deep Learning for Drug Discovery
Quantum Computing for Drug Discovery
Hardware-efficient Variational Quantum Eigensolver for Small Molecules and Quantum Magnets