Abstract:
Cytochrome P450s (CYPs) are a diverse group of heme-containing proteins, found in all kingdoms
of life, that participate in vital processes, including the oxidation of endogenous and exogenous
compounds. Of the 57 human CYP isoforms, CYP3A4 is the most abundant in the liver. CYP3A4
is highly promiscuous in substrate specificity and can accommodate compounds of
diverse size and structure; as a result, it mediates the metabolism of up to 50% of all
marketed drugs. However, the ability of CYP3A4 to accommodate two or more similar or different
molecules at once can also lead to adverse drug-drug interactions (DDIs), because inhibition or induction
of CYP3A4 by one drug can alter the in vivo metabolism of other drugs and cause adverse effects.
Pharmacokinetic problems caused by inhibition or induction of CYP isozymes are held responsible for the
failure of nearly 80% of drugs during development. It is therefore important to analyze
cytochrome interactions before preclinical trials to ensure success in the drug development
process. This study applies supervised machine learning techniques and molecular
modeling strategies to publicly available CYP3A4 inhibition data in order to develop a predictive
model of CYP3A4 inhibition and to identify the 3D features responsible for it.
Five classifiers (Support Vector Machine, Logistic Regression, Decision Tree, Random Forest,
and Multilayer Perceptron) were trained on two differently refined CYP3A4 inhibition datasets.
The Support Vector Machine and Logistic Regression models built on the more refined dataset
outperformed all others, with accuracies of 98% and 96%, respectively. These two models, with
their chosen hyperparameters, are therefore suitable for predicting CYP3A4 inhibition in new
chemical entities and can assist in the drug development process. Additionally, all models built
on the more refined dataset achieved accuracies above 80%, indicating that they are stable on the
data used and highlighting the importance of refined features, and of data refinement in general,
over the use of
noisy raw data. The results highlight the importance of the following properties in CYP3A4 inhibition:
increased lipophilicity, van der Waals surface area on pharmacophoric points, the numbers of aromatic
and rotatable bonds, the percentage of nitrogen atoms, the topological distances between nitrogen
and oxygen and between nitrogen and sulfur, and an overall negative charge on the molecule. Thus,
this study assists in understanding key CYP3A4 interactions, predicting CYP3A4 inhibition, and
optimizing the toxicological profiles of new chemical entities.