Synergizing Machine Learning, Conceptual Density Functional Theory, and Biochemistry: No-Code Explainable Predictive Models for Mutagenicity in Aromatic Amines

Rincon, Elizabeth; Chamorro, Eduardo

Abstract

This study synergizes machine learning (ML) with conceptual density functional theory (CDFT) to develop OECD-compliant predictive models for the mutagenic activity of aromatic amines (AAs) with a fully No-Code methodology, using a comprehensive data set of 251 AAs, leave-one-out cross-validation (LOOCV), and three distinct data splits. Our research employs the GFN2-xTB method, known for its robustness and speed, to compute descriptors for procarcinogens and their activated metabolites in vacuum and aqueous phases. We evaluate the effectiveness of different theoretical definitions of electrophilicity within CDFT, namely, the PSL, GCV, and CDP schemes, and the newly introduced Log QP descriptor to approximate Log P information. The SPAARC, RandomTree, and JCHAID* ML methods were used to build explainable predictive models with highly robust internal validation (Avg. Correct Classifications = 76% and Avg. Kappa = 0.29) and external validation (Avg. Correct Classifications = 79% and Avg. Kappa = 0.33) metrics, and the results were compared to those of a multilayer perceptron with two hidden layers. The results indicate that the second CDP definition of the electrophilicity in both vacuum and aqueous phases, together with the newly presented Log QP descriptors, are the most important ones for predicting the mutagenic activity of AAs (namely omega+Vac CDP2+, omega+Aq CDP2+, and LogQP1+Vac, respectively). They also indicate that metabolic activation, aqueous solvent properties, the CDP electrophilicity schemes, and Log QP should be considered when building predictive models for the mutagenic activity of AAs. This study offers a replicable, No-Code approach to QSAR research, making high-level ML and CDFT applications accessible to a broader audience. Future work will expand these methods to other compound families, enhancing predictive capabilities in the study of mutagenic activities and other biological phenomena.
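The abstract summarizes model quality with two numbers: the fraction of correct classifications and the Kappa statistic. As a minimal sketch, assuming Kappa here means Cohen's kappa computed from a confusion matrix, both can be obtained as follows. The function name and the example counts are hypothetical illustrations and are not taken from the study.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total

    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2

    if expected == 1.0:
        # Every sample falls in a single class; kappa is undefined.
        return observed, float("nan")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class counts (mutagenic vs. non-mutagenic), for illustration only.
example = [[40, 10],
           [5, 45]]
acc, kappa = accuracy_and_kappa(example)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

Kappa corrects the raw accuracy for the agreement expected by chance, so on an imbalanced data set it can be much lower than the percentage of correct classifications.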

More information

WOS ID: WOS:001352488400001
Journal title: JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume: 64
Issue: 22
Publisher: AMER CHEMICAL SOC
Publication date: 2024
Start page: 8510
End page: 8520
DOI: 10.1021/acs.jcim.4c01246

Notes: ISI