Exploring Machine Learning Algorithms and Protein Language Models Strategies to Develop Enzyme Classification Systems

Ruiz, FR; Herrera, LJ; Ortuno, F

Abstract

Discovering the functions of uncharacterized enzymes is one of the most common tasks in bioinformatics. Functional annotation methods based on phylogenetic properties have been the gold standard in genome annotation. However, these methods succeed only when the minimum requirements for establishing similarity or homology are met. As an alternative, machine learning and deep learning methods have proven useful for building functional classification systems across various bioinformatics tasks. Nevertheless, a clear strategy is still needed for building predictive models and for representing amino acid sequences numerically. In this work, we address the functional classification of enzyme sequences (EC number) via machine learning methods, exploring various alternatives for training predictive models and for numerical representation. The results show that the best performance is achieved with representations based on pre-trained models, although the choice of training strategy remains open. Among the alternatives explored, the CNN-based architectures proposed in this work show a greater capacity for learning and pattern extraction in complex systems, achieving performances above 97% with binary cross-entropy loss below 0.05. Finally, we discuss the strategies explored and outline future work toward integrated methods for functional classification and the discovery of new enzymes to support current bioinformatics tools.

More information

WOS ID: WOS:001313788200024
Journal title: BIOINFORMATICS AND BIOMEDICAL ENGINEERING, PT I, IWBBIO 2024
Volume: 13919
Publisher: SPRINGER INTERNATIONAL PUBLISHING AG
Publication date: 2023
Start page: 307
End page: 319
DOI: 10.1007/978-3-031-34953-9_24

Notes: ISI