Exploring Machine Learning Algorithms and Numerical Representations Strategies to Develop Sequence-Based Predictive Models for Protein Networks

Medina-Ortiz, David; Salinas, Pedro; Cabas-Moras, Gabriel; Durán-Verdugo, Fabio; Olivera-Nappa, Álvaro; Uribe-Paredes, Roberto

Keywords: Machine learning algorithms, Protein language models, Protein networks, Deep learning architectures, Protein discovery

Abstract

Predicting the affinity between two proteins is one of the most relevant challenges in bioinformatics and one of the most useful for biotechnological and pharmaceutical applications. Current prediction methods rely on structural information about the interaction complexes, but predicting protein structures is computationally expensive. Machine learning methods have emerged as an alternative. Predictive methods for protein affinity based on structural information already exist; for linear (sequence) information, however, there are no established guidelines for building predictive models, so several alternatives for processing the data and developing the models must be explored. This work explores different options for building predictive protein interaction models with deep learning architectures and classical machine learning algorithms, evaluating numerical representation methods and transformation techniques that represent structural complexes using linear information. Six types of predictive tasks were explored, covering affinity and the evaluation of mutational variants and their effect on the interaction complex. We show that classical machine learning and convolutional network-based methods perform better than graph convolutional network methods for studying mutational variants. In contrast, graph-based methods perform better on affinity or association constant problems, using only the linear information of the protein sequences. Finally, we present an illustrative use case, show how to use the developed models, discuss the limitations of the explored methods, and comment on future development strategies for improving the studied processes.

More information

Title according to SCOPUS: not found in local SCOPUS database (SCOPUS ID: 85164963833)
Journal title: Lecture Notes in Computer Science
Volume: 13956 LNCS
Publisher: Springer, Cham
Publication date: 2023
First page: 231
Last page: 244
DOI: 10.1007/978-3-031-36805-9_16

Notes: SCOPUS