Parallel and Distributed Protein Processing for 3D-protein Pattern Discovery and Clustering

Valdés-Jiménez, A.; Núñez-Vivanco, G.; Jiménez-González, D.

Keywords: Clustering and Discovering 3D protein patterns; Hybrid MPI+OpenMP; OpenMP tasks

Abstract

The discovery and clustering of three-dimensional protein patterns in a set of protein structures, without relying on predefined search patterns, can be highly beneficial for predicting the functions of unknown proteins and facilitating rational multi-target drug design. This work introduces a novel OpenMP parallelization of the 3D-PP algorithm using explicit and nested tasks, which balances data-sharing synchronization against load imbalance and improves on previous implementations based on implicit tasks. Given the vast number of protein structures available in the Protein Data Bank (over 231,000 from PDB and more than 1,068,000 from AlphaFold), processing large datasets locally can be constrained by available resources such as main memory and secondary storage. To address this challenge, we propose parallel approaches to the 3D-PP algorithm for distributed-memory systems (using MPI) and hybrid systems (combining MPI and OpenMP). The evaluated strategies distribute the workload at the granularity of entire protein structures. Experimental results on a dataset of 8,344 protein structures show that the new OpenMP taskified version is 1.6x faster than the previous best OpenMP implementation. Moreover, the hybrid MPI+OpenMP version achieves a speedup of up to 162.5x, demonstrating its scalability and efficiency for large-scale protein structure analysis. © 2025 IEEE.
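
As an illustration of distributing work by entire protein structure, the following minimal Python sketch assigns each structure to exactly one rank. It is not the authors' implementation: the abstract does not state the assignment policy, so round-robin is assumed here, and the function name `assign_structures` and the structure identifiers are hypothetical. In an actual MPI+OpenMP program, each rank would then process its own structures with OpenMP tasks.

```python
from typing import Dict, List


def assign_structures(structure_ids: List[str], num_ranks: int) -> Dict[int, List[str]]:
    """Assign each whole structure to exactly one rank.

    Round-robin is an assumed policy for illustration only; the
    abstract does not say how structures are mapped to processes.
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    assignment: Dict[int, List[str]] = {rank: [] for rank in range(num_ranks)}
    for index, structure_id in enumerate(structure_ids):
        # A structure is never split: it goes to a single rank as a unit.
        assignment[index % num_ranks].append(structure_id)
    return assignment


if __name__ == "__main__":
    # Hypothetical identifiers; 8,344 matches the dataset size in the abstract.
    ids = [f"structure_{i:04d}" for i in range(8344)]
    plan = assign_structures(ids, num_ranks=4)
    for rank, owned in plan.items():
        print(rank, len(owned))
```

Round-robin ignores structure size, so ranks holding larger proteins may finish later; a size-aware policy would be a natural refinement, but nothing in the abstract indicates which policy the paper uses.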

More information

Title according to SCOPUS: Parallel and Distributed Protein Processing for 3D-protein Pattern Discovery and Clustering
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 2025
Start page: 260
End page: 268
Language: English
DOI: 10.1109/eScience65000.2025.00038

Notes: SCOPUS