Parallel and Distributed Protein Processing for 3D-protein Pattern Discovery and Clustering
Keywords: Clustering and Discovering 3D protein patterns; Hybrid MPI+OpenMP; OpenMP tasks
Abstract
The discovery and clustering of three-dimensional protein patterns in a set of protein structures, without relying on predefined search patterns, can be highly beneficial for predicting the functions of unknown proteins and facilitating rational multi-target drug design. This work introduces a novel OpenMP parallelization of the 3D-PP algorithm using explicit and nested tasks, which balances data-sharing synchronization against load imbalance and improves on previous implementations based on implicit tasks. Given the vast number of protein structures available in the Protein Data Bank (over 231,000 from PDB and more than 1,068,000 from AlphaFold), processing large datasets locally can be constrained by available resources such as main memory and secondary storage. To address this challenge, we propose parallel approaches to the 3D-PP algorithm for distributed memory systems (using MPI) and hybrid systems (combining MPI and OpenMP). The evaluated strategies distribute the workload at the granularity of entire protein structures. Experimental results using a dataset of 8,344 protein structures show that the new OpenMP taskified version is 1.6x faster than the previous best OpenMP implementation. Moreover, the hybrid MPI+OpenMP version achieves a speedup of up to 162.5x, demonstrating its scalability and efficiency for large-scale protein structure analysis. © 2025 IEEE.
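As a rough illustration of distributing work by whole protein structure, the sketch below assigns structure identifiers to ranks in round-robin order. This is an assumption for illustration only: the abstract does not say which assignment scheme the authors use. The function name `assign_structures`, the identifier format, and the rank count of 16 are all hypothetical, and the ranks are simulated as plain lists rather than MPI processes.

```python
def assign_structures(structure_ids, num_ranks):
    """Assign whole structures to ranks in round-robin order.

    Hypothetical helper: structure i goes to rank i % num_ranks, so no
    single protein structure is ever split across ranks.
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    buckets = [[] for _ in range(num_ranks)]
    for i, sid in enumerate(structure_ids):
        buckets[i % num_ranks].append(sid)
    return buckets


# Placeholder identifiers for a dataset the size of the one in the abstract.
ids = [f"protein_{i:04d}" for i in range(8344)]

# 16 simulated ranks (an assumed value, not taken from the paper).
buckets = assign_structures(ids, 16)
sizes = [len(b) for b in buckets]
```

Round-robin keeps the number of structures per rank within one of each other, but it does not account for structures of different sizes, so the actual per-rank work may still be uneven.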
More information
| Title according to SCOPUS: | Parallel and Distributed Protein Processing for 3D-protein Pattern Discovery and Clustering |
| Publisher: | Institute of Electrical and Electronics Engineers Inc. |
| Publication date: | 2025 |
| Start page: | 260 |
| End page: | 268 |
| Language: | English |
| DOI: | 10.1109/eScience65000.2025.00038 |
| Notes: | SCOPUS |