Semi-supervised learning for MALDI-TOF mass spectrometry data classification: an application in the salmon industry

Gonzalez, Camila; Astudillo, Cesar A.; Lopez-Cortes, Xaviera A.; Maldonado, Sebastian

Abstract

MALDI-TOF mass spectrometry, which combines Matrix-Assisted Laser Desorption-Ionization (MALDI) with a Time-of-Flight (TOF) detector, is a promising strategy for identifying patterns in data and has become a relevant methodology for the rapid and accurate identification of microorganisms. However, this type of data is challenging to analyze because of its high complexity, and correct labeling is sometimes impossible. To address this problem, advanced data analysis techniques such as machine learning (ML) methods can be applied. In this work, we propose a novel approach that uses the semi-supervised paradigm to classify MALDI-TOF mass spectrometry data. Our study uses both labeled and unlabeled data to alleviate the data-labeling problem. Specifically, we analyzed mass spectrometry data from healthy salmon and from salmon infected with the pathogen Piscirickettsia salmonis. Our proposed self-training-based algorithm outperformed traditional ML methods, namely naive Bayes (NB), random forest (RF), and support vector machine (SVM). Even with a small percentage of labeled instances (25%), semi-supervised learning achieves balanced performance across all metrics. Experimental results showed that self-training with a random forest classifier reached an accuracy of 0.9, a sensitivity of 0.75, and a specificity of 1. Furthermore, feature selection identified 15 potential biomarkers that accurately define the profiles of healthy and infected salmon. More generally, these results demonstrate the potential of the proposed semi-supervised learning methodology for classifying MALDI-TOF mass spectrometry data.
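The self-training idea and the three reported metrics can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the nearest-centroid base classifier (standing in for the random forest the abstract reports), the 0.9 confidence threshold, the helper names `NearestCentroid`, `self_train` and `binary_metrics`, and the two-feature toy data (imagined as peak intensities, with "infected" as the positive class) are all hypothetical. scikit-learn ships a comparable wrapper, `sklearn.semi_supervised.SelfTrainingClassifier`, into which a random forest can be plugged.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class NearestCentroid:
    """Hypothetical stand-in base classifier (the paper's best model used a random forest)."""

    def fit(self, X: List[Vector], y: List[int]) -> "NearestCentroid":
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def proba(self, x: Vector) -> Dict[int, float]:
        # Softmax over negative squared distances to each class centroid.
        scores = {c: -sum((a - b) ** 2 for a, b in zip(x, m))
                  for c, m in self.centroids.items()}
        top = max(scores.values())
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}

    def predict(self, x: Vector) -> int:
        probs = self.proba(x)
        return max(probs, key=probs.get)


def self_train(model, X_lab, y_lab, X_unlab, threshold=0.9, max_iter=10):
    """Self-training: pseudo-label confident unlabeled samples, add them, and refit."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_iter):
        model.fit(X, y)
        confident, rest = [], []
        for x in pool:
            probs = model.proba(x)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((x, label))
            else:
                rest.append(x)
        if not confident:  # no new pseudo-labels; the current fit is final
            return model
        for x, label in confident:
            X.append(x)
            y.append(label)
        pool = rest
    model.fit(X, y)  # final refit including the last batch of pseudo-labels
    return model


def binary_metrics(y_true, y_pred, positive=1) -> Tuple[float, float, float]:
    """Accuracy, sensitivity TP/(TP+FN) and specificity TN/(TN+FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity


# Hypothetical toy data: two features per spectrum; 0 = healthy, 1 = infected.
X_lab = [[1.0, 1.2], [4.8, 5.1]]  # 2 labeled samples (25% of the training set)
y_lab = [0, 1]
X_unlab = [[1.1, 0.9], [0.8, 1.0], [1.3, 1.1],
           [5.2, 4.9], [4.9, 5.3], [5.1, 5.0]]
X_test = [[0.9, 1.1], [5.0, 5.0], [1.2, 1.0], [4.7, 5.2]]
y_test = [0, 1, 0, 1]

model = self_train(NearestCentroid(), X_lab, y_lab, X_unlab, threshold=0.9)
y_pred = [model.predict(x) for x in X_test]
print(binary_metrics(y_test, y_pred))  # → (1.0, 1.0, 1.0)
```

The threshold controls the trade-off in self-training: a high value admits only very confident pseudo-labels, which limits error propagation but may leave many unlabeled samples unused.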

More information

Title according to WOS: not found in local WOS DB (ID WOS:000936082200002)
Journal title: NEURAL COMPUTING & APPLICATIONS
Publisher: SPRINGER LONDON LTD
Publication date: 2023
DOI: 10.1007/s00521-023-08333-2

Notes: ISI