Semi-supervised learning for MS MALDI-TOF data
Abstract
MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight) mass spectrometry is a promising strategy for identifying patterns in data and provides a relevant methodology for rapid and accurate identification of microorganisms. However, this type of data is difficult to analyze because of its complexity, and correct labeling is sometimes impossible. To address this problem, advanced data analysis techniques such as machine learning (ML) methods can be applied. This research proposes a methodology for classifying mass spectrometry (MS) data using a semi-supervised learning (SSL) approach called self-training. This type of learning uses labeled and unlabeled data simultaneously in the training process to alleviate the scarcity of data labels. To demonstrate the effectiveness of this proposal, MS data from healthy salmon and from salmon infected with the pathogen Piscirickettsia salmonis were analyzed. Experimental results showed that self-training with random forest performs well, achieving an accuracy of 0.9. Furthermore, feature selection allowed the identification of seven potential biomarkers that accurately define healthy and sick salmon profiles.
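The self-training idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: to stay dependency-free, a simple nearest-centroid classifier stands in for the random forest, and the confidence score, the `threshold` and `max_rounds` parameters, the function names, and the two-dimensional toy "spectra" are all assumptions made for illustration only.

```python
import random

def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit(X, y):
    """Stand-in base learner (NOT a random forest): one centroid per class."""
    return {c: centroid([x for x, lab in zip(X, y) if lab == c])
            for c in sorted(set(y))}

def predict_with_confidence(model, x):
    """Return (label, score); the score is an ad hoc distance-based confidence."""
    dists = {c: sq_dist(x, m) for c, m in model.items()}
    best = min(dists, key=dists.get)
    total = sum(dists.values())
    score = 1.0 if total == 0 else 1.0 - dists[best] / total
    return best, score

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Iteratively pseudo-label confident unlabeled samples and refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = fit(X_lab, y_lab)
    for _ in range(max_rounds):
        still_unlabeled, added = [], 0
        for x in pool:
            label, score = predict_with_confidence(model, x)
            if score >= threshold:
                # Confident prediction: promote to the labeled set.
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                still_unlabeled.append(x)
        pool = still_unlabeled
        if added == 0:
            break  # nothing new was pseudo-labeled, so stop early
        model = fit(X_lab, y_lab)
    return model, len(pool)

# Hypothetical toy data: two well-separated clusters standing in for spectra.
rng = random.Random(0)

def sample(center, n):
    return [[rng.gauss(center, 0.5), rng.gauss(center, 0.5)] for _ in range(n)]

X_labeled = sample(0.0, 3) + sample(5.0, 3)
y_labeled = ["healthy"] * 3 + ["infected"] * 3
X_unlabeled = sample(0.0, 20) + sample(5.0, 20)

model, remaining = self_train(X_labeled, y_labeled, X_unlabeled)
print("samples left unlabeled:", remaining)
```

With a library such as scikit-learn available, the same loop would typically wrap a random forest rather than this stand-in; the structure (fit, pseudo-label confident samples, refit) is the part the sketch is meant to convey.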
More information
Title according to WOS: | Semi-supervised learning for MS MALDI-TOF data |
Journal title: | 2021 IEEE LATIN AMERICAN CONFERENCE ON COMPUTATIONAL INTELLIGENCE (LA-CCI) |
Publisher: | IEEE |
Publication date: | 2021 |
DOI: | 10.1109/LA-CCI48322.2021.9769825 |
Notes: | ISI |