Metagenomic Binning based on Unsupervised Extreme Learning Machine

Herazo-Alvarez, J.; Barria-Valdebenito, P.; Mora, M.; Cuadros-Orellana, S.

Keywords: Clustering; GC content; k-mers; Metagenomic Binning; US-ELM

Abstract

Metagenomics studies the genetic information of microbial communities in different contexts. Because metagenomic DNA is typically fragmented and sequenced as short reads, these reads can be assembled into longer sequences called contigs. An important step in the metagenomic analysis pipeline is binning, the classification (supervised) or clustering (unsupervised) of reads or contigs. For unsupervised binning, several machine learning algorithms have been employed that cluster sequences using DNA sequence descriptors such as k-mer frequency and GC content. This paper proposes the use of Unsupervised Extreme Learning Machines (US-ELM) for metagenomic binning. The experiments use three datasets with different numbers of species and compare the results obtained by US-ELM with those of the k-means and Expectation Maximization (EM) algorithms. The comparison employs metrics widely used for this problem: accuracy, the Rand index, and clustering computation time. In the experiments, US-ELM widely outperforms the other two clustering methods in accuracy. In terms of computational cost, US-ELM is comparable to k-means, and both algorithms are much faster than EM. The numerical results show the potential of the US-ELM algorithm for the metagenomic binning problem. © 2023 IEEE.
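
The two sequence descriptors named in the abstract, k-mer frequency and GC content, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the choice of k, and the decision to skip k-mers containing characters other than A, C, G, T are assumptions made here for illustration only.

```python
from collections import Counter
from itertools import product
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq: str, k: int = 4) -> List[float]:
    """Relative frequency of every possible k-mer over ACGT, in lexicographic order.

    Assumption: windows containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def describe(seq: str, k: int = 4) -> List[float]:
    """Hypothetical feature vector: k-mer frequencies followed by GC content."""
    return kmer_frequencies(seq, k) + [gc_content(seq)]
```

Each read or contig would then be represented by one such vector (4^k frequencies plus one GC value), and a clustering algorithm such as k-means, EM, or US-ELM would operate on those vectors.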

More information

Title according to SCOPUS: Metagenomic Binning based on Unsupervised Extreme Learning Machine
Journal title: Proceedings - IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies, ChileCon
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 2023
Language: Spanish
DOI: 10.1109/CHILECON60335.2023.10418667

Notes: SCOPUS