Evaluating alternate feature sets as a local search strategy for gene expression feature selection problem
Abstract
The identification of genetic biomarkers is a key process in the analysis of gene expression data. It helps to unveil possible molecular targets for designing drugs or treatments for diseases such as cancer and Alzheimer's disease, among others. One particularity of this problem is that the number of features to analyze is in the thousands, while the number of samples is at most one or two hundred. Another particularity is that features can be highly correlated with each other, so a selected set of features can have thousands of alternatives of similar or good quality. This work explores the use of alternate feature sets as a local search strategy (simLS) within three well-known metaheuristics: Simulated Annealing (SA), Variable Neighborhood Local Search (VNS), and Genetic Algorithm (GA). We test the proposal on gene expression cancer data sets taken from the public repository CUMIDA (https://sbcb.inf.ufrgs.br/cumida). Results show that using simLS within a metaheuristic yields solutions whose quality is better than or similar to that obtained without it. Additionally, in most cases it finds smaller gene panels.
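The abstract does not spell out how simLS builds an alternate feature set. Below is a minimal sketch under stated assumptions, not the paper's procedure. It assumes that an alternate set is obtained by swapping one selected gene for an unselected gene whose absolute Pearson correlation with it reaches a threshold. It also assumes that a swap is kept only when a user-supplied fitness (for example, classifier accuracy on the selected genes) strictly improves. The names `sim_local_search` and `alternates`, the 0.9 threshold, the dictionary data layout, and the toy fitness are all hypothetical.

```python
from math import sqrt
from typing import Callable, Dict, FrozenSet, List

# Hypothetical layout: {gene_name: [expression value per sample]}.
Data = Dict[str, List[float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / (sx * sy)


def alternates(data: Data, gene: str, threshold: float) -> List[str]:
    """Genes whose |correlation| with `gene` reaches `threshold` (assumed criterion)."""
    return [g for g in data
            if g != gene and abs(pearson(data[gene], data[g])) >= threshold]


def sim_local_search(solution: FrozenSet[str], data: Data,
                     fitness: Callable[[FrozenSet[str]], float],
                     threshold: float = 0.9) -> FrozenSet[str]:
    """First-improvement local search over alternate feature sets (assumed design).

    A move replaces one selected gene with a highly correlated, unselected
    gene; the move is kept only if the fitness strictly improves, so the
    search always terminates.
    """
    current, best = solution, fitness(solution)
    improved = True
    while improved:
        improved = False
        for gene in sorted(current):
            for alt in alternates(data, gene, threshold):
                if alt in current:
                    continue
                candidate = (current - {gene}) | {alt}
                score = fitness(candidate)
                if score > best:
                    current, best = candidate, score
                    improved = True
                    break  # rescan from the updated set
            if improved:
                break
    return current


# Toy usage: g2 is a near-copy of g1; g3 is weakly correlated with both.
data = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 1.0, 3.0, 2.0],
}


def toy_fitness(s: FrozenSet[str]) -> float:
    # Stand-in for a classifier score: pretend g2 is the more informative gene.
    return 1.0 if "g2" in s else 0.5


result = sim_local_search(frozenset({"g1", "g3"}), data, toy_fitness)
print(sorted(result))  # ['g2', 'g3']
```

A first-improvement scan keeps each move cheap, which matters when every fitness call trains a classifier. In a full metaheuristic (SA, VNS, or GA), this routine would be applied to candidate solutions as an extra local refinement step.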
More information
Title according to WOS: | WOS ID: 001458245200013 (not found in local WOS DB) |
Journal title: | 2024 43RD INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY, SCCC |
Publisher: | IEEE |
Publication date: | 2024 |
DOI: | 10.1109/SCCC63879.2024.10767621 |
Notes: | ISI |