A data mining method for breast cancer identification based on a selection of variables

Holsbach, Nicole; Fogliatto, Flavio Sanson; Anzanello, Michel Jose

Abstract

Breast cancer is highly prevalent among women in most countries. If diagnosed at an early stage, it has a high probability of cure. Several statistical approaches have been developed to assist in early breast cancer detection. This paper presents a variable selection method for classifying cases into two classes, benign or malignant, based on cytopathological analysis of patients' breast cell samples. The variables are ranked according to a new variable importance index that combines the Principal Component Analysis weights with the variance explained by each retained component. Observations from the test sample are categorized into two classes using the k-Nearest Neighbor algorithm and Discriminant Analysis, after which the variable with the lowest importance index is eliminated. The subset with the highest accuracy is used to classify observations in the test sample. When applied to the Wisconsin Breast Cancer Database, the proposed method achieved an average classification accuracy of 97.77% while retaining an average of 5.8 variables.
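The procedure summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the importance index, so the version below is an assumption: for each variable, it sums the absolute PCA loading times the explained-variance share over the retained components. The 90% retention threshold, the function names, the synthetic data, the use of kNN only (no Discriminant Analysis), and computing the index once rather than after each elimination are likewise illustrative choices, not the authors' implementation.

```python
import numpy as np

def importance_index(X, var_threshold=0.9):
    """Assumed index: sum over retained PCs of |loading| * explained-variance share."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each variable
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]              # sort components by variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    share = eigvals / eigvals.sum()
    # Retain the smallest number of components reaching the (assumed) threshold.
    n_keep = min(int(np.searchsorted(np.cumsum(share), var_threshold)) + 1, len(share))
    return np.abs(eigvecs[:, :n_keep]) @ share[:n_keep]

def knn_accuracy(X_train, y_train, X_eval, y_eval, k=5):
    """Plain k-nearest-neighbor majority vote for 0/1 labels (use odd k to avoid ties)."""
    dist = ((X_eval[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    pred = (y_train[nearest].mean(axis=1) > 0.5).astype(int)
    return float((pred == y_eval).mean())

def backward_elimination(X_train, y_train, X_val, y_val, k=5):
    """Drop the least important variable at each step; keep the most accurate subset."""
    index = importance_index(X_train)
    remaining = [int(j) for j in np.argsort(index)[::-1]]   # most important first
    best_subset, best_acc = None, -1.0
    while remaining:
        acc = knn_accuracy(X_train[:, remaining], y_train, X_val[:, remaining], y_val, k)
        if acc >= best_acc:                        # ties favor the smaller subset
            best_subset, best_acc = list(remaining), acc
        remaining.pop()                            # remove lowest-index variable
    return best_subset, best_acc

# Synthetic stand-in data (NOT the Wisconsin Breast Cancer Database).
rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 2, size=n)
informative = rng.normal(size=(n, 3)) + 3.0 * y[:, None]    # class-shifted variables
noise = rng.normal(size=(n, 3))                             # uninformative variables
X = np.hstack([informative, noise])
X_train, y_train, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

subset, acc = backward_elimination(X_train, y_train, X_val, y_val)
print("retained variables:", sorted(subset), "accuracy:", round(acc, 3))
```

In this sketch the subset is chosen on a hold-out split; a separate test split would be needed for an unbiased accuracy estimate.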

More information

Title according to WOS: ID WOS:000336049700030 Not found in local WOS DB
Journal title: CIENCIA & SAUDE COLETIVA
Volume: 19
Issue: 4
Publisher: ABRASCO - Brazilian Association of Collective Health
Publication date: 2014
Start page: 1295
End page: 1304
DOI: 10.1590/1413-81232014194.01722013

Notes: ISI