Efficient Pruning of Class Association Rules using Statistics and Genetic Relation Algorithm

Keywords: pruning, evolutionary computation, association rule mining, classification accuracy

Abstract

Association rule mining is one of the most exploited areas in data mining, with applications ranging from business to data classification and summarization. Accordingly, several association rule-based classification methods have been proposed. Most of them produce too many rules for humans to read through, and the generated rules are usually complex and hard for users to understand. Only some of the extracted rules are of real interest; most are redundant, irrelevant, or obvious. In this paper, a new post-processing method for pruning class association rules is proposed that combines statistics with an evolutionary method named Genetic Relation Algorithm (GRA). The algorithm is carried out in two phases. In the first phase, the rules are pruned according to their matching degree with the data, and in the second phase, GRA selects the most interesting rules using the distance between them. The two-phase method has the following properties: 1) it is efficient, since it dramatically reduces the pruning processing time; 2) it is reliable, because it produces a small rule set that is accurate (it keeps at least the same prediction accuracy as the original large rule set), comprehensible (it is more understandable for users, since the number of attributes involved in each rule is also small), and interesting because of the diversity of the rules. The advantages of the proposed method are demonstrated using several real datasets, and it is compared with other conventional methods, including GNP-based mining, in terms of prediction accuracy and time consumption.
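The two-phase structure described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how matching degree or rule distance is computed, so this sketch assumes rule confidence as the matching measure and Jaccard distance between antecedents as the rule distance, and it replaces GRA with a simple greedy diversity filter. All names (`confidence`, `jaccard_distance`, `prune`, `min_conf`, `min_dist`) are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], str]    # (antecedent items, predicted class)
Record = Tuple[FrozenSet[str], str]  # (items present, true class)


def confidence(rule: Rule, data: List[Record]) -> float:
    """Assumed matching degree: share of covered records with the rule's class."""
    antecedent, label = rule
    covered = [cls for items, cls in data if antecedent <= items]
    if not covered:
        return 0.0
    return sum(1 for cls in covered if cls == label) / len(covered)


def jaccard_distance(a: Rule, b: Rule) -> float:
    """Assumed rule distance: Jaccard distance between the two antecedents."""
    union = a[0] | b[0]
    if not union:
        return 0.0
    return 1.0 - len(a[0] & b[0]) / len(union)


def prune(rules: List[Rule], data: List[Record],
          min_conf: float = 0.8, min_dist: float = 0.6) -> List[Rule]:
    # Phase 1: drop rules that match the data poorly.
    kept = [r for r in rules if confidence(r, data) >= min_conf]
    # Phase 2: greedy stand-in for GRA. Prefer confident, short rules and
    # skip any rule too close to an already selected rule of the same class.
    kept.sort(key=lambda r: (-confidence(r, data), len(r[0])))
    selected: List[Rule] = []
    for r in kept:
        if all(r[1] != s[1] or jaccard_distance(r, s) >= min_dist
               for s in selected):
            selected.append(r)
    return selected
```

In this sketch the thresholds `min_conf` and `min_dist` are arbitrary illustrative values; an evolutionary search such as GRA would instead explore rule subsets rather than make a single greedy pass.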

More information

Journal title: SICE Journal of Control, Measurement and System Integration
Volume: 3
Issue: 5
Publication date: 2010
Start page: 336
End page: 345
Language: English
URL: https://doi.org/10.9746/jcmsi.3.336