IEEE Transactions 2002; 160-169
Elegant Decision Tree Algorithm for Classification in Data Mining.
Chandra B, Mazumdar S, Arena VC, Parimi N.

Decision trees have been found to be very effective for classification, especially in data mining. This paper aims to improve the performance of the SLIQ decision tree algorithm (Mehta et al., 1996) for classification in data mining. The drawback of SLIQ is that a large number of Gini indices must be computed at each node of the decision tree. To decide which attribute to split on at a node, the Gini index has to be computed for every attribute and for each successive pair of attribute values over all patterns that have not yet been classified. An improvement over the SLIQ algorithm is proposed to reduce this computational complexity. In the proposed algorithm, the Gini index is computed not for every successive pair of values of an attribute but over different ranges of attribute values. The classification accuracy of this technique was compared with that of the existing SLIQ algorithm and a neural network technique on three real-life datasets: the effect of different chemicals on water pollution, the Wisconsin Breast Cancer Data, and image data. The decision tree constructed with the proposed algorithm gave far better classification accuracy than the SLIQ algorithm, irrespective of the dataset under consideration, and its accuracy was even better than that of the neural network classification technique. Overall, the proposed decision tree algorithm not only reduces the number of Gini index computations but also leads to better classification accuracy.

Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pennsylvania 15261, USA.
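The contrast drawn in the abstract, scoring a split at every successive pair of attribute values versus scoring it only at range boundaries, can be illustrated with a minimal Python sketch. It uses the standard Gini index, gini(S) = 1 - sum of p_j^2, and the weighted Gini of a binary split. The range scheme here is an assumption: the abstract does not say how the ranges are formed, so the hypothetical `range_thresholds` helper simply uses equal-width ranges controlled by a hypothetical `n_ranges` parameter. The exhaustive version is also a naive illustration, not SLIQ's own pre-sorted implementation.

```python
from collections import Counter


def gini(labels):
    """Gini index 1 - sum(p_j^2) of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Weighted Gini index of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def exhaustive_thresholds(values):
    """SLIQ-style candidates: midpoint of every successive pair of distinct sorted values."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]


def range_thresholds(values, n_ranges):
    """Hypothetical range-based candidates: inner boundaries of n_ranges equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_ranges
    return [lo + k * width for k in range(1, n_ranges)]


def best_split(values, labels, thresholds):
    """Return ((gini, threshold), number_of_gini_evaluations) for the best candidate.

    Raises ValueError (from min) if `thresholds` is empty.
    """
    scored = [(split_gini(values, labels, t), t) for t in thresholds]
    return min(scored), len(scored)


if __name__ == "__main__":
    values = list(range(10))           # one numeric attribute
    labels = ["a"] * 4 + ["b"] * 6     # class of each pattern
    print(best_split(values, labels, exhaustive_thresholds(values)))  # ((0.0, 3.5), 9)
    print(best_split(values, labels, range_thresholds(values, 3)))    # ((0.0, 3.0), 2)
```

On this toy attribute both strategies find a pure split, but the range-based version evaluates the Gini index 2 times instead of 9. With real data, coarser ranges can miss the best exhaustive threshold, so the number of ranges trades computation against split precision.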