Abstract
This paper introduces a new splitting criterion
called Inter-node Hellinger Distance (iHD) and a
weighted version of it (iHDw) for constructing decision trees. iHD measures the distance between
the parent and each of the child nodes in a split using Hellinger distance. We prove that this ensures that the child nodes are mutually exclusive. The weight term in iHDw accounts for the purity of each individual child node under class imbalance. The combination of the distance and the weight term in iHDw thus favors a partition whose child nodes are purer and mutually exclusive, while remaining insensitive to class skew. We perform experiments on twenty balanced and twenty imbalanced datasets. The results show that decision trees
based on iHD outperform six other state-of-the-art methods on at least 14 of the balanced and 10 of the imbalanced datasets. We also observe that adding the weight term to iHD improves the performance of decision trees on imbalanced datasets. Moreover, according to the Friedman test, this improvement is statistically significant compared to the other methods.
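For reference, a minimal sketch of the distance underlying iHD, assuming the standard Hellinger distance between the parent node's class distribution $P = (p_1, \ldots, p_k)$ and a child node's class distribution $Q = (q_1, \ldots, q_k)$; the exact aggregation over child nodes and the form of the weight term are defined in the body of the paper:
\[
d_H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{k} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2}
\]
Because $d_H$ depends on the class proportions only through their square roots rather than their raw counts, it is known to be less sensitive to class skew than frequency-based criteria such as information gain.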