PIDT: A novel decision tree algorithm based on parameterised impurities and statistical pruning approaches

Daniel Stamate, Wajdi Alghamdi*, Daniel Stahl, Doina Logofatu, Alexander Zamyatin

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

5 Citations (Scopus)
41 Downloads (Pure)

Abstract

When constructing a decision tree, the criteria used to select the splitting attributes influence the performance of the model produced by the decision tree algorithm. The best-known criteria, such as Shannon entropy and the Gini index, lack adaptability to the dataset at hand. This paper presents novel splitting attribute selection criteria based on families of parameterised impurities, which we propose here for use in the construction of optimal decision trees. These criteria rely on families of strictly concave functions that define the new generalised parameterised impurity measures, which we applied in devising and implementing our novel PIDT decision tree algorithm. This paper also proposes the S-condition, based on statistical permutation tests, whose purpose is to ensure that the reduction in impurity, or gain, for the selected attribute is statistically significant. We implemented the S-pruning procedure, based on the S-condition, to prevent model overfitting. These methods were evaluated on a number of simulated and benchmark datasets. Experimental results suggest that, by tuning the parameters of the impurity measures and by using our S-pruning method, we obtain better decision tree classifiers with the PIDT algorithm.
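
The two ideas in the abstract, a parameterised concave impurity and a permutation test on the gain, can be illustrated with a minimal sketch. The abstract does not define the paper's impurity families or the exact form of the S-condition, so everything below is an assumption made for illustration: the Tsallis-style impurity (which reduces to the Gini index at q = 2) stands in for a parameterised family, and the names tsallis_impurity, gain and permutation_p_value are hypothetical.

```python
import random
from collections import Counter


def tsallis_impurity(labels, q=2.0):
    """Stand-in parameterised impurity: (1 - sum(p_i^q)) / (q - 1), q > 0, q != 1.

    Assumption: this is NOT the family defined in the paper; q = 2 gives the Gini index.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    probs = [count / n for count in Counter(labels).values()]
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def gain(feature, labels, threshold, q=2.0):
    """Reduction in impurity from the binary split feature <= threshold."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * tsallis_impurity(left, q) \
        + (len(right) / n) * tsallis_impurity(right, q)
    return tsallis_impurity(labels, q) - weighted


def permutation_p_value(feature, labels, threshold, q=2.0, n_perm=500, seed=0):
    """Estimate how often shuffled labels yield a gain at least as large as observed."""
    rng = random.Random(seed)
    observed = gain(feature, labels, threshold, q)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association between feature and labels
        if gain(feature, shuffled, threshold, q) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    xs = list(range(20))
    ys = [0] * 10 + [1] * 10
    p = permutation_p_value(xs, ys, threshold=9)
    # A split would be kept only if p falls below a chosen significance level.
    print("gain:", gain(xs, ys, threshold=9), "p-value:", p)
```

In this sketch a split would be accepted only when the estimated p-value falls below a chosen significance level, which is one plausible reading of a permutation-based stopping rule; tuning q changes how the impurity weighs class proportions.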

Original language: English
Title of host publication: Artificial Intelligence Applications and Innovations - 14th IFIP WG 12.5 International Conference, AIAI 2018, Proceedings
Publisher: Springer New York LLC
Pages: 273-284
Number of pages: 12
ISBN (Print): 9783319920061
Publication status: E-pub ahead of print - 22 May 2018
Event: 14th IFIP WG 12.5 International Conference on Artificial Intelligence Applications and Innovations, AIAI 2018 - Rhodes, Greece
Duration: 25 May 2018 to 27 May 2018

Publication series

Name: IFIP Advances in Information and Communication Technology
Volume: 519
ISSN (Print): 1868-4238

Conference

Conference: 14th IFIP WG 12.5 International Conference on Artificial Intelligence Applications and Innovations, AIAI 2018
Country/Territory: Greece
City: Rhodes
Period: 25/05/2018 to 27/05/2018

Keywords

  • Concave functions
  • Decision trees
  • Machine learning
  • Optimisation
  • Parameterised impurity measures
  • Permutation test
  • Preventing overfitting
  • Significance level
  • Statistical pruning
