Research Article

Issues in Optimization of Decision Tree Learning: A Survey

by Dipak V. Patil, R. S. Bichkar
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 3 - Number 5
Year of Publication: 2012
Authors: Dipak V. Patil, R. S. Bichkar
10.5120/ijais12-450512

Dipak V. Patil, R. S. Bichkar. Issues in Optimization of Decision Tree Learning: A Survey. International Journal of Applied Information Systems. 3, 5 (July 2012), 13-29. DOI=10.5120/ijais12-450512

@article{ 10.5120/ijais12-450512,
author = { Dipak V. Patil, R. S. Bichkar },
title = { Issues in Optimization of Decision Tree Learning: A Survey },
journal = { International Journal of Applied Information Systems },
issue_date = { July 2012 },
volume = { 3 },
number = { 5 },
month = { July },
year = { 2012 },
issn = { 2249-0868 },
pages = { 13-29 },
numpages = {9},
url = { https://www.ijais.org/archives/volume3/number5/227-0512/ },
doi = { 10.5120/ijais12-450512 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2023-07-05T10:45:47.888573+05:30
%A Dipak V. Patil
%A R. S. Bichkar
%T Issues in Optimization of Decision Tree Learning: A Survey
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 3
%N 5
%P 13-29
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Decision tree induction is a simple but powerful learning and classification model. Decision tree learning offers tools for discovering relationships, patterns and knowledge from data in databases. The volume of data in databases is growing rapidly, both in the number of attributes and in the number of instances. Learning a decision tree from a very large set of records is a complex and usually very slow process, often beyond the capabilities of existing computers. Decision trees raise various issues and problems, and researchers have proposed many approaches to handle them. This paper attempts to summarize the proposed approaches and tools for decision tree learning, with emphasis on optimizing the constructed trees and handling large datasets.

Index Terms

Computer Science
Information Sciences

Keywords

Decision Tree Optimization