Abstract
The decision tree is a widely used data classification technique. This paper proposes a decision-tree-based classification method for uncertain data. Data uncertainty is common in emerging applications such as sensor networks, moving object databases, and medical and biological databases. It can be caused by various factors, including limited measurement precision, outdated sources, sensor errors, network latency and transmission problems. In this paper, we enhance traditional decision tree algorithms and extend their measures, including entropy and information gain, to take into account the interval and probability distribution function of uncertain data. Our algorithm can handle both certain and uncertain datasets. The experiments demonstrate the utility and robustness of the proposed algorithm as well as its satisfactory prediction accuracy.
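To illustrate how entropy and information gain might be extended to interval-valued data, the following is a minimal sketch and not the paper's own implementation. It assumes each uncertain numeric value is an interval [lo, hi] with a uniform probability distribution, and that a record contributes fractionally to each side of a split according to the probability mass on that side. The function names (prob_at_most, entropy, information_gain) are hypothetical.

```python
import math
from collections import defaultdict


def prob_at_most(interval, split):
    """P(value <= split), ASSUMING a uniform distribution on [lo, hi]."""
    lo, hi = interval
    if hi == lo:  # a certain value is a degenerate interval
        return 1.0 if lo <= split else 0.0
    return min(1.0, max(0.0, (split - lo) / (hi - lo)))


def entropy(weights):
    """Shannon entropy in bits over fractional (probabilistic) class counts."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total)
                for w in weights.values() if w > 0)


def information_gain(records, split):
    """Gain of splitting at `split`; records is a list of ((lo, hi), label)."""
    parent = defaultdict(float)
    left = defaultdict(float)
    right = defaultdict(float)
    for interval, label in records:
        p = prob_at_most(interval, split)
        parent[label] += 1.0       # each record has total weight 1
        left[label] += p           # fraction of the record going left
        right[label] += 1.0 - p    # remaining fraction going right
    n = sum(parent.values())
    n_left = sum(left.values())
    n_right = sum(right.values())
    children = (n_left / n) * entropy(left) + (n_right / n) * entropy(right)
    return entropy(parent) - children


# Hypothetical toy data: one uncertain value per record plus a class label.
records = [((0.0, 2.0), "a"), ((2.0, 4.0), "b")]
gain_clean = information_gain(records, 2.0)    # intervals fall entirely on one side
gain_overlap = information_gain(records, 3.0)  # second interval straddles the split
```

Under these assumptions a certain value is simply an interval of zero width, so the same routine covers both certain and uncertain datasets. A split that cuts through an interval yields a lower gain than one that separates the intervals cleanly.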
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qin, B., Xia, Y., Li, F. (2009). DTU: A Decision Tree for Uncertain Data. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_4
DOI: https://doi.org/10.1007/978-3-642-01307-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01306-5
Online ISBN: 978-3-642-01307-2
eBook Packages: Computer Science, Computer Science (R0)