DTU: A Decision Tree for Uncertain Data

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5476)

Abstract

Decision trees are a widely used data classification technique. This paper proposes a decision-tree-based classification method for uncertain data. Data uncertainty is common in emerging applications such as sensor networks, moving object databases, and medical and biological databases. It can be caused by various factors, including limited measurement precision, outdated sources, sensor errors, network latency, and transmission problems. In this paper, we enhance traditional decision tree algorithms and extend their measures, including entropy and information gain, to take into account the interval and probability distribution function of uncertain data. Our algorithm can handle both certain and uncertain datasets. The experiments demonstrate the utility and robustness of the proposed algorithm as well as its satisfactory prediction accuracy.
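To make the idea of extending entropy and information gain to interval-valued data more concrete, the following is a minimal sketch and not the paper's actual definitions. It assumes each uncertain numeric value is an interval [lo, hi] with a uniform probability distribution, and it lets every instance contribute fractionally to both sides of a candidate split. The function names (`prob_below`, `entropy`, `expected_info_gain`) and the uniform assumption are illustrative choices, not taken from the paper.

```python
import math
from collections import defaultdict


def prob_below(lo, hi, split):
    """P(value <= split) for a value assumed uniform on [lo, hi]."""
    if hi == lo:  # a certain (point) value
        return 1.0 if lo <= split else 0.0
    # Clamp the uniform CDF to [0, 1].
    return min(1.0, max(0.0, (split - lo) / (hi - lo)))


def entropy(weights):
    """Shannon entropy in bits of a mapping class -> fractional count."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total)
                for w in weights.values() if w > 0)


def expected_info_gain(data, split):
    """Information gain of splitting at `split`.

    `data` is a list of ((lo, hi), label) pairs. Each instance adds its
    probability mass below the split to the left branch and the rest to
    the right branch (an assumption made for this sketch).
    """
    n = len(data)
    if n == 0:
        return 0.0
    parent = defaultdict(float)
    left = defaultdict(float)
    right = defaultdict(float)
    for (lo, hi), label in data:
        p = prob_below(lo, hi, split)
        parent[label] += 1.0
        left[label] += p
        right[label] += 1.0 - p
    n_left = sum(left.values())
    n_right = sum(right.values())
    return (entropy(parent)
            - (n_left / n) * entropy(left)
            - (n_right / n) * entropy(right))


# Hypothetical toy data: two classes with interval-valued readings.
toy = [((0, 2), "a"), ((1, 3), "a"), ((4, 6), "b"), ((5, 7), "b")]
clean_gain = expected_info_gain(toy, 3.5)  # no interval straddles the split
fuzzy_gain = expected_info_gain(toy, 5.0)  # (4, 6) straddles the split
```

Under these assumptions a point value (lo equal to hi) behaves exactly like an ordinary certain value, which is one simple way a single routine can treat certain and uncertain datasets alike.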

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Qin, B., Xia, Y., Li, F. (2009). DTU: A Decision Tree for Uncertain Data. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_4

  • DOI: https://doi.org/10.1007/978-3-642-01307-2_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01306-5

  • Online ISBN: 978-3-642-01307-2

  • eBook Packages: Computer Science, Computer Science (R0)
