A Comparison of Mining Incomplete and Inconsistent Data

  • Conference paper
Information and Software Technologies (ICIST 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 639)

Abstract

We present experimental results comparing the effects of incompleteness and inconsistency on data mining. Our experiments were conducted on 141 data sets, including 71 incomplete and 62 inconsistent data sets, created from eight original numerical data sets. We used the Modified Learning from Examples Module version 2 (MLEM2) rule induction algorithm for data mining. Among the 24 combinations of eight types of data sets with the three kinds of probabilistic approximations used in the experiments, the error rate, computed by ten-fold cross validation, was smaller for inconsistent data in 12 combinations (two-tailed test, 5% significance level). For one data set, combined with all three probabilistic approximations, the error rate was smaller for incomplete data. For the remaining nine combinations, the difference in performance was not statistically significant. Thus, we may claim that there is some experimental evidence that incompleteness is generally worse than inconsistency for data mining.
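
The error rates mentioned in the abstract come from ten-fold cross validation. As a rough illustration of that evaluation step only, the following minimal Python sketch holds out each of ten folds once and counts misclassified cases. Everything in it is an assumption made for illustration: the function names, the interleaved fold assignment, and the placeholder majority-class learner, which merely stands in for a rule inducer and is not MLEM2.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# A case is a pair: (tuple of attribute values, decision value).
Case = tuple[tuple, Hashable]


def ten_fold_indices(n_cases: int, k: int = 10) -> list[list[int]]:
    """Split case indices 0..n_cases-1 into k interleaved folds of near-equal size."""
    return [list(range(i, n_cases, k)) for i in range(k)]


def majority_rule(train: Sequence[Case]) -> Callable[[tuple], Hashable]:
    """Placeholder learner (NOT MLEM2): always predict the most common decision."""
    most_common = Counter(decision for _, decision in train).most_common(1)[0][0]
    return lambda attributes: most_common


def cross_validated_error_rate(
    cases: Sequence[Case],
    learner: Callable[[Sequence[Case]], Callable[[tuple], Hashable]] = majority_rule,
    k: int = 10,
) -> float:
    """Fraction of cases misclassified when each fold is held out exactly once."""
    if len(cases) < 2:
        raise ValueError("need at least two cases to train and test")
    errors = 0
    for fold in ten_fold_indices(len(cases), k):
        held_out = set(fold)
        # Train on every case outside the current fold.
        train = [case for i, case in enumerate(cases) if i not in held_out]
        classify = learner(train)
        # Test on the held-out fold and count wrong decisions.
        errors += sum(classify(cases[i][0]) != cases[i][1] for i in fold)
    return errors / len(cases)
```

Any function that maps training cases to a classifier can be passed as `learner`, so the same loop could be used to compare error rates obtained on incomplete and on inconsistent versions of a data set.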

Author information

Corresponding author

Correspondence to Jerzy W. Grzymala-Busse.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Clark, P.G., Gao, C., Grzymala-Busse, J.W. (2016). A Comparison of Mining Incomplete and Inconsistent Data. In: Dregvaite, G., Damasevicius, R. (eds) Information and Software Technologies. ICIST 2016. Communications in Computer and Information Science, vol 639. Springer, Cham. https://doi.org/10.1007/978-3-319-46254-7_33

  • DOI: https://doi.org/10.1007/978-3-319-46254-7_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46253-0

  • Online ISBN: 978-3-319-46254-7

  • eBook Packages: Computer Science, Computer Science (R0)