SubClass: Classification of Multidimensional Noisy Data Using Subspace Clusters

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5012)

Abstract

Classification has been widely studied and successfully employed in various application domains. In multidimensional noisy settings, however, classification accuracy may be unsatisfactory. Locally irrelevant attributes often occlude class-relevant information. A global reduction to relevant attributes is often infeasible, as relevance of attributes is not necessarily a globally uniform property. In a current project with an airport scheduling software company, locally varying attributes in the data indicate whether flights will be on time, delayed or ahead of schedule. To detect locally relevant information, we propose combining classification with subspace clustering (SubClass). Subspace clustering aims at detecting clusters in arbitrary subspaces of the attributes. It has proved to work well in multidimensional and noisy domains. However, it does not utilize class label information and thus does not necessarily provide appropriate groupings for classification. We propose incorporating class label information into subspace search. As a result we obtain locally relevant attribute combinations for classification. We present the SubClass classifier that successfully exploits classifying subspace cluster information. Experiments on both synthetic and real world datasets demonstrate that classification accuracy is clearly improved for noisy multidimensional settings.
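The abstract does not spell out how class labels enter the subspace search, so the following is only a minimal sketch of the general idea, not the authors' SubClass algorithm. It assumes a grid-based notion of density, attributes scaled to [0, 1], and Shannon entropy of class labels as the purity measure; every function name and parameter (`bins`, `min_points`, `max_dim`) is hypothetical. Subspaces whose dense cells have low class entropy are treated as locally relevant for classification.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def dense_cells(points, labels, subspace, bins=4, min_points=3):
    """Group labels by grid cell over the chosen attributes; keep dense cells.

    Assumption: every attribute value lies in [0, 1].
    """
    cells = defaultdict(list)
    for point, label in zip(points, labels):
        key = tuple(min(int(point[a] * bins), bins - 1) for a in subspace)
        cells[key].append(label)
    return {k: v for k, v in cells.items() if len(v) >= min_points}


def subspace_score(points, labels, subspace, bins=4, min_points=3):
    """Size-weighted mean class entropy over dense cells (lower is purer)."""
    cells = dense_cells(points, labels, subspace, bins, min_points)
    covered = sum(len(v) for v in cells.values())
    if covered == 0:
        return float("inf")  # no dense cell: the subspace carries no cluster
    return sum(len(v) * class_entropy(v) for v in cells.values()) / covered


def rank_subspaces(points, labels, max_dim=2, bins=4, min_points=3):
    """Rank all attribute subsets up to max_dim by ascending score."""
    n_attrs = len(points[0])
    candidates = [
        s for d in range(1, max_dim + 1) for s in combinations(range(n_attrs), d)
    ]
    return sorted(
        candidates,
        key=lambda s: subspace_score(points, labels, s, bins, min_points),
    )


# Toy data (invented for illustration): attribute 0 separates the classes,
# attribute 1 is noise.
points = [(0.1, 0.1), (0.1, 0.9), (0.15, 0.2), (0.2, 0.8),
          (0.9, 0.1), (0.85, 0.9), (0.8, 0.2), (0.95, 0.8)]
labels = ["on_time"] * 4 + ["delayed"] * 4
ranking = rank_subspaces(points, labels)
```

Enumerating all attribute subsets is exponential in the number of attributes, which is why the sketch caps the dimensionality with `max_dim`; a practical method would need some form of pruning.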

Editor information

Takashi Washio, Einoshin Suzuki, Kai Ming Ting, Akihiro Inokuchi

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Assent, I., Krieger, R., Welter, P., Herbers, J., Seidl, T. (2008). SubClass: Classification of Multidimensional Noisy Data Using Subspace Clusters. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_6

  • DOI: https://doi.org/10.1007/978-3-540-68125-0_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68124-3

  • Online ISBN: 978-3-540-68125-0

  • eBook Packages: Computer Science, Computer Science (R0)
