Subspace MOA: Subspace Stream Clustering Evaluation Using the MOA Framework

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7826)

Abstract

Most available static data are becoming increasingly high-dimensional. Subspace clustering, which aims at finding clusters not only in the full-dimensional space but also in subsets of the dimensions, has therefore gained significant importance. Recently, the OpenSubspace framework was proposed to evaluate and explore subspace clustering algorithms in WEKA, with a rich body of state-of-the-art subspace clustering algorithms and measures. In parallel, the MOA (Massive Online Analysis) framework was developed, also on top of WEKA, to provide algorithms and evaluation methods for mining tasks on evolving data streams, but over the full space only.

As with static data, most streaming data sources are becoming high-dimensional, and tracking their evolving clusters is becoming both important and challenging. In this demonstrator, we present Subspace MOA, to the best of our knowledge the first subspace clustering evaluation framework over data streams. Our demonstrator follows the online-offline model used in most data stream clustering algorithms. In the online phase, users can select one of three well-known summarization techniques to form the microclusters. In the offline phase, one of five subspace clustering algorithms can be selected. The framework is supported by a subspace stream generator, a visualization interface that presents the evolving clusters over different subspaces, and various subspace clustering evaluation measures.
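
The online-offline model described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the three summarization techniques or the five subspace algorithms, so the online phase here uses a generic additive cluster-feature summary (count, linear sum, squared sum), and the offline step is a simple variance threshold standing in for a real subspace clustering algorithm. The names `MicroCluster`, `online_update`, and `relevant_dimensions` are hypothetical and are not part of MOA or Subspace MOA.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Additive summary (count, linear sum, squared sum) of absorbed points."""
    n: int
    ls: list
    ss: list

    @classmethod
    def from_point(cls, point):
        return cls(1, list(point), [x * x for x in point])

    def add(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def center(self):
        return [s / self.n for s in self.ls]

    def variance(self):
        # Per-dimension variance, clamped at 0 to absorb rounding error.
        return [max(0.0, q / self.n - (s / self.n) ** 2)
                for s, q in zip(self.ls, self.ss)]


def online_update(micro_clusters, point, radius):
    """Online phase (illustrative): absorb the point into the nearest
    microcluster if it lies within `radius`, otherwise open a new one."""
    best, best_dist = None, math.inf
    for mc in micro_clusters:
        d = math.sqrt(sum((c - x) ** 2 for c, x in zip(mc.center(), point)))
        if d < best_dist:
            best, best_dist = mc, d
    if best is not None and best_dist <= radius:
        best.add(point)
    else:
        micro_clusters.append(MicroCluster.from_point(point))


def relevant_dimensions(mc, max_variance):
    """Offline stand-in: dimensions in which the summarized points stay
    compact. A real subspace clustering algorithm would go here."""
    return [i for i, v in enumerate(mc.variance()) if v <= max_variance]


# Toy stream: three points agree in dimensions 0 and 1 but scatter in 2.
stream = [(1.0, 1.0, 0.0), (1.1, 0.9, 5.0), (0.9, 1.1, -5.0), (9.0, 9.0, 9.0)]
micro_clusters = []
for p in stream:
    online_update(micro_clusters, p, radius=8.0)

for mc in micro_clusters:
    print(mc.n, relevant_dimensions(mc, max_variance=0.1))
```

On this toy stream the first three points fall into one microcluster that is compact only in dimensions 0 and 1, which is the kind of structure a full-space method would miss. Because the summaries are additive, the online phase never needs to store the raw points.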

References

  1. Aggarwal, C.C., Han, J., Wang, J., Yu, P.S.: A framework for clustering evolving data streams. In: Proc. of the 29th Int. Conf. on Very Large Data Bases, VLDB 2003, vol. 29, pp. 81–92 (2003)

  2. Aggarwal, C.C., Han, J., Wang, J., Yu, P.S.: A framework for projected clustering of high dimensional data streams. In: Proc. of the 30th Int. Conf. on Very Large Data Bases, VLDB 2004, vol. 30, pp. 852–863 (2004)

  3. Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is "nearest neighbor" meaningful? In: Int. Conf. on Database Theory, pp. 217–235 (1999)

  4. Cao, F., Ester, M., Qian, W., Zhou, A.: Density-based clustering over an evolving data stream with noise. In: 2006 SIAM Conference on Data Mining, pp. 328–339 (2006)

  5. Hassani, M., Spaus, P., Gaber, M.M., Seidl, T.: Density-based projected clustering of data streams. In: Hüllermeier, E., Link, S., Fober, T., Seeger, B. (eds.) SUM 2012. LNCS, vol. 7520, pp. 311–324. Springer, Heidelberg (2012)

  6. Hulten, G., Domingos, P.: VFML – a toolkit for mining high-speed time-changing data streams (2003)

  7. Kranen, P., Kremer, H., Jansen, T., Seidl, T., Bifet, A., Holmes, G., Pfahringer, B., Read, J.: Stream data mining using the MOA framework. In: Lee, S.-g., Peng, Z., Zhou, X., Moon, Y.-S., Unland, R., Yoo, J. (eds.) DASFAA 2012, Part II. LNCS, vol. 7239, pp. 309–313. Springer, Heidelberg (2012)

  8. Müller, E., Assent, I., Günnemann, S., Jansen, T., Seidl, T.: OpenSubspace: An open source framework for evaluation and exploration of subspace clustering algorithms in WEKA. In: Open Source in Data Mining Workshop at PAKDD, pp. 2–13 (2009)

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hassani, M., Kim, Y., Seidl, T. (2013). Subspace MOA: Subspace Stream Clustering Evaluation Using the MOA Framework. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds) Database Systems for Advanced Applications. DASFAA 2013. Lecture Notes in Computer Science, vol 7826. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37450-0_33

  • DOI: https://doi.org/10.1007/978-3-642-37450-0_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37449-4

  • Online ISBN: 978-3-642-37450-0

  • eBook Packages: Computer Science, Computer Science (R0)
