Finding Dense Clusters in Hyperspace: An Approach Based on Row Shuffling

  • Conference paper
Advances in Web-Age Information Management (WAIM 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2118)

Abstract

High-dimensional data sets generally exhibit low density, since the number of possible cells exceeds the number of cells actually occupied by the data. This characteristic has prompted researchers to automate the search for subspaces where the density is higher. In this paper we present an algorithm that takes advantage of categorical, unordered dimensions to increase the density of subspaces in the data set. It does this by shuffling rows in those dimensions so that the final ordering yields denser regions in hyperspace. We argue for the use of this shuffling technique as a preprocessing step for other techniques that compress the hyperspace by means of statistical models, since denser regions usually result in better-fitting models. The experimental results support this argument. We also show how to integrate this algorithm with two grid clustering procedures in order to find these dense regions. The experimental results on both synthetic and real data sets show that row shuffling can drastically increase the density of the subspaces, leading to better clusters.
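The effect described in the abstract can be illustrated with a small sketch. This is not the algorithm from the paper, whose details are not given here; it is a naive stand-in heuristic that reorders the values of two unordered categorical dimensions by how many occupied cells they contain. The function names (`best_window_density`, `shuffle_by_occupancy`) and the toy 6-by-6 grid are assumptions made purely for illustration.

```python
def best_window_density(grid, h, w):
    """Highest fraction of occupied cells over every contiguous h-by-w block."""
    n_rows, n_cols = len(grid), len(grid[0])
    best = 0.0
    for r0 in range(n_rows - h + 1):
        for c0 in range(n_cols - w + 1):
            filled = sum(
                grid[r][c]
                for r in range(r0, r0 + h)
                for c in range(c0, c0 + w)
            )
            best = max(best, filled / (h * w))
    return best


def shuffle_by_occupancy(grid):
    """Reorder rows and columns so the most occupied ones come first.

    Hypothetical heuristic (an assumption, not the paper's method): because
    the dimensions are categorical and unordered, any permutation of their
    values is a valid ordering, so we are free to pick one that packs
    occupied cells together.
    """
    row_order = sorted(range(len(grid)), key=lambda r: -sum(grid[r]))
    col_order = sorted(
        range(len(grid[0])), key=lambda c: -sum(row[c] for row in grid)
    )
    return [[grid[r][c] for c in col_order] for r in row_order]


# Toy occupancy grid: 1 marks an occupied cell. The nine occupied cells sit at
# even-numbered rows and odd-numbered columns, so they are scattered.
before = [
    [1 if (r % 2 == 0 and c % 2 == 1) else 0 for c in range(6)]
    for r in range(6)
]
after = shuffle_by_occupancy(before)

print("best 3x3 density before:", best_window_density(before, 3, 3))
print("best 3x3 density after: ", best_window_density(after, 3, 3))
```

The number of occupied cells is unchanged by the reordering; only their arrangement differs, which is why a grid-based clustering step run afterwards can find a denser region in the same data.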

This work has been supported by NSF grant IIS-9732113.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Barbará, D., Wu, X. (2001). Finding Dense Clusters in Hyperspace: An Approach Based on Row Shuffling. In: Wang, X.S., Yu, G., Lu, H. (eds) Advances in Web-Age Information Management. WAIM 2001. Lecture Notes in Computer Science, vol 2118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47714-4_28

  • DOI: https://doi.org/10.1007/3-540-47714-4_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42298-3

  • Online ISBN: 978-3-540-47714-3

  • eBook Packages: Springer Book Archive
