
A Practical Approach for Clustering Transaction Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6871)

Abstract

In this paper, we consider the problem of clustering transaction data. Most existing transactional clustering algorithms encounter difficulties in the presence of overlapping clusters with a large number of outlier items that do not contribute to the formation of clusters. Furthermore, the vast majority of existing approaches depend on multiple parameters that may be difficult to tune, especially in real-life applications. To address these problems, we propose a parameter-free transactional clustering algorithm. Our algorithm first scans the data set sequentially, such that the destination of the next transaction is guided by a novel objective function. Once the first scan of the data set is completed, the algorithm performs a few more passes over the data set to refine the clustering. The proposed algorithm is able to automatically identify clusters in the presence of a large number of outlier items without any parameter setting by the user. The suitability of our proposal has been demonstrated through an empirical study using synthetic and real data sets.
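The two-phase structure described in the abstract (one sequential scan that assigns each transaction to the best destination, followed by a few refinement passes) can be illustrated with a minimal sketch. The abstract does not state the paper's objective function, so the sketch substitutes a CLOPE-style profit with a fixed repulsion exponent as a stand-in; this is an assumption, not the author's method, and unlike the proposed algorithm it is not parameter-free. All names (`Cluster`, `cluster_transactions`, `REPULSION`, `max_passes`) are hypothetical.

```python
from collections import Counter

REPULSION = 2.0  # assumption: fixed exponent of the stand-in objective


class Cluster:
    """Item histogram of a cluster plus the number of member transactions."""

    def __init__(self):
        self.counts = Counter()  # item -> number of members containing it
        self.size = 0

    def score(self):
        # Stand-in objective: (total occurrences * size) / (distinct items ** r)
        if self.size == 0 or not self.counts:
            return 0.0
        total = sum(self.counts.values())
        return total * self.size / len(self.counts) ** REPULSION

    def add(self, t):
        self.counts.update(t)
        self.size += 1

    def remove(self, t):
        self.counts.subtract(t)
        self.counts = +self.counts  # drop items whose count fell to zero
        self.size -= 1

    def gain(self, t):
        """Change in this cluster's score if transaction t joined it."""
        before = self.score()
        self.add(t)
        after = self.score()
        self.remove(t)
        return after - before


def cluster_transactions(transactions, max_passes=10):
    transactions = [frozenset(t) for t in transactions]
    clusters, assignment = [], []

    # Phase 1: sequential scan; each transaction joins the existing
    # cluster (or a new one) that yields the largest gain.
    for t in transactions:
        dest = max(clusters + [Cluster()], key=lambda c: c.gain(t))
        if dest.size == 0:
            clusters.append(dest)
        dest.add(t)
        assignment.append(dest)

    # Phase 2: refinement passes; move a transaction only when another
    # destination is strictly better than staying where it is.
    for _ in range(max_passes):
        moved = False
        for i, t in enumerate(transactions):
            home = assignment[i]
            home.remove(t)
            options = [c for c in clusters if c is not home]
            if home.size > 0:
                options.append(Cluster())  # option: open a new cluster
            dest = home
            if options:
                best = max(options, key=lambda c: c.gain(t))
                if best.gain(t) > home.gain(t):
                    dest = best
            dest.add(t)
            if dest is not home:
                moved = True
                assignment[i] = dest
                if dest.size == 1:      # dest was the freshly opened cluster
                    clusters.append(dest)
                if home.size == 0:      # home lost its last member
                    clusters.remove(home)
        if not moved:
            break

    return [clusters.index(c) for c in assignment]
```

Calling `cluster_transactions` on a list of item sets returns one integer cluster label per transaction. Because a move in the second phase must strictly increase the stand-in objective, the refinement loop stops as soon as a full pass changes nothing; `max_passes` is only a safety cap.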

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bouguessa, M. (2011). A Practical Approach for Clustering Transaction Data. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2011. Lecture Notes in Computer Science, vol 6871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23199-5_20

  • DOI: https://doi.org/10.1007/978-3-642-23199-5_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23198-8

  • Online ISBN: 978-3-642-23199-5

  • eBook Packages: Computer Science, Computer Science (R0)
