A Practical Approach for Clustering Transaction Data

Bouguessa, Mohamed

doi:10.1007/978-3-642-23199-5_20


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6871)

Abstract

In this paper we consider the problem of clustering transaction data. Most existing transactional clustering algorithms encounter difficulties in the presence of overlapping clusters with a large number of outlier items that do not contribute to the formation of clusters. Furthermore, the vast majority of existing approaches depend on multiple parameters that may be difficult to tune, especially in real-life applications. To address these problems, we propose a parameter-free transactional clustering algorithm. Our algorithm first scans the data set sequentially, such that the destination of the next transaction is guided by a novel objective function. Once the first scan of the data set is completed, the algorithm performs a few further passes over the data set in order to refine the clustering. The proposed algorithm is able to automatically identify clusters in the presence of a large number of outlier items in the data set, without any parameter setting by the user. The suitability of our proposal is demonstrated through an empirical study using synthetic and real data sets.
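The two-phase procedure described in the abstract (a sequential scan that places each transaction, followed by a few refinement passes) can be illustrated with a minimal Python sketch. The paper's objective function is not given in this abstract, so the `score` function below is a hypothetical stand-in (sum of squared item counts divided by the number of distinct items in the cluster), not the author's criterion. The names `score`, `gain`, `best_cluster`, and `cluster_transactions` are likewise assumptions, and `max_passes` is only a safety cap for the sketch, not a tuning parameter of the published method.

```python
from collections import Counter


def score(counts):
    """Stand-in cluster quality (an assumption, not the paper's objective):
    sum of squared item counts divided by the number of distinct items."""
    if not counts:
        return 0.0
    return sum(c * c for c in counts.values()) / len(counts)


def gain(counts, transaction):
    """Change in the stand-in score if `transaction` joined this cluster."""
    return score(counts + Counter(transaction)) - score(counts)


def best_cluster(clusters, transaction):
    """Index of the cluster with the largest gain.

    An index equal to len(clusters) means 'open a new cluster'.
    Ties go to the earliest cluster, so existing clusters are preferred.
    """
    candidates = clusters + [Counter()]
    gains = [gain(c, transaction) for c in candidates]
    return max(range(len(gains)), key=gains.__getitem__)


def assign(clusters, transaction):
    """Place `transaction` in its best cluster and return that index."""
    k = best_cluster(clusters, transaction)
    if k == len(clusters):
        clusters.append(Counter())
    clusters[k].update(transaction)
    return k


def cluster_transactions(transactions, max_passes=10):
    """First scan, then refinement passes until no transaction moves."""
    clusters = []
    # First scan: each transaction goes where the stand-in score improves most.
    labels = [assign(clusters, t) for t in transactions]

    # Refinement: remove each transaction from its cluster and re-place it.
    for _ in range(max_passes):  # safety cap for this sketch only
        moved = False
        for i, t in enumerate(transactions):
            old = labels[i]
            clusters[old].subtract(t)
            clusters[old] = +clusters[old]  # drop items whose count fell to zero
            labels[i] = assign(clusters, t)
            moved = moved or labels[i] != old
        if not moved:
            break
    return labels


data = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"x", "y", "z"},
    {"a", "b", "c"},
    {"x", "y", "w"},
]
labels = cluster_transactions(data)
print(labels)
```

On this toy input the sketch groups the first, second, and fourth transactions together and the third and fifth together. The number of clusters is not supplied by the caller; it emerges from the comparison between joining an existing cluster and opening a new one, which is the sense in which the sketch mirrors the parameter-free idea.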



Author information

Authors and Affiliations

Département d’informatique et d’ingénierie, Université du Québec en Outaouais, Gatineau, Quebec, J8X 3X7, Canada
Mohamed Bouguessa


Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, IBaI, Kohlenstraße 2, 04107, Leipzig, Germany
Petra Perner


Cite this paper

Bouguessa, M. (2011). A Practical Approach for Clustering Transaction Data. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2011. Lecture Notes in Computer Science, vol 6871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23199-5_20


DOI: https://doi.org/10.1007/978-3-642-23199-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23198-8
Online ISBN: 978-3-642-23199-5
eBook Packages: Computer Science, Computer Science (R0)
