Self-Tuning Clustering: An Adaptive Clustering Method for Transaction Data

Yun, Ching-Huang; Chuang, Kun-Ta; Chen, Ming-Syan

doi:10.1007/3-540-46145-0_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2454)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

In this paper, we devise an efficient algorithm for clustering market-basket data items. Market-basket data analysis has been well studied in the context of mining association rules, where the goal is to discover the set of large items, i.e., the items frequently purchased across all transactions. In essence, clustering divides a set of data items into groups such that items in the same group are as similar to one another as possible. In view of the nature of clustering market-basket data, we present a measurement, called the small-large (SL) ratio, which is in essence the ratio of the number of small items to the number of large items. The smaller the SL ratio of a cluster, the more similar to one another the items in the cluster are. Then, by utilizing a self-tuning technique that adaptively tunes the input and output SL ratio thresholds, we develop an efficient clustering algorithm, STC (Self-Tuning Clustering), for market-basket data. The objective of algorithm STC is: “Given a database of transactions, determine a clustering such that the average SL ratio is minimized.” We conduct several experiments on real data and on a synthetic workload to study performance. The experimental results show that, by utilizing the self-tuning technique to adaptively minimize the input and output SL ratio thresholds, algorithm STC performs very well. Specifically, algorithm STC not only incurs an execution time significantly smaller than that of prior work but also produces clustering results of very good quality.
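To make the SL ratio concrete, the following Python sketch computes it for a cluster of transactions and averages it over a clustering. It is only an illustration of the idea stated in the abstract, not the STC algorithm itself. The abstract does not say how items are classified as large or small within a cluster, so the sketch assumes a single support threshold (the hypothetical parameter `min_support`): an item is treated as large if it appears in at least that fraction of the cluster's transactions, and as small otherwise. The function names `sl_ratio` and `average_sl_ratio` are likewise illustrative and do not come from the paper.

```python
from collections import Counter


def sl_ratio(cluster, min_support=0.5):
    """Ratio of distinct small items to distinct large items in one cluster.

    Assumption (not from the paper): an item is "large" if the fraction of
    transactions in the cluster containing it is >= min_support, and
    "small" otherwise.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one transaction")
    n = len(cluster)
    # Count each item at most once per transaction.
    counts = Counter(item for transaction in cluster for item in set(transaction))
    large = {item for item, c in counts.items() if c / n >= min_support}
    small = set(counts) - large
    if not large:
        # No large items at all: treat the cluster as maximally dissimilar.
        return float("inf")
    return len(small) / len(large)


def average_sl_ratio(clusters, min_support=0.5):
    """Mean SL ratio over all clusters, the quantity STC aims to minimize."""
    return sum(sl_ratio(c, min_support) for c in clusters) / len(clusters)


# A cohesive cluster: most items are shared by most transactions.
tight = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk", "bread"}]
# A loose cluster: transactions have little in common.
loose = [{"milk", "bread"}, {"pen", "ink"}, {"soap", "milk"}]

print(sl_ratio(tight))                    # 1 small item / 2 large items
print(sl_ratio(loose))                    # 4 small items / 1 large item
print(average_sl_ratio([tight, loose]))
```

Under these assumptions the cohesive cluster has a lower SL ratio than the loose one, which matches the abstract's statement that a smaller SL ratio indicates more similar items.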

Author information

Authors and Affiliations

Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, ROC
Ching-Huang Yun & Ming-Syan Chen
Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan, ROC
Kun-Ta Chuang

Editor information

Editors and Affiliations

Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, 606-8501, Kyoto, Japan
Yahiko Kambayashi
Institute for Computer Science and Business Informatics, University of Vienna, Liebiggasse 4, 1010, Vienna, Austria
Werner Winiwarter
Center for Spatial Information Science (CSIS), University of Tokyo, 4-6-1, Komaba, Meguro-ku, 153-8904, Tokyo, Japan
Masatoshi Arikawa

About this paper

Cite this paper

Yun, CH., Chuang, KT., Chen, MS. (2002). Self-Tuning Clustering: An Adaptive Clustering Method for Transaction Data. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_5

DOI: https://doi.org/10.1007/3-540-46145-0_5
Published: 02 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44123-6
Online ISBN: 978-3-540-46145-6
eBook Packages: Springer Book Archive