Clustering Transactions with an Unbalanced Hierarchical Product Structure

Wang, MinTzu; Hsu, PingYu; Lin, K. C.; Chen, ShiuannShuoh

doi:10.1007/978-3-540-74553-2_23

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4654)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

Datasets extracted from large retail stores are often sparse: they contain a huge number of items and transactions, yet each transaction contains only a few items. On such data, basket analysis yields extremely low item support, customer clustering yields large intra-cluster distances, and transaction classification yields huge classification trees. A similarity measure that takes the depth of the least common ancestor of two items, normalized by the depth of the concept tree, lifts the limitation of binary equality. However, it produces counterintuitive results when the concept hierarchy is unbalanced, because two items in deeper subtrees are very likely to have a higher similarity than two items in shallower subtrees. To solve this issue, this research proposes calculating the distance between two items by counting the edge traversals needed to link them. The method is straightforward, yet achieves better performance with retail store data when the concept hierarchy is unbalanced.
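
The contrast described in the abstract can be illustrated with a minimal sketch. The hierarchy, item names, and function names below are hypothetical and are not taken from the paper; the paper's exact definitions (for example any normalization or weighting of the edge count) may differ. The sketch only shows why the normalized least-common-ancestor depth favors deep subtrees, while a plain edge count treats sibling pairs alike at any depth.

```python
# Hypothetical unbalanced concept hierarchy, stored as child -> parent.
# The "food" branch is deeper than the "drinks" branch.
PARENT = {
    "food": "root",
    "drinks": "root",
    "cola": "drinks",
    "juice": "drinks",
    "dairy": "food",
    "milk": "dairy",
    "whole_milk": "milk",
    "skim_milk": "milk",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    """Number of edges between `node` and the root."""
    return len(path_to_root(node)) - 1


def lca(a, b):
    """Least common ancestor: first node on b's root path that is also on a's."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    raise ValueError("nodes do not share a root")


# Depth of the whole concept tree (deepest node).
TREE_DEPTH = max(depth(n) for n in PARENT)


def lca_similarity(a, b):
    """Depth of the least common ancestor, normalized by the tree depth."""
    return depth(lca(a, b)) / TREE_DEPTH


def edge_distance(a, b):
    """Number of edges traversed to link a and b through their common ancestor."""
    return depth(a) + depth(b) - 2 * depth(lca(a, b))


if __name__ == "__main__":
    for pair in [("cola", "juice"), ("whole_milk", "skim_milk")]:
        print(pair, lca_similarity(*pair), edge_distance(*pair))
```

In this toy tree both pairs are siblings, yet the normalized measure scores the shallow pair (cola, juice) at 0.25 and the deep pair (whole_milk, skim_milk) at 0.75, whereas the edge count is 2 for both.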

Author information

Authors and Affiliations

Department of Business Administration, National Central University, Jhongli City, Taoyuan County 32001, Taiwan, R.O.C.; Department of Information Management, Technology and Science Institute of Northern Taiwan, Taipei 112, Taiwan, R.O.C.
MinTzu Wang
Department of Business Administration, National Central University, Jhongli City, Taoyuan County 32001, Taiwan, R.O.C.
PingYu Hsu & ShiuannShuoh Chen
Department of Management Information Systems, National Chung Hsing University, Taichung 402, Taiwan, R.O.C.
K. C. Lin

Editor information

Il Yeal Song, Johann Eder, Tho Manh Nguyen

Cite this paper

Wang, M., Hsu, P., Lin, K.C., Chen, S. (2007). Clustering Transactions with an Unbalanced Hierarchical Product Structure. In: Song, I.Y., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2007. Lecture Notes in Computer Science, vol 4654. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74553-2_23

DOI: https://doi.org/10.1007/978-3-540-74553-2_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74552-5
Online ISBN: 978-3-540-74553-2
eBook Packages: Computer Science, Computer Science (R0)
