Skip to main content

Clustering Transactions with an Unbalanced Hierarchical Product Structure

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2007)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4654))

Included in the following conference series:

Abstract

The datasets extracted from large retail stores often contain sparse information composed of a huge number of items and transactions, with each transaction only containing a few items. These data render basket analysis with extremely low item support, customer clustering with large intra cluster distance and transaction classifications having huge classification trees. Although a similarity measure represented by counting the depth of the least common ancestor normalized by the depth of the concept tree lifts the limitation of binary equality, it produces counter intuitive results when the concept hierarchy is unbalanced since two items in deeper subtrees are very likely to have a higher similarity than two items in shallower subtrees. The research proposes to calculate the distance between two items by counting the edge traversal needed to link them in order to solve the issues. The method is straight forward yet achieves better performance with retail store data when concept hierarchy is unbalanced.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Agrawal, R.S.: Fast Algorithms for Mining Association Rules. In: Proc. of the 20th Int’l Conference on Very Large Databases, Santiago, Chile (1994)

    Google Scholar 

  2. Ball, G.H., Hall, D.J.: ISODATA: a novel technique for data analysis and pattern classification. Standford Res. Inst., Menlo Park, CA (1965)

    Google Scholar 

  3. Golsberg, D., Nichols, D., Oki, B.M., Terry, D.: Using collaborative filtering to weave an information tapestry. Commun. ACM 35(12), 61–70 (1992)

    Article  Google Scholar 

  4. Herlocker, et al.: An Algorithmic Framework for Performing Collaborative Filtering. In: Proceedings of the 1999 Conference on Research and Development in Information Retrieval (1999)

    Google Scholar 

  5. Han, J., Fu, Y.: Mining multiple-level association rules in large databases. IEEE Transactions on Knowledge and Data Engineering 11(5), 798–805 (1999)

    Article  Google Scholar 

  6. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco (2000)

    Google Scholar 

  7. Han, J., Cai, Y., Cercone, N.: Knowledge Discovery in Databases: An Attribute-Oriented Approach. In: VLDB, pp. 547–559 (1992)

    Google Scholar 

  8. McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)

    MATH  Google Scholar 

  9. Kantardzic, M.: Data Mining: concepts, models, methods, and algorithms. John Wiley, Chichester (2002)

    Google Scholar 

  10. Tan, P.-N., Steinbach, M., Kumar, V.: Introduction to Data Mining. Pearson International Edition (2005)

    Google Scholar 

  11. Tan, P.-N., Kumar, V., Srivastava, J.: Selecting the right objective measure for association analysis. Information Systems 29, 293–313 (2004)

    Article  Google Scholar 

  12. Ganesan, P., Garcia-Molina, H., Widom, J.: Exploiting hierarchical domain structure to compute similarity. ACM Transactions on Information Systems 21(1) (2003)

    Google Scholar 

  13. Chen, S., Han, J., Yu, P.S.: Data Mining: An Overview from a Database Perspective. IEEE Transactions on Knowledge and Data Engineering 8(6), 866–883 (1996)

    Article  Google Scholar 

  14. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manage. 24(5), 513–523 (1988)

    Article  Google Scholar 

  15. Srikant, R., Agrawal, R.: Mining generalized association rules. In: Proceedings of VLDB 1995, pp. 407–419 (1995)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Il Yeal Song Johann Eder Tho Manh Nguyen

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, M., Hsu, P., Lin, K.C., Chen, S. (2007). Clustering Transactions with an Unbalanced Hierarchical Product Structure. In: Song, I.Y., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2007. Lecture Notes in Computer Science, vol 4654. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74553-2_23

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-74553-2_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74552-5

  • Online ISBN: 978-3-540-74553-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics