Building on the Arules Infrastructure for Analyzing Transaction Data with R

  • Conference paper
Advances in Data Analysis

Abstract

The free and extensible statistical computing environment R with its enormous number of extension packages already provides many state-of-the-art techniques for data analysis. Support for association rule mining, a popular exploratory method which can be used, among other purposes, for uncovering cross-selling opportunities in market baskets, has become available recently with the R extension package arules. After a brief introduction to transaction data and association rules, we present the formal framework implemented in arules and demonstrate how clustering and association rule mining can be applied together using a market basket data set from a typical retailer. This paper shows that implementing a basic infrastructure with formal classes in R provides an extensible basis which can very efficiently be employed for developing new applications (such as clustering transactions) in addition to association rule mining.
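
To make the two ideas in the abstract concrete, here is a minimal sketch in plain Python (not the arules package, and not the authors' code). It computes the support and confidence of one association rule and a dissimilarity between two transactions, the kind of measure that clustering of transactions relies on. The toy baskets, the function names, and the choice of the Jaccard measure are all assumptions made for illustration.

```python
# Hypothetical toy market baskets (illustrative only, not data from the paper).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"beer", "chips", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Support of (lhs and rhs together) divided by support of lhs."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)


def jaccard_dissimilarity(a, b):
    """1 - |intersection| / |union|; assumed here as the transaction distance."""
    return 1 - len(a & b) / len(a | b)


# Rule {butter} => {bread}
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"butter"}, {"bread"}, transactions)

# Dissimilarity between the first two baskets
d = jaccard_dissimilarity(transactions[0], transactions[1])

print(rule_support, rule_confidence, round(d, 3))
```

A pairwise matrix of such dissimilarities could then be passed to any standard clustering procedure; which procedure and measure the paper actually uses is not stated in the abstract.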

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hahsler, M., Hornik, K. (2007). Building on the Arules Infrastructure for Analyzing Transaction Data with R. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_51
