Building on the Arules Infrastructure for Analyzing Transaction Data with R

Hahsler, Michael; Hornik, Kurt

doi:10.1007/978-3-540-70981-7_51

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

The free and extensible statistical computing environment R with its enormous number of extension packages already provides many state-of-the-art techniques for data analysis. Support for association rule mining, a popular exploratory method which can be used, among other purposes, for uncovering cross-selling opportunities in market baskets, has become available recently with the R extension package arules. After a brief introduction to transaction data and association rules, we present the formal framework implemented in arules and demonstrate how clustering and association rule mining can be applied together using a market basket data set from a typical retailer. This paper shows that implementing a basic infrastructure with formal classes in R provides an extensible basis which can very efficiently be employed for developing new applications (such as clustering transactions) in addition to association rule mining.
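
To make the two ideas in the abstract concrete, here is a minimal sketch of how support and confidence of an association rule are computed from transaction data, and how a pairwise dissimilarity between transactions (the input to a clustering step) can be derived from the same data. This is a plain Python illustration, not the arules API. The toy baskets, the function names, and the choice of the Jaccard dissimilarity are assumptions made for this example and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical toy market baskets (made up for illustration, not the paper's data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Estimate of P(rhs | lhs): support(lhs and rhs together) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def jaccard_dissimilarity(a, b):
    """1 - |a & b| / |a | b|; assumed here as a distance for binary item data."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


# Rule {bread} => {milk}: how often both occur, and how often milk follows bread.
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)

# Pairwise dissimilarities between transactions, keyed by index pair (i, j).
# A matrix like this could be passed to a clustering method.
distances = {
    (i, j): jaccard_dissimilarity(transactions[i], transactions[j])
    for i, j in combinations(range(len(transactions)), 2)
}

print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

Both quantities are derived from the same set-of-items representation of each transaction, which is why a shared data structure for transactions can serve rule mining and clustering alike.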

Author information

Authors and Affiliations

Department of Information Systems and Operations, Wirtschaftsuniversität, A-1090, Wien, Austria
Michael Hahsler
Department of Statistics and Mathematics, Wirtschaftsuniversität, A-1090, Wien, Austria
Kurt Hornik

Editor information

Editors and Affiliations

Department of Business Administration and Economics, Bielefeld University, Universitätsstr. 25, 33501, Bielefeld, Germany
Reinhold Decker
Department of Economics, Freie Universität Berlin, Garystraße 21, 14195, Berlin, Germany
Hans-J. Lenz

About this paper

Cite this paper

Hahsler, M., Hornik, K. (2007). Building on the Arules Infrastructure for Analyzing Transaction Data with R. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_51

DOI: https://doi.org/10.1007/978-3-540-70981-7_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70980-0
Online ISBN: 978-3-540-70981-7
eBook Packages: Mathematics and Statistics (R0)
