Abstract
The free and extensible statistical computing environment R, with its large number of extension packages, already provides many state-of-the-art techniques for data analysis. Support for association rule mining, a popular exploratory method that can be used, among other purposes, to uncover cross-selling opportunities in market baskets, has recently become available through the R extension package arules. After a brief introduction to transaction data and association rules, we present the formal framework implemented in arules and demonstrate how clustering and association rule mining can be applied together, using a market basket data set from a typical retailer. This paper shows that implementing a basic infrastructure with formal classes in R provides an extensible basis that can be employed efficiently for developing new applications (such as clustering transactions) in addition to association rule mining.
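To make the notions of transaction data and association rules concrete, the following is a minimal, language-neutral sketch in Python. It is not the arules API and is not taken from the paper: the toy baskets, the function names `support`, `confidence` and `jaccard_distance`, and the choice of the Jaccard dissimilarity for comparing transactions are all assumptions made for illustration.

```python
# Illustrative sketch only: hypothetical data and helper names, not the arules interface.

# Each transaction is the set of items in one market basket (toy data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"beer", "bread", "butter", "milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Confidence of the rule lhs -> rhs: count(lhs and rhs) / count(lhs)."""
    lhs_count = count(lhs, transactions)
    if lhs_count == 0:
        raise ValueError("left-hand side never occurs in the transactions")
    return count(set(lhs) | set(rhs), transactions) / lhs_count


def jaccard_distance(a, b):
    """Jaccard dissimilarity between two transactions (assumed measure)."""
    union = a | b
    if not union:
        return 0.0  # two empty baskets are treated as identical
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    print("support({bread, milk}) =", support({"bread", "milk"}, transactions))
    print("confidence(bread -> milk) =", confidence({"bread"}, {"milk"}, transactions))
    print("distance(t1, t2) =", jaccard_distance(transactions[0], transactions[1]))
```

A rule such as bread -> milk is usually reported only when both its support and its confidence exceed user-chosen thresholds. A pairwise dissimilarity like the one above is the kind of input a clustering procedure for transactions would need.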
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hahsler, M., Hornik, K. (2007). Building on the Arules Infrastructure for Analyzing Transaction Data with R. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_51
DOI: https://doi.org/10.1007/978-3-540-70981-7_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70980-0
Online ISBN: 978-3-540-70981-7
eBook Packages: Mathematics and Statistics (R0)