Implications of Probabilistic Data Modeling for Mining Association Rules

Hahsler, Michael; Hornik, Kurt; Reutterer, Thomas

doi:10.1007/3-540-31314-1_73

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Mining association rules is an important technique for discovering meaningful patterns in transaction databases. In the current literature, the properties of algorithms to mine association rules are discussed in great detail. We present a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. We use such data and a real-world grocery database to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left-hand side of rules and that lift performs poorly at filtering random noise in transaction data. The probabilistic data modeling approach presented in this paper is not only a valuable framework for analyzing interest measures but also provides a starting point for further research to develop new interest measures which are based on statistical tests and geared towards the specific properties of transaction data.
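
The two interest measures named in the abstract can be illustrated on simulated data without associations. The following is a minimal sketch, not the authors' framework: it assumes that each item occurs in each transaction independently with a fixed probability, and the item names, probabilities, and function names are hypothetical. It uses the standard definitions confidence(X -> Y) = supp(X ∪ Y) / supp(X) and lift(X -> Y) = confidence(X -> Y) / supp(Y).

```python
import random


def simulate_transactions(item_probs, n_transactions, seed=0):
    """Draw transactions in which every item occurs independently.

    Assumption for illustration only: item i appears in a transaction
    with fixed probability item_probs[i], independent of all other items.
    """
    rng = random.Random(seed)
    return [
        frozenset(item for item, p in item_probs.items() if rng.random() < p)
        for _ in range(n_transactions)
    ]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """supp(lhs ∪ rhs) / supp(lhs); NaN if lhs never occurs."""
    supp_lhs = support(transactions, lhs)
    if supp_lhs == 0:
        return float("nan")
    return support(transactions, set(lhs) | set(rhs)) / supp_lhs


def lift(transactions, lhs, rhs):
    """confidence(lhs -> rhs) / supp(rhs); NaN if rhs never occurs."""
    supp_rhs = support(transactions, rhs)
    if supp_rhs == 0:
        return float("nan")
    return confidence(transactions, lhs, rhs) / supp_rhs


# Hypothetical item frequencies: two common items and one rare item.
item_probs = {"bread": 0.3, "milk": 0.2, "caviar": 0.005}
data = simulate_transactions(item_probs, n_transactions=20000, seed=42)

for lhs, rhs in [({"bread"}, {"milk"}), ({"caviar"}, {"milk"})]:
    print(
        sorted(lhs), "->", sorted(rhs),
        "confidence:", round(confidence(data, lhs, rhs), 3),
        "lift:", round(lift(data, lhs, rhs), 3),
    )
```

Because the simulated items are independent by construction, any deviation of lift from 1 in the output is sampling noise. Rules whose items occur in only a handful of transactions are estimated from very few counts, so their measured values can vary considerably between seeds.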

Author information

Authors and Affiliations

Department of Information Systems and Operations, Wirtschaftsuniversität Wien, A-1090, Wien, Austria
Michael Hahsler
Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, A-1090, Wien, Austria
Kurt Hornik
Department of Retailing and Marketing, Wirtschaftsuniversität Wien, A-1090, Wien, Austria
Thomas Reutterer

Editor information

Editors and Affiliations

Institut für Technische und Betriebliche Informationssysteme, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Myra Spiliopoulou
Institut für Wissens- und Sprachverarbeitung, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Rudolf Kruse, Christian Borgelt & Andreas Nürnberger
Institut für Entscheidungstheorie und Unternehmensforschung, Universität Karlsruhe (TH), 76128, Karlsruhe
Wolfgang Gaul

About this paper

Cite this paper

Hahsler, M., Hornik, K., Reutterer, T. (2006). Implications of Probabilistic Data Modeling for Mining Association Rules. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_73

DOI: https://doi.org/10.1007/3-540-31314-1_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31313-7
Online ISBN: 978-3-540-31314-4
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
