Abstract
The quality of discovered association rules is commonly evaluated with interestingness measures (typically support and confidence), which are meant to help the user understand and use the newly discovered knowledge. Low-quality datasets severely degrade the quality of the discovered association rules, and one might legitimately wonder whether a so-called “interesting” rule, noted LHS -> RHS, is meaningful when 30% of the LHS data are no longer up to date, 20% of the RHS data are inaccurate, and 15% of the LHS data come from a data source well known for its poor credibility. In this paper, we propose integrating data quality measures into association rule mining to make it effective and quality-aware, and we propose a cost-based probabilistic model for selecting legitimately interesting rules. Experiments on the challenging KDD-CUP-98 datasets show, for different variations of the data quality indicators, the corresponding cost and quality of the discovered association rules that can (or cannot) legitimately be selected.
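To make the two interestingness measures named in the abstract concrete, here is a minimal Python sketch of support and confidence for a rule LHS -> RHS, followed by a simple quality check. The quality check is an assumption made for illustration only: the per-item quality scores, the function names `worst_quality` and `passes_quality_filter`, and the fixed thresholds are all hypothetical, and this plain threshold filter is not the cost-based probabilistic model proposed in the paper.

```python
from typing import Dict, FrozenSet, Sequence

Itemset = FrozenSet[str]


def support(transactions: Sequence[Itemset], itemset: Itemset) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: Sequence[Itemset], lhs: Itemset, rhs: Itemset) -> float:
    """support(LHS and RHS together) / support(LHS), or 0.0 if LHS never occurs."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0.0:
        return 0.0
    return support(transactions, lhs | rhs) / lhs_support


def worst_quality(quality: Dict[str, float], itemset: Itemset) -> float:
    """Hypothetical aggregate: the lowest per-item quality score in [0, 1].

    Items with no recorded score are treated as quality 0.0 (an assumption).
    """
    return min(quality.get(item, 0.0) for item in itemset)


def passes_quality_filter(
    transactions: Sequence[Itemset],
    quality: Dict[str, float],
    lhs: Itemset,
    rhs: Itemset,
    min_support: float,
    min_confidence: float,
    min_quality: float,
) -> bool:
    """Illustrative threshold filter, NOT the paper's cost-based model.

    A rule is kept only if it is interesting in the classical sense and
    the data behind both sides meet a minimum quality score.
    """
    return (
        support(transactions, lhs | rhs) >= min_support
        and confidence(transactions, lhs, rhs) >= min_confidence
        and worst_quality(quality, lhs | rhs) >= min_quality
    )
```

With such a filter, a rule that clears the support and confidence thresholds can still be rejected when the data on either side score poorly, which is the situation the abstract describes.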
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Berti-Équille, L. (2006). Quality-Aware Association Rule Mining. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_51
DOI: https://doi.org/10.1007/11731139_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33206-0
Online ISBN: 978-3-540-33207-7
eBook Packages: Computer Science (R0)