Abstract
Sets of frequent itemsets have traditionally been compared as a proxy for comparing raw datasets, on the assumption that frequent itemsets retain the information contained in the original data. In this work, we revisit this assumption and examine whether the dissimilarity between sets of frequent itemsets can serve as a measure of dissimilarity between raw datasets. In particular, we investigate how the dissimilarity between two sets of frequent itemsets is affected by the minSupport threshold used to generate them and by the adopted compactness level of the itemset lattice, namely frequent itemsets, closed frequent itemsets, or maximal frequent itemsets. Our analysis shows that using frequent itemset comparison for dataset comparison is not as straightforward as related work has argued, a result that is verified through an experimental study and that opens issues for further research in the KDD field.
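The dependency described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration only: the brute-force miner, the two toy datasets `D1` and `D2`, and the use of a plain Jaccard distance over itemsets are not taken from the paper, whose actual dissimilarity measures are not stated in the abstract.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force miner (illustrative only): itemset -> relative support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                result[itemset] = support
    return result


def closed_itemsets(frequent):
    """Frequent itemsets with no proper superset of equal support."""
    return {s: sup for s, sup in frequent.items()
            if not any(s < other and frequent[other] == sup for other in frequent)}


def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return {s: sup for s, sup in frequent.items()
            if not any(s < other for other in frequent)}


def jaccard_dissimilarity(a, b):
    """Hypothetical measure: 1 - |A ∩ B| / |A ∪ B| over the itemsets themselves."""
    union = set(a) | set(b)
    if not union:
        return 0.0
    return 1.0 - len(set(a) & set(b)) / len(union)


# Two toy datasets (assumed for illustration, not from the paper).
D1 = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
D2 = [frozenset("abc"), frozenset("ab"), frozenset("bc"), frozenset("b")]


def compare(min_support):
    """Dissimilarity of D1 and D2 at each compactness level."""
    f1 = frequent_itemsets(D1, min_support)
    f2 = frequent_itemsets(D2, min_support)
    return {
        "frequent": jaccard_dissimilarity(f1, f2),
        "closed": jaccard_dissimilarity(closed_itemsets(f1), closed_itemsets(f2)),
        "maximal": jaccard_dissimilarity(maximal_itemsets(f1), maximal_itemsets(f2)),
    }


for threshold in (0.25, 0.5):
    print(threshold, compare(threshold))
```

On these toy datasets the same pair of raw datasets receives different dissimilarity scores depending on both the threshold and the compactness level, which is the kind of sensitivity the abstract refers to. The specific numbers depend entirely on the assumed Jaccard measure and carry no weight beyond this example.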
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ntoutsi, I., Theodoridis, Y. (2008). Comparing Datasets Using Frequent Itemsets: Dependency on the Mining Parameters. In: Darzentas, J., Vouros, G.A., Vosinakis, S., Arnellos, A. (eds) Artificial Intelligence: Theories, Models and Applications. SETN 2008. Lecture Notes in Computer Science, vol 5138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87881-0_20
DOI: https://doi.org/10.1007/978-3-540-87881-0_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87880-3
Online ISBN: 978-3-540-87881-0
eBook Packages: Computer Science (R0)