ABSTRACT
In this paper, we propose a novel probabilistic approach to summarizing frequent itemset patterns. Such techniques are useful for summarization, post-processing, and end-user interpretation, particularly for problems where the resulting set of patterns is huge. In our approach, items in the dataset are modeled as random variables. We then construct a Markov Random Field (MRF) on these variables based on frequent itemsets and their occurrence statistics. The summarization proceeds in a level-wise iterative fashion. Occurrence statistics of itemsets at the lowest level are used to construct an initial MRF. Statistics of itemsets at the next level can then be inferred from the model. We use those patterns whose occurrence cannot be accurately inferred from the model to augment the model in an iterative manner, repeating the procedure until all frequent itemsets can be modeled. The resulting MRF model affords a concise and useful representation of the original collection of itemsets. An extensive empirical study on real datasets shows that the new approach can effectively summarize a large number of itemsets and typically significantly outperforms extant approaches.
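The level-wise procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the MRF is fitted as a maximum-entropy distribution by iterative proportional fitting over an explicit joint table (feasible only for a handful of items), that "accurately inferred" means a relative error within a chosen tolerance, and that the minimum support is greater than zero. All function names, parameters, and the toy dataset are hypothetical.

```python
from itertools import combinations, product

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def fit_maxent(items, constraints, sweeps=200):
    """Fit a joint distribution over binary item vectors by iterative
    proportional fitting (assumed fitting method, not from the paper).
    constraints maps an itemset to the target probability that all of
    its items occur together."""
    states = list(product([0, 1], repeat=len(items)))
    index = {item: i for i, item in enumerate(items)}
    p = {s: 1.0 / len(states) for s in states}  # start from uniform
    for _ in range(sweeps):
        for itemset, target in constraints.items():
            cols = [index[i] for i in itemset]
            inside = {s for s in states if all(s[c] for c in cols)}
            mass = sum(p[s] for s in inside)
            if 0.0 < mass < 1.0:  # skip degenerate cases
                for s in states:
                    if s in inside:
                        p[s] *= target / mass
                    else:
                        p[s] *= (1.0 - target) / (1.0 - mass)
    return p

def estimate(itemset, items, p):
    """Model probability that all items in itemset occur together."""
    cols = [items.index(i) for i in itemset]
    return sum(prob for s, prob in p.items() if all(s[c] for c in cols))

def summarize(transactions, min_support, tolerance, sweeps=200):
    """Level-wise summarization: keep only the frequent itemsets whose
    support the current model fails to estimate within tolerance."""
    all_items = sorted({i for t in transactions for i in t})
    constraints = {}
    for i in all_items:  # lowest level: frequent single items
        s = support(frozenset([i]), transactions)
        if s >= min_support:
            constraints[frozenset([i])] = s
    items = sorted(next(iter(k)) for k in constraints)
    model = fit_maxent(items, constraints, sweeps)
    for k in range(2, len(items) + 1):
        # Brute-force enumeration of frequent k-itemsets (toy scale only).
        candidates = map(frozenset, combinations(items, k))
        frequent = [(c, support(c, transactions)) for c in candidates]
        frequent = [(c, s) for c, s in frequent if s >= min_support]
        if not frequent:
            break
        poorly = {c: s for c, s in frequent
                  if abs(estimate(c, items, model) - s) / s > tolerance}
        if poorly:  # augment the model and refit before the next level
            constraints.update(poorly)
            model = fit_maxent(items, constraints, sweeps)
    return constraints, model, items

# Toy data (hypothetical): a and b are correlated, c is independent of both.
transactions = (
    [frozenset("abc")] * 3 + [frozenset("ab")] * 3
    + [frozenset("ac")] + [frozenset("a")]
    + [frozenset("bc")] + [frozenset("b")]
    + [frozenset("c")] * 3 + [frozenset()] * 3
)
summary, model, items = summarize(transactions, min_support=0.15, tolerance=0.1)
```

On this toy dataset all seven non-empty itemsets are frequent, but the summary retains only the three single items and the pair {a, b}. The supports of the remaining itemsets are recovered from the fitted model within the tolerance. Refitting once per level, rather than after every added itemset, is a design choice made here for brevity.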