Abstract
Contrast set mining has been well studied as a way to detect changes between contrasted databases. Previous studies compared the supports of an itemset and extracted itemsets whose supports differ significantly across those databases. In contrast, we compare the correlations of an itemset between two contrasted databases and try to detect potential changes. To focus on implicitly emerging correlations, we exclude any highly correlated itemset. We therefore set correlation constraints (upper bounds) in both databases and extract itemsets whose items are not highly correlated in either database but whose correlation changes significantly from one database to the other. We consider both positive and negative correlation. We also consider itemsets that are correlated under conditioning by third variables, so that so-called partial correlation is covered as well. To cover these notions of correlation, we use extended mutual information. In our search procedure for the correlated itemsets, we use a double-clique condition, which is a necessary condition for an itemset to be a solution satisfying the correlation constraints. We show its usefulness through experiments.
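The core idea of bounding correlation in both databases and then looking for a significant change can be illustrated with a small sketch. This is not the method of the paper: it substitutes ordinary pairwise mutual information for the extended mutual information mentioned above, handles only pairs of items, and measures change as an absolute difference. The function names and the parameters `upper` and `min_change` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import log2


def mutual_information(db, a, b):
    """Mutual information (in bits) between the presence of items a and b.

    db is a non-empty list of transactions, each a set of items.
    Plain pairwise mutual information is an assumption made here; the
    abstract refers to an extended variant that is not defined in it.
    """
    n = len(db)
    # Joint counts over the four presence/absence combinations.
    counts = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for t in db:
        counts[(int(a in t), int(b in t))] += 1
    mi = 0.0
    for (x, y), c in counts.items():
        if c == 0:
            continue  # a zero-probability cell contributes nothing
        p_xy = c / n
        p_x = (counts[(x, 0)] + counts[(x, 1)]) / n
        p_y = (counts[(0, y)] + counts[(1, y)]) / n
        mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


def contrast_pairs(db1, db2, items, upper, min_change):
    """Item pairs that stay below `upper` in both databases but whose
    mutual information differs by at least `min_change` between them.

    Both thresholds are hypothetical parameters of this sketch.
    """
    result = []
    for a, b in combinations(sorted(items), 2):
        m1 = mutual_information(db1, a, b)
        m2 = mutual_information(db2, a, b)
        if m1 <= upper and m2 <= upper and abs(m2 - m1) >= min_change:
            result.append((a, b, m1, m2))
    return result


# Toy data: a and b are independent in db1 and weakly associated in db2.
db1 = [{"a", "b"}, {"a"}, {"b"}, set()]
db2 = [{"a", "b"}] * 3 + [{"a"}, {"b"}] + [set()] * 3
pairs = contrast_pairs(db1, db2, {"a", "b"}, upper=0.5, min_change=0.1)
```

With these toy databases the pair is reported only while `upper` is at or above its mutual information in `db2`. Lowering `upper` far enough excludes it, which mirrors the role of the upper-bound constraint described in the abstract.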
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, A., Haraguchi, M., Okubo, Y. (2011). Contrasting Correlations by an Efficient Double-Clique Condition. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2011. Lecture Notes in Computer Science, vol 6871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23199-5_35
DOI: https://doi.org/10.1007/978-3-642-23199-5_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23198-8
Online ISBN: 978-3-642-23199-5
eBook Packages: Computer Science (R0)