Abstract
Feature subsets with high-order correlation have demonstrated their power for classification in a recent study, where non-redundant interacting feature subsets (NIFSs) are defined based on multi-information. In this paper, we re-examine the problem of finding NIFSs. We further improve the upper and lower bounds on the correlations, which can be used to significantly prune the search space. Experiments on real datasets demonstrate the efficiency and effectiveness of our approach.
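For readers unfamiliar with the measure, multi-information of a feature set is commonly defined as the sum of the marginal entropies minus the joint entropy. The following is a minimal sketch of that standard definition on empirical data, not the paper's algorithm: the function names, the row-tuple data layout, and the toy XOR dataset are assumptions made for illustration, and the NIFS criteria and the improved bounds are not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(outcomes):
    """Empirical Shannon entropy, in bits, of a list of hashable outcomes."""
    n = len(outcomes)
    return -sum((c / n) * log2(c / n) for c in Counter(outcomes).values())


def multi_information(data, features):
    """Sum of marginal entropies minus joint entropy over the chosen columns.

    `data` is a list of equal-length tuples (one per sample); `features`
    is a list of column indices. Both are illustrative conventions.
    """
    marginals = sum(entropy([row[f] for row in data]) for f in features)
    joint = entropy([tuple(row[f] for f in features) for row in data])
    return marginals - joint


# Toy data (hypothetical): the third column is the XOR of the first two.
toy = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

pair_score = multi_information(toy, [0, 1])
triple_score = multi_information(toy, [0, 1, 2])
```

In this toy case no pair of columns is correlated, yet the three columns together are, which is the kind of high-order interaction that pairwise measures miss.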
References
Agrawal, R., Imielinski, T., Swami, A.: Mining Association Rules between sets of items in large databases. In: Buneman, P. (ed.) Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC, pp. 207–216. ACM Press, New York (1993)
Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules in Large Databases. In: VLDB 1994, pp. 487–499 (1994)
Brin, S., Motwani, R., Silverstein, C.: Beyond market baskets: generalizing association rules to correlations. In: Proceedings of 1997 ACM-SIGMOD International Conference on Management of Data (SIGMOD 1997) (1997)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. MIT Press, Cambridge (2001)
Cover, T., Thomas, J.: Elements of Information Theory. Wiley Series in Telecommunications. Wiley Interscience, Hoboken (1991)
Han, T.S.: Nonnegative entropy measures of multivariate symmetric correlations. Inform. Contr. 36, 133–156 (1978)
Heikinheimo, H., Hinkkanen, E., Mannila, H., Mielikainen, T., Seppanen, J.: Finding low-Entropy sets and trees from binary data. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2007)
Ke, Y., Cheng, J., Ng, W.: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach. In: Eliassi-Rad, T. (ed.) Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 227–236. ACM, Philadelphia (2006)
Knobbe, A., Ho, E.: Maximally informative k-itemsets and their efficient discovery. In: KDD 2006, pp. 237–244 (2006)
Omiecinski, E.R.: Alternative Interest Measures for Mining Associations in Databases. IEEE Transactions on Knowledge and Data Engineering 15(1), 57–69 (2003)
Pan, F., Roberts, A., McMillan, L., de Villena, F., Threadgill, D., Wang, W.: Sample selection for maximal diversity. In: Proceedings of the 5th IEEE International Conference on Data Mining (2007)
Pan, F., Wang, W., Tung, A.K.H., Yang, J.: Finding representative set from massive data. In: ICDM 2005, pp. 338–345 (2005)
Xiong, H., Tan, P., Kumar, V.: Hyperclique Pattern Discovery. Data Mining and Knowledge Discovery Journal 13(2), 219–242 (2006)
Yeung, R.W.: A first course in information theory. Springer, Heidelberg (2002)
Zhang, X., Pan, F., Wang, W., Nobel, A.: Mining non-redundant high order correlation in binary data. In: Proceedings of the 34th International Conference on Very Large Data Bases, Auckland, New Zealand (2008)
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sha, C., Gong, J., Zhou, A. (2009). An Improved Algorithm for Mining Non-Redundant Interacting Feature Subsets. In: Li, Q., Feng, L., Pei, J., Wang, S.X., Zhou, X., Zhu, QM. (eds) Advances in Data and Web Management. APWeb/WAIM 2009. Lecture Notes in Computer Science, vol 5446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00672-2_32
DOI: https://doi.org/10.1007/978-3-642-00672-2_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00671-5
Online ISBN: 978-3-642-00672-2
eBook Packages: Computer Science (R0)