An Improved Algorithm for Mining Non-Redundant Interacting Feature Subsets

Sha, Chaofeng; Gong, Jian; Zhou, Aoying

doi:10.1007/978-3-642-00672-2_32


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5446)


Abstract

The application of feature subsets with high-order correlation to classification demonstrated its power in a recent study, in which non-redundant interacting feature subsets (NIFSs) are defined based on multi-information. In this paper, we re-examine the problem of finding NIFSs. We further improve the upper and lower bounds on the correlations, which can be used to significantly prune the search space. Experiments on real datasets demonstrate the efficiency and effectiveness of our approach.
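As a rough illustration of the quantity the abstract refers to, the sketch below computes multi-information under the assumption that it means the standard total correlation: the sum of the marginal entropies minus the joint entropy. The function names and the toy dataset are hypothetical, and the paper's own NIFS definition and its bounds are not reproduced here.

```python
from collections import Counter
from math import log2

def entropy(outcomes):
    """Empirical Shannon entropy, in bits, of a list of hashable outcomes."""
    n = len(outcomes)
    return -sum((c / n) * log2(c / n) for c in Counter(outcomes).values())

def multi_information(data, features):
    """Sum of marginal entropies minus joint entropy over the chosen columns.

    Assumes the standard total-correlation definition; `data` is a list of
    rows and `features` is a sequence of column indices (both hypothetical).
    """
    marginals = sum(entropy([row[f] for row in data]) for f in features)
    joint = entropy([tuple(row[f] for f in features) for row in data])
    return marginals - joint

# Toy binary data: column 2 is the XOR of columns 0 and 1.
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

pairwise = multi_information(data, (0, 1))      # columns 0 and 1 alone
three_way = multi_information(data, (0, 1, 2))  # all three columns together
```

In this toy data every pair of columns is independent, so the pairwise value is zero, while the three columns together share one bit. That is the kind of high-order interaction that pairwise measures miss.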



Author information

Authors and Affiliations

School of Computer Science, Fudan University, China
Chaofeng Sha & Jian Gong
Shanghai Key Laboratory of Trustworthy Computing, ECNU, China
Aoying Zhou


Editor information

Editors and Affiliations

Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, China
Qing Li
Department of Computer Science & Technology, Tsinghua University, Beijing, China
Ling Feng
School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby BC, Canada
Jian Pei
Department of Computer Science, University of Vermont, VT 05405, Burlington, USA
Sean X. Wang
School of Information Technology and Electrical Engineering, The University of Queensland, QLD 4072, Brisbane, Australia
Xiaofang Zhou
Jiangsu Provincial Key Lab of Computer Information Processing Technology School of Computer Science & Technology, Soochow University China, 1 shizi Street Suzhou, 215006, Jiangsu, China
Qiao-Ming Zhu


About this paper

Cite this paper

Sha, C., Gong, J., Zhou, A. (2009). An Improved Algorithm for Mining Non-Redundant Interacting Feature Subsets. In: Li, Q., Feng, L., Pei, J., Wang, S.X., Zhou, X., Zhu, QM. (eds) Advances in Data and Web Management. APWeb/WAIM 2009. Lecture Notes in Computer Science, vol 5446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00672-2_32


DOI: https://doi.org/10.1007/978-3-642-00672-2_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00671-5
Online ISBN: 978-3-642-00672-2
eBook Packages: Computer Science, Computer Science (R0)
