Research Article
DOI: 10.1145/3299815.3314428

Heuristically Reducing the Cost of Correlation-based Feature Selection

Published: 18 April 2019

Abstract

Feature selection plays a critical role in processing today's massive datasets. A popular wrapper-based feature selection technique, Correlation-based Feature Selection (CFS), heuristically searches the exponentially large space of possible feature subsets for a subset with high predictive power. In this work, we examine the ability of multi-task learning and heuristic subset decomposition to reduce the cost of using CFS. To do this, we describe and evaluate two algorithms, Multi-Task Correlation-based Feature Selection (MT-CFS) and Heuristic Decomposition Correlation-based Feature Selection (HD-CFS). Both algorithms apply CFS to feature subspaces that have first been reduced, either by a technique inspired by multi-task learning or by a novel technique that heuristically subdivides the feature space, yielding faster execution with minimal loss in accuracy. We conclude that HD-CFS shows promise as a standalone feature selection technique.
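To make the abstract's description concrete, the sketch below implements Hall's CFS merit heuristic, a greedy forward search over feature subsets, and one plausible "split, select, merge" decomposition in the spirit of HD-CFS. It is a minimal illustration under stated assumptions rather than the paper's implementation: Pearson correlation stands in for the symmetric-uncertainty measure used in Hall's CFS, and the function names (cfs_merit, cfs_greedy, hd_cfs) and the block_size parameter are hypothetical.

import numpy as np


def _abs_corr(a, b):
    """Absolute Pearson correlation; constant columns are treated as uncorrelated."""
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return abs(np.corrcoef(a, b)[0, 1])


def cfs_merit(X, y, subset):
    """Hall's CFS merit for a subset S of k features:
    merit(S) = k * r_cf / sqrt(k + k*(k-1) * r_ff),
    where r_cf is the mean feature-class correlation and r_ff is the mean
    feature-feature correlation (Pearson is used here as a stand-in)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = np.mean([_abs_corr(X[:, f], y) for f in subset])
    if k == 1:
        return r_cf
    pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
    r_ff = np.mean([_abs_corr(X[:, f], X[:, g]) for f, g in pairs])
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)


def cfs_greedy(X, y, candidates=None):
    """Greedy forward search: repeatedly add the candidate feature that most
    improves the CFS merit, stopping when no addition helps."""
    candidates = list(range(X.shape[1])) if candidates is None else list(candidates)
    selected, best_merit = [], 0.0
    improved = True
    while improved and candidates:
        improved = False
        merit, best_f = max((cfs_merit(X, y, selected + [f]), f) for f in candidates)
        if merit > best_merit:
            selected.append(best_f)
            candidates.remove(best_f)
            best_merit, improved = merit, True
    return selected


def hd_cfs(X, y, block_size=50):
    """Illustrative heuristic decomposition: run CFS independently on blocks of
    features, then run CFS once more on the union of the per-block winners."""
    survivors = []
    for start in range(0, X.shape[1], block_size):
        block = list(range(start, min(start + block_size, X.shape[1])))
        survivors.extend(cfs_greedy(X, y, candidates=block))
    return cfs_greedy(X, y, candidates=survivors)


# Example usage on synthetic data: the class depends on features 3 and 7,
# so a reasonable selection should recover (a subset of) those indices.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = (X[:, 3] + X[:, 7] > 0).astype(float)
print(hd_cfs(X, y, block_size=10))

The decomposition trades some global interaction information for speed: each block's greedy search evaluates far fewer subsets than a search over the full feature space, but features that only look useful in combination with features from other blocks may be discarded in the first pass.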


Cited By

  • A Sketch-based Index for Correlated Dataset Search. 2022 IEEE 38th International Conference on Data Engineering (ICDE), pp. 2928-2941, May 2022. DOI: 10.1109/ICDE53745.2022.00264
  • A Redundancy Metric based on the Framework of Possibility Theory for Technical Systems. 2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), pp. 1571-1578, September 2020. DOI: 10.1109/ETFA46521.2020.9212080



Published In

ACMSE '19: Proceedings of the 2019 ACM Southeast Conference
April 2019
295 pages
ISBN:9781450362511
DOI:10.1145/3299815
  • Conference Chair: Dan Lo
  • Program Chair: Donghyun Kim
  • Publications Chair: Eric Gamess
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Feature Selection
  2. Lifelong Machine Learning
  3. Multi-task Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ACM SE '19: 2019 ACM Southeast Conference
April 18 - 20, 2019
Kennesaw, GA, USA

Acceptance Rates

Overall Acceptance Rate 502 of 1,023 submissions, 49%


Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0

Reflects downloads up to 15 Feb 2025.

