ABSTRACT
High dimensionality is one of the data quality problems that degrades the performance of machine learning models. Feature selection, which aims to identify and remove as many redundant and irrelevant features as possible, boosts the overall performance of models while reducing computational cost. However, choosing an appropriate feature selection method remains a major challenge, as no single selection criterion fits all datasets. It is therefore essential to comparatively analyze the performance of feature selection criteria across high-dimensional datasets with different characteristics, particularly large financial datasets whose features are highly correlated and redundant. In this paper, we explore nine feature selection criteria, typically categorized into two classes: (i) information-theoretical criteria and (ii) similarity-based criteria, over seven public financial datasets. To the best of our knowledge, no previous comprehensive empirical investigation has demonstrated the positive effects of feature selection criteria on financial data. Experimental results indicate that the information-theoretical methods suffer from high computation time on high-dimensional data (i.e., many features), while the similarity-based methods require significant computation on high-volume data (i.e., many samples).
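As an illustrative sketch (not taken from the paper), an information-theoretical filter criterion ranks each feature by its mutual information with the class label and keeps the top-scoring features. A minimal pure-Python version for discrete features might look like this; the helper names and the toy dataset are hypothetical:

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Estimate I(X;Y) for two discrete sequences from empirical frequencies."""
    n = len(x)
    px = Counter(x)
    py = Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (xv, yv), c in pxy.items():
        p_joint = c / n
        # p_joint / (p(x) * p(y)) = (c/n) / ((px/n) * (py/n)) = c*n / (px*py)
        mi += p_joint * math.log2(p_joint * n * n / (px[xv] * py[yv]))
    return mi

def rank_features(X, y):
    """Score each feature column by mutual information with the label and
    return feature indices sorted from most to least relevant."""
    n_features = len(X[0])
    scores = [mutual_information([row[j] for row in X], y)
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

# Toy binary dataset: feature 0 copies the label, feature 1 is uninformative.
X = [[0, 1], [0, 0], [1, 1], [1, 0], [0, 1], [1, 0]]
y = [0, 0, 1, 1, 0, 1]
print(rank_features(X, y))  # → [0, 1]: feature 0 ranks first
```

Filter criteria of this kind score features independently of any downstream classifier, which is what makes their cost profile (features vs. samples) the relevant comparison axis in the study above.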