Abstract
This study presents a novel feature selection approach for classifying positive and negative risk predictions in the highly volatile cryptocurrency market. The approach maximizes information gain while simultaneously minimizing the similarity among the selected features, yielding a feature set that improves classification accuracy. The proposed method was compared with other feature selection techniques: sequential and bidirectional feature selection, univariate feature selection, and the least absolute shrinkage and selection operator (LASSO). To evaluate these techniques, several classifiers were employed: XGBoost, k-nearest neighbors, support vector machine, random forest, logistic regression, long short-term memory, and deep neural networks. The features were extracted from the time series of the Bitcoin, Binance, and Ethereum cryptocurrencies. Applying the selected features to the different classifiers showed that XGBoost and random forest performed best on the time series datasets. Furthermore, the proposed feature selection method achieved the best results on two of the three cryptocurrencies, with accuracy in the best case ranging from 55% to 68% across the different time series. Notably, preprocessed features were used in this research: the raw (candle) data were transformed into informative features that characterize the problem and help the classifiers predict the labels.
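The relevance-versus-redundancy trade-off described above can be sketched as a greedy procedure: score each candidate feature by its information gain with respect to the labels, penalized by its average correlation with the features already chosen. This is a minimal illustrative sketch of that general idea (close in spirit to the mRMR family), not the paper's exact algorithm; the function names, the histogram-based information-gain estimate, and the `alpha` trade-off weight are all assumptions introduced here for illustration.

```python
import numpy as np

def entropy(labels):
    # Shannon entropy (bits) of a discrete label array
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels, bins=10):
    # IG = H(y) - H(y | feature), with the continuous feature discretized
    # into equal-width bins (a simple estimator, assumed for this sketch)
    edges = np.histogram_bin_edges(feature, bins=bins)
    binned = np.digitize(feature, edges[1:-1])
    h_cond = 0.0
    for b in np.unique(binned):
        mask = binned == b
        h_cond += mask.mean() * entropy(labels[mask])
    return entropy(labels) - h_cond

def select_features(X, y, k, alpha=0.5):
    """Greedy mRMR-style selection: at each step pick the feature that
    maximizes IG(feature; y) - alpha * mean |corr| with selected features."""
    gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    selected = [int(np.argmax(gains))]  # start from the most informative feature
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = gains[j] - alpha * redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

On synthetic data where one feature is informative, a second is a near-copy of it, and a third is pure noise, a sufficiently large `alpha` makes the procedure skip the redundant near-copy in favor of the uncorrelated feature, which is the behavior the similarity penalty is meant to produce.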
Index Terms
- A Novel Feature Selection Method for Risk Management in High-Dimensional Time Series of Cryptocurrency Market