ABSTRACT
High-dimensional data is a major challenge in market price prediction, and traditional data-driven feature selection approaches have not addressed it effectively. This paper introduces a novel Domain-Knowledge based Feature Selection Framework (DKFS) designed specifically for product-oriented applications. The framework takes a two-step approach: a traditional statistical technique (a filter method) is combined with domain-specific knowledge, which adds a further layer of selection for rigorously and efficiently excluding irrelevant or redundant features. The framework was applied to a real-world sailboat price prediction task, in which three modelling techniques (Multiple Linear Regression, Random Forest, and Gradient Boosting) were evaluated on a comprehensive dataset of over 2,500 sailboat transactions. The approach performed strongly, capturing 90.8% of the variability with a small set of 26 features, including economic indicators and geographical factors. The proposed framework proves effective at dimensionality reduction and is broadly applicable across domains. It also points to a promising direction for further research into expert systems and adaptive feature selection design.
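As a rough illustration of the two-step idea described in the abstract, the sketch below first applies a simple statistical filter and then a domain-knowledge pass. It is not the authors' DKFS implementation: the choice of Pearson correlation as the filter statistic, the 0.5 threshold, the function names (`filter_step`, `domain_step`), the expert include/exclude sets, and the toy feature names and values are all assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def filter_step(features, target, threshold=0.5):
    """Step 1 (assumed filter): keep features whose |correlation| with the target meets the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]


def domain_step(candidates, all_features, expert_exclude, expert_include):
    """Step 2 (assumed rules): drop features an expert flags as redundant or
    irrelevant, then re-add features an expert considers essential."""
    kept = [name for name in candidates if name not in expert_exclude]
    for name in all_features:
        if name in expert_include and name not in kept:
            kept.append(name)
    return kept


# Hypothetical toy data: five listings, four candidate features.
features = {
    "length_ft": [30, 35, 40, 45, 50],
    "length_m": [9.1, 10.7, 12.2, 13.7, 15.2],   # same information as length_ft
    "listing_id": [3, 1, 4, 1, 5],               # identifier, no pricing meaning
    "regional_gdp": [2.0, 2.1, 1.9, 2.2, 2.0],   # weak signal in this tiny sample
}
price = [100, 130, 170, 200, 250]

after_filter = filter_step(features, price)
selected = domain_step(
    after_filter,
    features,
    expert_exclude={"length_m"},       # expert: duplicate of length_ft
    expert_include={"regional_gdp"},   # expert: economic indicator worth keeping
)
print(after_filter)  # ['length_ft', 'length_m']
print(selected)      # ['length_ft', 'regional_gdp']
```

In this toy case the filter alone keeps both length columns and discards the economic indicator, while the domain pass removes the duplicate measurement and restores the indicator, which is the kind of correction a purely data-driven filter cannot make on its own.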
Index Terms
- Novel Domain-Knowledge Based Feature Selection Framework for Price Prediction: Comprehensive Modelling in Sailboat Market