Abstract
Heterogeneous data are inherently complex, whatever the type of heterogeneity. Regression analysis methods are well defined and achieve high accuracy on numeric data; however, real-world problems also contain non-numerical variables. There are two main approaches to handling mixed-type data sets in regression analysis. The first unifies the data types of all variables (for example, by converting them to continuous numerical data) and then applies the regression analysis. However, this approach degrades data quality, because some variables are converted from their original types to other types in the learning stage. The second approach applies similarity measurements, which can be highly complex in some situations. To overcome these limitations, we propose a tree-based regression model that handles mixed-type data sets effectively without using dummy coding or a similarity measurement.
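To make the first approach concrete, the following is a minimal sketch of dummy coding in plain Python. It illustrates the type-unification step that the abstract argues against, not the proposed tree-based model. The toy records, the column names and the `dummy_code` helper are hypothetical and are not taken from the paper.

```python
# Hypothetical toy records: one numeric and one categorical predictor.
rows = [
    {"area": 50.0, "city": "Leeds"},
    {"area": 72.5, "city": "York"},
    {"area": 64.0, "city": "Leeds"},
    {"area": 80.0, "city": "Bath"},
]


def dummy_code(rows, column):
    """Replace a categorical column with 0/1 indicator columns.

    The first level (in sorted order) is dropped and acts as the
    reference category, as is usual for dummy coding in regression.
    """
    levels = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        # Copy every other column unchanged.
        out = {k: v for k, v in r.items() if k != column}
        # Add one indicator per non-reference level.
        for level in levels[1:]:
            out[f"{column}={level}"] = 1.0 if r[column] == level else 0.0
        encoded.append(out)
    return encoded


encoded = dummy_code(rows, "city")
for r in encoded:
    print(r)
```

After encoding, the single categorical variable `city` has become two numeric indicator columns, and the original type information is no longer present in the data passed to the regression. The number of added columns grows with the number of category levels.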
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Alghanmi, N., Zeng, XJ. (2020). A Hybrid Regression Model for Mixed Numerical and Categorical Data. In: Ju, Z., Yang, L., Yang, C., Gegov, A., Zhou, D. (eds) Advances in Computational Intelligence Systems. UKCI 2019. Advances in Intelligent Systems and Computing, vol 1043. Springer, Cham. https://doi.org/10.1007/978-3-030-29933-0_31
DOI: https://doi.org/10.1007/978-3-030-29933-0_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29932-3
Online ISBN: 978-3-030-29933-0
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)