DOI: 10.1145/3449726.3463141
research-article

House price prediction using clustering and genetic programming along with conducting a comparative study

Published: 08 July 2021

Abstract

Prediction is one of the most important tasks in machine learning. Data scientists use different regression methods to find the most appropriate and accurate model for each type of dataset. This study proposes a method to improve accuracy in regression and prediction. Common approaches apply different models to the whole dataset and keep the one with the highest accuracy. In our proposed approach, we first cluster the data using different methods such as K-means, DBSCAN, and agglomerative hierarchical clustering. Then, for each clustering method and each generated cluster, we apply various regression models, including linear and polynomial regression, SVR, neural networks, and symbolic regression, in order to find the most accurate model and to study the potential of genetic programming for improving prediction accuracy. The resulting model is a combination of clustering and regression. After clustering, each cluster contains fewer samples than the whole dataset, and this reduction in samples per group costs some accuracy. On the other hand, specializing the data by placing similar samples in one group enhances accuracy and decreases the computational cost. As a case study, we used real estate data with 20 features to improve house price estimation; however, this approach is applicable to other large datasets.
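The cluster-then-regress idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single hypothetical feature (`areas`), a tiny synthetic dataset with two price regimes, a hand-rolled one-dimensional K-means, and ordinary least squares as the only regression model. All function and variable names are illustrative. It only compares the training error of one global model against one model per cluster.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def assign(xs, cents):
    """Index of the nearest centroid for every sample."""
    return [min(range(len(cents)), key=lambda j: abs(x - cents[j])) for x in xs]

def kmeans_1d(xs, k, iters=50):
    """Lloyd's algorithm on one feature with a deterministic quantile init."""
    srt = sorted(xs)
    cents = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        labels = assign(xs, cents)
        new = []
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            new.append(mean(members) if members else cents[j])
        if new == cents:
            break
        cents = new
    return cents, assign(xs, cents)

def mse(preds, ys):
    """Mean squared error between predictions and targets."""
    return mean([(p - y) ** 2 for p, y in zip(preds, ys)])

# Synthetic data (assumption): small and large houses follow different price lines.
areas = [float(x) for x in range(50, 101, 10)] + [float(x) for x in range(200, 251, 10)]
prices = [2.0 * x + 100.0 if x <= 100 else 5.0 * x - 300.0 for x in areas]

# Baseline: one regression model fitted on the whole dataset.
ga, gb = fit_line(areas, prices)
global_mse = mse([ga * x + gb for x in areas], prices)

# Proposed scheme (sketch): cluster first, then fit one model per cluster.
centroids, labels = kmeans_1d(areas, k=2)
models = {}
for j in range(len(centroids)):
    xs_j = [x for x, lab in zip(areas, labels) if lab == j]
    ys_j = [y for y, lab in zip(prices, labels) if lab == j]
    models[j] = fit_line(xs_j, ys_j)

def predict(x):
    """Route a sample to its nearest centroid and use that cluster's model."""
    j = assign([x], centroids)[0]
    a, b = models[j]
    return a * x + b

cluster_mse = mse([predict(x) for x in areas], prices)
print("global MSE:", global_mse)
print("per-cluster MSE:", cluster_mse)
```

On data with distinct regimes like this, the per-cluster models fit better than the single global line. On real data the trade-off noted in the abstract applies: each cluster has fewer samples, so the comparison should be made on held-out data rather than on training error as done here.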

Cited By

  • (2024) An Optimal House Price Prediction Algorithm: XGBoost. Analytics 3:1 (30-45). DOI: 10.3390/analytics3010003. Online publication date: 2-Jan-2024
  • (2023) House Price Prediction in US Metropolitan Areas. SSRN Electronic Journal. DOI: 10.2139/ssrn.4494273. Online publication date: 2023
  • (2022) House Price Prediction Using Linier Regression. 2022 IEEE 8th International Conference on Computing, Engineering and Design (ICCED) (1-5). DOI: 10.1109/ICCED56140.2022.10010684. Online publication date: 28-Jul-2022

Published In

GECCO '21: Proceedings of the Genetic and Evolutionary Computation Conference Companion
July 2021
2047 pages
ISBN:9781450383516
DOI:10.1145/3449726
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clustering
  2. genetic programming
  3. house price prediction
  4. machine learning
  5. multi-level-model
  6. regression
  7. symbolic regression

Qualifiers

  • Research-article

Conference

GECCO '21
Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

Article Metrics

  • Downloads (Last 12 months): 54
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 01 Mar 2025