research-article
Open access

Adaptive Weighted Finite Mixture Model: Identifying the Feature-Influence of Real Estate

Published: 14 September 2020

Abstract

It is important for real estate investors to understand how construction environments and building characteristics affect the housing unit price. However, identifying the complex feature-influence of construction environments and building characteristics is challenging, and it is also hard to alleviate the heterogeneity of real estate. In this article, we propose a framework named Adaptive Weighted Finite Mixture Model to identify the feature-influence while simultaneously alleviating the ill effects of heterogeneity. With this framework, we can predict the housing unit price from the corresponding features. We also find that the feature-influence differs among otherwise similar cities. Specifically, we adaptively learn the weights of features to identify the feature-influence, and we incorporate the feature-influence into a finite mixture model that estimates the housing unit price. We use the Principal Component Analysis algorithm to obtain a low-dimensional representation of housing features. This representation reduces the computational cost of model learning and avoids inverting a singular matrix while learning the model parameters, which are estimated by the Expectation Maximization algorithm. To avoid a blind search for the number of latent groups used in the framework, we use a pre-clustering result to guide the search range for that number. We conduct extensive experiments on three real datasets from Shenyang, Changchun, and Harbin, three provincial capital cities with similar geography, economies, and cultures. The experimental results demonstrate the effectiveness of the proposed framework.
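
As a rough illustration of the mixture-plus-EM idea described in the abstract, the sketch below fits a finite mixture of simple linear regressions with the Expectation Maximization algorithm on synthetic data. It is not the authors' implementation. It uses a single feature (so no PCA step is needed), omits the adaptive feature weights, and hard-codes the candidate range for the number of latent groups instead of deriving it from pre-clustering. All function names, the toy data, and the BIC-based selection are assumptions made for this example.

```python
import math
import random


def weighted_line_fit(xs, ys, ws):
    """Weighted least squares for y = a*x + b; returns (a, b, noise variance)."""
    sw = sum(ws) + 1e-12  # guard against an empty component
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    a = sxy / sxx if sxx > 1e-12 else 0.0
    b = my - a * mx
    var = sum(w * (y - a * x - b) ** 2 for w, x, y in zip(ws, xs, ys)) / sw
    return a, b, max(var, 1e-6)  # variance floor keeps the density finite


def gaussian_pdf(residual, var):
    return math.exp(-0.5 * residual ** 2 / var) / math.sqrt(2 * math.pi * var)


def fit_mixture_of_regressions(xs, ys, k=2, iters=100, seed=0):
    """EM for a k-component mixture of lines; returns (weights, params, loglik)."""
    rng = random.Random(seed)
    n = len(xs)
    # Start from random soft assignments (responsibilities) that sum to 1.
    resp = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([v / total for v in row])
    mix, params, loglik = [], [], 0.0
    for _ in range(iters):
        # M-step: refit each component using its responsibilities as weights.
        params = [weighted_line_fit(xs, ys, [resp[i][j] for i in range(n)])
                  for j in range(k)]
        mix = [sum(resp[i][j] for i in range(n)) / n for j in range(k)]
        # E-step: recompute responsibilities and the log-likelihood.
        loglik = 0.0
        for i in range(n):
            dens = [mix[j] * gaussian_pdf(ys[i] - params[j][0] * xs[i] - params[j][1],
                                          params[j][2])
                    for j in range(k)]
            total = sum(dens)
            loglik += math.log(max(total, 1e-300))
            resp[i] = [d / total for d in dens] if total > 0 else [1.0 / k] * k
    return mix, params, loglik


def make_toy_data(n_per_group=100, seed=1):
    """Synthetic 'feature vs. unit price' data from two hypothetical latent groups."""
    rng = random.Random(seed)
    xs, ys = [], []
    for slope, intercept in [(2.0, 1.0), (-1.0, 10.0)]:
        for _ in range(n_per_group):
            x = rng.uniform(0.0, 10.0)
            xs.append(x)
            ys.append(slope * x + intercept + rng.gauss(0.0, 0.5))
    return xs, ys


# Search a small, hard-coded range of group numbers and compare them by BIC.
xs, ys = make_toy_data()
for k in range(1, 4):
    mix, params, loglik = fit_mixture_of_regressions(xs, ys, k=k)
    n_free = 3 * k + (k - 1)  # slope, intercept, variance per group + weights
    bic = -2.0 * loglik + n_free * math.log(len(xs))
    print(k, round(bic, 1))
```

A lower BIC suggests a better trade-off between fit and model size. EM can converge to a local optimum, so in practice one would usually try several seeds per candidate group number.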

    Published In

    ACM/IMS Transactions on Data Science  Volume 1, Issue 3
    Special Issue on Urban Computing and Smart Cities
    August 2020
    217 pages
    ISSN:2691-1922
    DOI:10.1145/3424342
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 14 September 2020
    Online AM: 07 May 2020
    Accepted: 01 December 2019
    Revised: 01 October 2019
    Received: 01 June 2019
    Published in TDS Volume 1, Issue 3

    Author Tags

    1. Real estate
    2. expectation maximization algorithm
    3. finite mixture model
    4. identify feature influence

    Qualifiers

    • Research-article
    • Research
    • Refereed
