research-article

Open access

Adaptive Weighted Finite Mixture Model: Identifying the Feature-Influence of Real Estate

Authors:

Minghao Yin

ACM/IMS Transactions on Data Science, Volume 1, Issue 3

Article No.: 20, Pages 1 - 16

https://doi.org/10.1145/3379560

Published: 14 September 2020

Abstract

It is important for real estate investors to understand how construction environments and building characteristics affect the housing unit price. However, identifying the complex feature-influence of construction environments and building characteristics is challenging, and so is alleviating the heterogeneity of real estate. In this article, we propose a framework named Adaptive Weighted Finite Mixture Model that identifies the feature-influence and simultaneously alleviates the ill effects of heterogeneity. With this framework, we can predict the housing unit price from the corresponding features. In addition, we discover that the feature-influence differs even among similar cities. Specifically, we adaptively learn the feature weights to identify the feature-influence, and we incorporate the feature-influence into a finite mixture model that estimates the housing unit price. We use Principal Component Analysis to obtain a low-dimensional representation of the housing features. This representation reduces the computational cost of model learning and avoids the potentially catastrophic inversion of a singular matrix while the model parameters are estimated with the Expectation Maximization algorithm. To avoid a blind search for the number of latent groups used in the framework, we use a pre-clustering result to guide the search range for that number. We conduct extensive experiments on three real datasets from Shenyang, Changchun, and Harbin, three provincial capital cities with similar geography, economies, and cultures. The experimental results demonstrate the effectiveness of the proposed framework.
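To make the estimation step in the abstract concrete, the following is a minimal sketch of Expectation Maximization for a finite mixture of linear regressions. It is not the authors' implementation: the adaptive feature weighting and the pre-clustering guide are not reproduced, a single input feature stands in for one low-dimensional (PCA-style) score, the initialization is random, and all function names (`weighted_line_fit`, `normal_pdf`, `fit_mixture_of_regressions`) are hypothetical.

```python
import math
import random


def weighted_line_fit(x, y, w):
    """Weighted least squares for y ~ a + b*x; returns (a, b, residual variance)."""
    sw = sum(w) or 1e-12  # guard against an empty component
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx if sxx > 1e-12 else 0.0
    a = my - b * mx
    var = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sw
    return a, b, max(var, 1e-6)  # variance floor keeps the density finite


def normal_pdf(residual, var):
    """Density of a zero-mean Gaussian with variance `var` at `residual`."""
    return math.exp(-0.5 * residual * residual / var) / math.sqrt(2.0 * math.pi * var)


def fit_mixture_of_regressions(x, y, k, iters=100, seed=0):
    """EM for a k-component mixture of simple linear regressions.

    Returns (mixing weights, per-component (a, b, var), responsibilities).
    """
    rng = random.Random(seed)
    n = len(x)
    # Assumption: random soft assignments as the starting point.
    resp = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])

    pis, comps = [], []
    for _ in range(iters):
        # M-step: refit each component with its responsibilities as weights.
        pis, comps = [], []
        for j in range(k):
            w = [resp[i][j] for i in range(n)]
            pis.append(sum(w) / n)
            comps.append(weighted_line_fit(x, y, w))
        # E-step: posterior probability that each sample belongs to each component.
        for i in range(n):
            dens = [
                pis[j] * normal_pdf(y[i] - comps[j][0] - comps[j][1] * x[i], comps[j][2])
                for j in range(k)
            ]
            total = sum(dens)
            if total > 0.0:
                resp[i] = [d / total for d in dens]
            else:
                resp[i] = [1.0 / k] * k  # numerical underflow: fall back to uniform
    return pis, comps, resp
```

Calling `fit_mixture_of_regressions(x, y, k=2)` on a list of feature values `x` and unit prices `y` returns the mixing weights, the per-group intercept, slope, and variance, and the soft group memberships. Because EM only finds a local optimum, the result depends on the starting point, which is one reason a pre-clustering step such as the one the abstract mentions can be useful for choosing `k`.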

Cited By

Jiang L, Li Y, Luo N, Wang J, Ning Q. (2022). A Multi-Source Information Learning Framework for Airbnb Price Prediction. 2022 IEEE International Conference on Data Mining Workshops (ICDMW), 1-7. Online publication date: Nov-2022.
https://doi.org/10.1109/ICDMW58026.2022.00009

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Published In

ACM/IMS Transactions on Data Science Volume 1, Issue 3

Special Issue on Urban Computing and Smart Cities

August 2020

217 pages

ISSN:2691-1922

DOI:10.1145/3424342

Editor:
Beng Chin Ooi
National University of Singapore, Singapore

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 September 2020

Online AM: 07 May 2020

Accepted: 01 December 2019

Revised: 01 October 2019

Received: 01 June 2019

Published in TDS Volume 1, Issue 3

Qualifiers

Research-article
Research
Refereed

Bibliometrics & Citations

Bibliometrics

Total citations: 1
Total downloads: 370
Downloads (last 12 months): 109
Downloads (last 6 weeks): 19

Reflects downloads up to 18 Feb 2025