Abstract
Exploring massive mobile data for location-based services has become one of the key challenges in mobile data mining. In this paper, we investigate the problem of finding a correlation between the collective behavior of mobile users and the distribution of points of interest (POIs) in a city. Specifically, we use large-scale data dumps collected from cell towers and POIs extracted from a popular social network service, Weibo. Our objective is to use the data from these two types of sources to build a model for predicting the POI densities of different regions in the covered area. One application domain that may benefit from our research is business recommendation, where a prediction result can serve as a recommendation for opening a new store or branch. The crux of our contribution is a method of representing the collective behavior of mobile users as a histogram of connection counts over a period of time in each region. This representation enables us to apply supervised learning algorithms to the problem and train a POI prediction model using the POI data set as the ground truth. We studied 12 state-of-the-art classification and regression algorithms; experimental results demonstrate the feasibility and effectiveness of the proposed method.
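The representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The record layout (a region identifier paired with an hour of the day), the choice of 24 hourly bins, the function names, and the toy values are all assumptions made for illustration. The sketch counts connections per region and time bin, then pairs each region's histogram with its POI density so that any supervised learner could be trained on the result.

```python
from collections import defaultdict

HOURS_PER_DAY = 24  # assumed bin width: one bin per hour of the day


def connection_histograms(records, bins=HOURS_PER_DAY):
    """Count connections per region and time bin.

    records: iterable of (region_id, hour_of_day) pairs (hypothetical layout).
    Returns {region_id: [count for bin 0, ..., count for bin bins-1]}.
    """
    histograms = defaultdict(lambda: [0] * bins)
    for region_id, hour in records:
        histograms[region_id][hour % bins] += 1
    return dict(histograms)


def training_set(histograms, poi_density):
    """Pair each region's histogram (features) with its POI density (label).

    Only regions present in both inputs are kept, in sorted order.
    """
    regions = sorted(r for r in histograms if r in poi_density)
    features = [histograms[r] for r in regions]
    labels = [poi_density[r] for r in regions]
    return regions, features, labels


# Toy data, invented for illustration only.
records = [("A", 8), ("A", 8), ("A", 18), ("B", 23), ("B", 2)]
poi_density = {"A": 12.5, "B": 1.0}

hists = connection_histograms(records)
regions, X, y = training_set(hists, poi_density)
```

Here `X` holds one fixed-length feature vector per region and `y` the matching ground-truth densities, which is the input shape that standard classification and regression algorithms expect.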










Acknowledgments
R. Wang and C.-Y. Chow were partially supported by a research grant (CityU Project No. 9231131). S. Nutanong was partially supported by a CityU research grant (CityU Project No. 7200387). This work was also supported by the National Natural Science Foundation of China under Grant 61402460.
Cite this article
Wang, R., Chow, CY., Lyu, Y. et al. Exploring cell tower data dumps for supervised learning-based point-of-interest prediction (industrial paper). Geoinformatica 20, 327–349 (2016). https://doi.org/10.1007/s10707-015-0237-7
DOI: https://doi.org/10.1007/s10707-015-0237-7