Abstract
We present an efficient method for enriching a geolocalized dataset with contextual descriptions of Points of Interest (POIs). We implemented our solution using two large-scale datasets: YFCC [14] and Geonames [2]. A practical problem we encountered is the size of the data to be processed: the geolocalized YFCC dataset contains 45 million entries, which we propose to join with 12 million Geonames POIs. We show that using the Apache Spark cluster computing platform and the GeoSpark [18] spatial join library as-is leads to inefficient computation because the data is strongly biased, that is, spatially skewed. We propose a method that distributes the data non-uniformly according to this bias, which greatly improves spatial join performance. Moreover, we propose a method for selecting, among a set of nearby POIs, those most relevant to each YFCC entry. The resulting enriched dataset will be made publicly available and should help validate future work on large-scale POI recommendation.
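The idea of distributing skewed data non-uniformly before a spatial join can be illustrated with a small, single-machine sketch. This is not the authors' implementation and does not use the Spark or GeoSpark APIs: it assumes a quadtree-style recursive split (one possible choice, which the abstract does not specify), and the names `adaptive_cells`, `join_photos_with_pois`, `max_points` and `radius_km` are hypothetical. Dense regions are split into many small cells and sparse regions stay as a few large ones, so every cell carries a bounded amount of join work.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def adaptive_cells(points, box, max_points, max_depth=16):
    """Split `box` into quadrants until each cell holds at most `max_points`
    points (or `max_depth` is exhausted). Points are (id, lat, lon) tuples.
    Returns a list of (box, points) pairs; empty quadrants are dropped."""
    if len(points) <= max_points or max_depth == 0:
        return [(box, points)]
    min_lat, min_lon, max_lat, max_lon = box
    mid_lat, mid_lon = (min_lat + max_lat) / 2, (min_lon + max_lon) / 2
    quadrants = {}
    for p in points:
        quadrants.setdefault((p[1] >= mid_lat, p[2] >= mid_lon), []).append(p)
    cells = []
    for (north, east), members in quadrants.items():
        sub_box = (mid_lat if north else min_lat,
                   mid_lon if east else min_lon,
                   max_lat if north else mid_lat,
                   max_lon if east else mid_lon)
        cells.extend(adaptive_cells(members, sub_box, max_points, max_depth - 1))
    return cells


def join_photos_with_pois(photos, pois, radius_km, max_points=1000):
    """Pair each photo with every POI within `radius_km`, one cell at a time.
    In a cluster setting each cell would be an independent unit of work."""
    world = (-90.0, -180.0, 90.0, 180.0)
    # Assumption: about 111 km per degree of latitude, used only as a
    # coarse prefilter. Wrap-around at the antimeridian is ignored.
    lat_margin = radius_km / 111.0
    pairs = []
    for box, members in adaptive_cells(photos, world, max_points):
        min_lat, min_lon, max_lat, max_lon = box
        worst_lat = min(89.0, max(abs(min_lat), abs(max_lat)) + lat_margin)
        lon_margin = lat_margin / cos(radians(worst_lat))
        # Linear scan for brevity; a real system would also partition the POIs.
        candidates = [q for q in pois
                      if min_lat - lat_margin <= q[1] <= max_lat + lat_margin
                      and min_lon - lon_margin <= q[2] <= max_lon + lon_margin]
        for pid, plat, plon in members:
            for qid, qlat, qlon in candidates:
                if haversine_km(plat, plon, qlat, qlon) <= radius_km:
                    pairs.append((pid, qid))
    return pairs
```

With a uniform grid, a cell covering a city such as Paris would hold far more photos than a cell over open ocean, and a single worker would dominate the running time. Splitting by point count instead keeps the cells comparable in size, which is the intuition behind partitioning according to the data bias.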
References
Deng, N., Li, X.R.: Feeling a destination through the “right” photos: a machine learning model for DMOs’ photo selection. Tour. Manag. 65, 267–278 (2018)
Geonames: The geonames dataset. http://www.geonames.org/export. Accessed 26 Nov 2019
Griesner, J., Abdessalem, T., Naacke, H., Dosne, P.: Algeospf: a hierarchical factorization model for POI recommendation. In: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM, pp. 87–90 (2018)
Griesner, P.-B.: Scalable models for Points-Of-Interest recommender systems. Ph.D. thesis, Telecom ParisTech, Paris, tel-02085091, 7 2018. Artificial Intelligence [cs.AI] (2018)
Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, Valencia, Spain, April 2017, pp. 427–431. Association for Computational Linguistics (2017)
Lim, K.H., Chan, J., Karunasekera, S., Leckie, C.: Personalized itinerary recommendation with queuing time awareness. In: ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 325–334 (2017)
Lim, K.H., Chan, J., Leckie, C., Karunasekera, S.: Personalized tour recommendation based on user interests and points of interest visit durations. In: International Joint Conference on Artificial Intelligence, IJCAI, pp. 1778–1784 (2015)
Lim, K.H., Chan, J., Leckie, C., Karunasekera, S.: Personalized trip recommendation for tourists based on user interests, points of interest visit durations and visit recency. Knowl. Inf. Syst. 54(2), 375–406 (2017). https://doi.org/10.1007/s10115-017-1056-y
Liu Shudong, G.V.L.J.: User modeling for point-of-interest recommendations in location-based social networks: the state of the art. Mob. Inf. Syst. (2018)
Manolopoulos, Y., Theodoridis, Y., Tsotras, L., Vassilis, J.: Spatial indexing techniques. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems, pp. 2702–2707. Springer, Boston (2009). https://doi.org/10.1007/978-0-387-39940-9
Ni, K., et al.: Large-scale deep learning on the YFCC100M dataset. CoRR, abs/1502.03409 (2015)
Tang, L., Cai, D., Duan, Z., Ma, J., Han, M., Wang, H.: Discovering travel community for POI recommendation on location-based social networks. Complexity, 2019:8503962:1–8503962:8 (2019)
Taylor, K., Lim, K.H., Chan, J.: Travel itinerary recommendations with must-see points-of-interest. In: Companion Proceedings of the The Web Conference 2018, WWW 2018. International World Wide Web Conferences Steering Committee, pp. 1198–1205 (2018)
Thomee, B., et al.: YFCC100M: the new data in multimedia research. Commun. ACM 59(2), 64–73 (2016)
Wang, X., Leckie, C., Chan, J., Kwan Hui, L., Vaithianathan, T.: Improving personalized trip recommendation to avoid crowds using pedestrian sensor data. In: Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM 2016), pp. 25–34 (2016)
Xiaoyi Zhang, Z.D.: Spatial index. Geographic Information Science and Technology Body of Knowledge (2017)
Yonghong Yu, X.C.: A survey of point-of-interest recommendation in location-based social networks. In: Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence. AAAI (2015)
Yu, J., Wu, J., Sarwat, M.: GeoSpark: a cluster computing framework for processing large-scale spatial data. In: SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 70:1–70:4 (2015)
Zhao, S., Zhao, T., Yang, H., Lyu, M.R., King, I.: Stellar: spatial-temporal latent ranking for successive point-of-interest recommendation. In: AAAI 2016: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI Press (2016)
Copyright information
© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Gueye, I., Naacke, H., Gançarski, S. (2020). Enriching Geolocalized Dataset with POIs Descriptions at Large Scale. In: Thorn, J., Gueye, A., Hejnowicz, A. (eds) Innovations and Interdisciplinary Solutions for Underserved Areas. InterSol 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 321. Springer, Cham. https://doi.org/10.1007/978-3-030-51051-0_19
DOI: https://doi.org/10.1007/978-3-030-51051-0_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-51050-3
Online ISBN: 978-3-030-51051-0
eBook Packages: Computer Science, Computer Science (R0)