Enriching Geolocalized Dataset with POIs Descriptions at Large Scale

  • Conference paper

Abstract

We present an efficient method to enrich a geolocalized dataset with contextual descriptions of Points of Interest (POIs). We implemented our solution using two large-scale datasets: YFCC [14] and Geonames [2]. A practical problem we encountered is the size of the data to be processed: the YFCC geolocalized dataset contains 45 million entries, which we propose to join with 12 million Geonames POIs. We show that using the Apache Spark cluster computing platform and the GeoSpark [18] spatial join library as-is leads to inefficient computation because of the strong bias in the spatial distribution of the data. We propose a method that distributes the data non-uniformly according to this bias, which greatly improves spatial join performance. Moreover, we propose a method to select, among a set of nearby POIs, those that are most relevant to the YFCC entries. The resulting enriched dataset will be made publicly available and should help to better validate future work on large-scale POI recommendation.
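
The abstract does not detail the partitioning scheme, so the following is only an illustrative sketch of the general idea of distributing skewed spatial data non-uniformly. It assumes a quadtree-style recursive split (dense regions receive many small cells), planar Euclidean distance on raw coordinates, and nearest-first ordering as a stand-in for the relevance selection. The function names `quadtree_cells` and `partitioned_join`, the parameters, and the sample data are all hypothetical; this is not the authors' implementation and not the GeoSpark API.

```python
from collections import defaultdict
from math import hypot


def quadtree_cells(points, bounds, max_points, max_depth=16):
    """Split `bounds` recursively until each cell holds at most `max_points` points.

    points: list of (x, y); bounds: (min_x, min_y, max_x, max_y), half-open.
    Dense areas end up covered by many small cells, sparse areas by few large ones.
    """
    min_x, min_y, max_x, max_y = bounds
    inside = [(x, y) for x, y in points if min_x <= x < max_x and min_y <= y < max_y]
    if len(inside) <= max_points or max_depth == 0:
        return [bounds]
    mid_x, mid_y = (min_x + max_x) / 2, (min_y + max_y) / 2
    quadrants = [
        (min_x, min_y, mid_x, mid_y), (mid_x, min_y, max_x, mid_y),
        (min_x, mid_y, mid_x, max_y), (mid_x, mid_y, max_x, max_y),
    ]
    cells = []
    for quadrant in quadrants:
        cells.extend(quadtree_cells(inside, quadrant, max_points, max_depth - 1))
    return cells


def partitioned_join(photos, pois, cells, radius):
    """For each photo, list the POIs within `radius`, nearest first.

    photos: list of (photo_id, x, y); pois: list of (name, x, y).
    Each photo belongs to exactly one cell. A POI is copied to every cell
    whose bounds, enlarged by `radius`, contain it, so no match is lost
    at cell borders.
    """
    pois_by_cell = defaultdict(list)
    for name, x, y in pois:
        for i, (min_x, min_y, max_x, max_y) in enumerate(cells):
            if min_x - radius <= x < max_x + radius and min_y - radius <= y < max_y + radius:
                pois_by_cell[i].append((name, x, y))

    matches = {}
    for photo_id, x, y in photos:
        for i, (min_x, min_y, max_x, max_y) in enumerate(cells):
            if min_x <= x < max_x and min_y <= y < max_y:
                # Local join: only the POIs assigned to this cell are compared.
                found = [(hypot(px - x, py - y), name) for name, px, py in pois_by_cell[i]]
                matches[photo_id] = sorted(m for m in found if m[0] <= radius)
                break
    return matches


# Hypothetical toy data: three photos clustered together, one isolated.
photos = [("a", 0.10, 0.10), ("b", 0.12, 0.11), ("c", 0.11, 0.12), ("d", 0.90, 0.90)]
pois = [("Cafe", 0.11, 0.11), ("Museum", 0.13, 0.10), ("Park", 0.88, 0.90)]

cells = quadtree_cells([(x, y) for _, x, y in photos], (0.0, 0.0, 1.0, 1.0), max_points=2)
result = partitioned_join(photos, pois, cells, radius=0.05)
```

In a distributed setting each cell would become one partition, so that every worker handles a comparable number of entries regardless of how unevenly they are spread on the map. Real geographic data would also call for a geodesic distance rather than the planar one assumed here.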

References

  1. Deng, N., Li, X.R.: Feeling a destination through the “right” photos: a machine learning model for DMOs’ photo selection. Tour. Manag. 65, 267–278 (2018)

  2. Geonames: The geonames dataset. http://www.geonames.org/export. Accessed 26 Nov 2019

  3. Griesner, J., Abdessalem, T., Naacke, H., Dosne, P.: Algeospf: a hierarchical factorization model for POI recommendation. In: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM, pp. 87–90 (2018)

  4. Griesner, P.-B.: Scalable models for Points-Of-Interest recommender systems. Ph.D. thesis, Telecom ParisTech, Paris, tel-02085091, 7 2018. Artificial Intelligence [cs.AI] (2018)

  5. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, Valencia, Spain, April 2017, pp. 427–431. Association for Computational Linguistics (2017)

  6. Lim, K.H., Chan, J., Karunasekera, S., Leckie, C.: Personalized itinerary recommendation with queuing time awareness. In: ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 325–334 (2017)

  7. Lim, K.H., Chan, J., Leckie, C., Karunasekera, S.: Personalized tour recommendation based on user interests and points of interest visit durations. In: International Joint Conference on Artificial Intelligence, IJCAI, pp. 1778–1784 (2015)

  8. Lim, K.H., Chan, J., Leckie, C., Karunasekera, S.: Personalized trip recommendation for tourists based on user interests, points of interest visit durations and visit recency. Knowl. Inf. Syst. 54(2), 375–406 (2017). https://doi.org/10.1007/s10115-017-1056-y

  9. Liu Shudong, G.V.L.J.: User modeling for point-of-interest recommendations in location-based social networks: the state of the art. Mob. Inf. Syst. (2018)

  10. Manolopoulos, Y., Theodoridis, Y., Tsotras, L., Vassilis, J.: Spatial indexing techniques. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems, pp. 2702–2707. Springer, Boston (2009). https://doi.org/10.1007/978-0-387-39940-9

  11. Ni, K., et al.: Large-scale deep learning on the YFCC100M dataset. CoRR, abs/1502.03409 (2015)

  12. Tang, L., Cai, D., Duan, Z., Ma, J., Han, M., Wang, H.: Discovering travel community for POI recommendation on location-based social networks. Complexity, 2019:8503962:1–8503962:8 (2019)

  13. Taylor, K., Lim, K.H., Chan, J.: Travel itinerary recommendations with must-see points-of-interest. In: Companion Proceedings of the The Web Conference 2018, WWW 2018. International World Wide Web Conferences Steering Committee, pp. 1198–1205 (2018)

  14. Thomee, B., et al.: YFCC100M: the new data in multimedia research. Commun. ACM 59(2), 64–73 (2016)

  15. Wang, X., Leckie, C., Chan, J., Kwan Hui, L., Vaithianathan, T.: Improving personalized trip recommendation to avoid crowds using pedestrian sensor data. In: Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM 2016), pp. 25–34 (2016)

  16. Xiaoyi Zhang, Z.D.: Spatial index. Geographic Information Science and Technology Body of Knowledge (2017)

  17. Yonghong Yu, X.C.: A survey of point-of-interest recommendation in location-based social networks. In: Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence. AAAI (2015)

  18. Yu, J., Wu, J., Sarwat, M.: GeoSpark: a cluster computing framework for processing large-scale spatial data. In: SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 70:1–70:4 (2015)

  19. Zhao, S., Zhao, T., Yang, H., Lyu, M.R., King, I.: Stellar: spatial-temporal latent ranking for successive point-of-interest recommendation. In: AAAI 2016: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI Press (2016)

Author information

Corresponding author

Correspondence to Ibrahima Gueye.

Copyright information

© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Gueye, I., Naacke, H., Gançarski, S. (2020). Enriching Geolocalized Dataset with POIs Descriptions at Large Scale. In: Thorn, J., Gueye, A., Hejnowicz, A. (eds) Innovations and Interdisciplinary Solutions for Underserved Areas. InterSol 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 321. Springer, Cham. https://doi.org/10.1007/978-3-030-51051-0_19

  • DOI: https://doi.org/10.1007/978-3-030-51051-0_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-51050-3

  • Online ISBN: 978-3-030-51051-0

  • eBook Packages: Computer Science, Computer Science (R0)
