Mining POI Alias from Microblog Conversations

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10937)

Abstract

In location-based analysis of microblogs, it is important to know whether two toponyms refer to the same point of interest (POI), i.e., whether they are aliases. However, existing online knowledge bases are often incomplete or inaccurate with respect to toponym aliases, especially those used in informal conversations. In this paper, we propose a method for extracting compatible toponyms from microblog conversations. We first extract a set of coordinate-associated toponyms, then apply compatibility measures to identify compatible toponyms. We propose three compatibility measures: geographical closeness, surface name similarity, and association similarity. We show that by combining these measures and tuning their weights with particle swarm optimization, we can reach a high matching accuracy. The findings of this paper can be useful for improving location-based analysis as well as for extending existing knowledge bases.
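The abstract does not state how the three measures are combined, so the following is only a minimal sketch under stated assumptions: each measure is taken to yield a score in [0, 1], the scores are combined as a weighted sum, a pair is declared an alias when the sum reaches a fixed threshold, and the weights are tuned with a basic global-best particle swarm optimizer that maximizes accuracy on labelled pairs. The names `combined_score`, `accuracy`, and `pso_tune`, the threshold of 0.5, and the swarm constants are all hypothetical and are not taken from the paper.

```python
import random

def combined_score(scores, weights):
    """Weighted sum of the three compatibility scores (assumed to lie in [0, 1])."""
    return sum(w * s for w, s in zip(weights, scores))

def accuracy(weights, pairs, threshold=0.5):
    """Fraction of labelled pairs classified correctly at an assumed threshold."""
    correct = 0
    for scores, is_alias in pairs:
        predicted = combined_score(scores, weights) >= threshold
        correct += predicted == is_alias
    return correct / len(pairs)

def pso_tune(pairs, n_particles=20, n_iters=50, seed=0):
    """Basic global-best PSO over three weights, each clamped to [0, 1]."""
    rng = random.Random(seed)
    dim = 3
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                      # personal bests
    best_fit = [accuracy(p, pairs) for p in pos]
    g = max(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]          # global best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia 0.7 and acceleration 1.5 are common defaults, assumed here.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.5 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = accuracy(pos[i], pairs)
            if fit > best_fit[i]:
                best_fit[i], best_pos[i] = fit, pos[i][:]
                if fit > g_fit:
                    g_fit, g_pos = fit, pos[i][:]
    return g_pos, g_fit
```

Here `pairs` is a list of `((geo, surface, assoc), is_alias)` tuples, and `pso_tune(pairs)` returns the best weight vector found together with its accuracy on those pairs. Accuracy is a piecewise-constant objective, which is one reason a gradient-free optimizer such as PSO is a plausible fit for this kind of weight tuning.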

Notes

  1. http://www.geonames.org/.

  2. The calculation can be found at http://www.movable-type.co.uk/scripts/latlong.html.

  3. An algorithm for calculating edit distance can be found at https://nlp.stanford.edu/IR-book/html/htmledition/edit-distance-1.html.

  4. https://dev.twitter.com/streaming/reference/get/statuses/sample.

  5. https://dev.twitter.com/rest/reference/get/statuses/user_timeline.

  6. If a user has posted fewer than 1,000 tweets, we collect all of their past tweets.
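Notes 2 and 3 point to external descriptions of a distance calculation between coordinates and of edit distance. As an illustration only, the sketch below assumes the great-circle (haversine) formula covered by the page in note 2 and the standard Levenshtein dynamic program for note 3. The function names, the mean Earth radius of 6371 km, and the unit costs are assumptions; how the paper turns these distances into closeness or similarity scores is not stated here.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def edit_distance(a, b):
    """Levenshtein distance with unit insert, delete, and substitute costs."""
    prev = list(range(len(b) + 1))   # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

For example, `edit_distance("kitten", "sitting")` returns 3, and one degree of longitude along the equator comes out at roughly 111 km with `haversine_km(0, 0, 0, 1)`.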

Corresponding author

Correspondence to Yihong Zhang.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Zhang, Y., Yao, L. (2018). Mining POI Alias from Microblog Conversations. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10937. Springer, Cham. https://doi.org/10.1007/978-3-319-93034-3_34

  • DOI: https://doi.org/10.1007/978-3-319-93034-3_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93033-6

  • Online ISBN: 978-3-319-93034-3

  • eBook Packages: Computer Science, Computer Science (R0)
