DOI: 10.1145/2675354.2675701
research-article

Construction and first analysis of a corpus for the evaluation and training of microblog/twitter geoparsers

Published: 04 November 2014

ABSTRACT

This article presents an approach to building place reference corpora and applies it to a Geo-Microblog Corpus intended to foster research and development in microblog/Twitter geoparsing and geographic information retrieval. The corpus currently consists of 6000 tweets with identified and georeferenced place names; 30% of the tweets contain at least one place name. It is intended to support the evaluation, comparison, and training of geoparsers. We introduce our corpus building framework, which is developed to be generally applicable beyond microblogs, and explain how we use crowdsourcing and geovisual analytics technology to support the construction of relatively large corpora. We then report on the corpus building work and analyze the causes of disagreement among the lay persons who performed place identification in our crowdsourcing approach.
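
As a rough illustration of how a corpus of this kind could support geoparser evaluation, the following minimal sketch scores predicted place names against gold annotations per tweet, using exact-match precision, recall, and F1. The data layout, the `score` function, and the sample tweets are hypothetical assumptions made for illustration; they are not taken from the paper or from the corpus itself.

```python
from collections import Counter


def score(gold, predicted):
    """Micro-averaged exact-match precision, recall, and F1.

    Both arguments map a tweet id to a list of place-name strings.
    This layout is an assumption, not the corpus's actual format.
    """
    tp = fp = fn = 0
    for tweet_id in set(gold) | set(predicted):
        # Compare case-insensitively and respect repeated mentions.
        g = Counter(name.lower() for name in gold.get(tweet_id, []))
        p = Counter(name.lower() for name in predicted.get(tweet_id, []))
        overlap = sum((g & p).values())
        tp += overlap
        fp += sum(p.values()) - overlap
        fn += sum(g.values()) - overlap
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical toy data: three tweets, one without any place name.
gold = {"t1": ["Paris"], "t2": [], "t3": ["New York", "Boston"]}
predicted = {"t1": ["Paris"], "t2": ["Reading"], "t3": ["New York"]}

precision, recall, f1 = score(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact string matching is the simplest possible criterion. A real evaluation would likely also compare character offsets and the assigned coordinates, which this sketch leaves out.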

    • Published in

      GIR '14: Proceedings of the 8th Workshop on Geographic Information Retrieval
      November 2014
      94 pages
      ISBN: 9781450331357
      DOI: 10.1145/2675354

      Copyright © 2014 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      GIR '14 Paper Acceptance Rate: 11 of 15 submissions, 73%. Overall Acceptance Rate: 46 of 61 submissions, 75%.
