ABSTRACT
This article presents an approach to building place reference corpora and applies it to a Geo-Microblog Corpus intended to foster research and development in microblog/Twitter geoparsing and geographic information retrieval. Our corpus currently consists of 6000 tweets with identified and georeferenced place names; 30% of the tweets contain at least one place name. The corpus is intended to support the evaluation, comparison, and training of geoparsers. We introduce our corpus building framework, which is designed to be applicable beyond microblogs, and explain how we use crowdsourcing and geovisual analytics technology to support the construction of relatively large corpora. We then report on the corpus building work and analyze the causes of disagreement among the lay persons who performed place identification in our crowdsourcing approach.
Construction and first analysis of a corpus for the evaluation and training of microblog/twitter geoparsers