ABSTRACT
News sources around the world generate constant streams of information, but effective streaming news retrieval requires an intimate understanding of the geographic content of news. This process of understanding, known as geotagging, consists of two steps: first, finding words in article text that correspond to location names (toponyms), and second, assigning each toponym its correct latitude/longitude values. The latter step, called toponym resolution, can also be considered a classification problem, where each of the possible interpretations for each toponym is classified as correct or incorrect. Hence, techniques from supervised machine learning can be applied to improve accuracy. New classification features to improve toponym resolution, termed adaptive context features, are introduced. They consider a window of context around each toponym and use geographic attributes of the toponyms in the window to aid in their correct resolution. Adaptive parameters controlling the window's breadth and depth afford flexibility in managing the tradeoff between feature computation speed and resolution accuracy, allowing the features to potentially apply to a variety of textual domains. Extensive experiments with three large datasets of streaming news demonstrate the new features' effectiveness over two widely used competing methods.
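To make the idea of a context window concrete, the following is a minimal sketch of one possible window-based feature, not the paper's implementation. It assumes that breadth means the number of neighboring toponyms examined on each side and that depth means the number of candidate interpretations examined per neighbor (ranked here by population); both readings are assumptions. The gazetteer entries, the `Interp` type, and the `proximity_feature` function are hypothetical names introduced only for illustration.

```python
import math
from typing import NamedTuple


class Interp(NamedTuple):
    """One candidate interpretation of a toponym (hypothetical record layout)."""
    name: str
    lat: float
    lon: float
    population: int


def haversine_km(a: Interp, b: Interp) -> float:
    """Great-circle distance between two interpretations, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def proximity_feature(toponyms, gazetteer, index, candidate, breadth, depth):
    """Average, over neighbors in the window, of the distance from `candidate`
    to the nearest of each neighbor's top `depth` interpretations.

    Assumption: a smaller value suggests the candidate fits its context better.
    Returns infinity when the window holds no usable neighbor.
    """
    lo = max(0, index - breadth)
    hi = min(len(toponyms), index + breadth + 1)
    dists = []
    for j in range(lo, hi):
        if j == index:
            continue
        # Keep only the `depth` most populous interpretations of the neighbor.
        others = sorted(gazetteer.get(toponyms[j], []),
                        key=lambda i: -i.population)[:depth]
        if others:
            dists.append(min(haversine_km(candidate, o) for o in others))
    return sum(dists) / len(dists) if dists else float("inf")


# Toy gazetteer with approximate coordinates and populations (illustrative only).
GAZ = {
    "Paris": [Interp("Paris, FR", 48.8566, 2.3522, 2_100_000),
              Interp("Paris, TX", 33.6609, -95.5555, 25_000)],
    "Dallas": [Interp("Dallas, TX", 32.7767, -96.7970, 1_300_000)],
    "Austin": [Interp("Austin, TX", 30.2672, -97.7431, 960_000)],
}
DOC = ["Dallas", "Paris", "Austin"]  # toponyms in document order

# One feature value per candidate interpretation of "Paris" (position 1).
FEATS = {c.name: proximity_feature(DOC, GAZ, 1, c, breadth=1, depth=1)
         for c in GAZ["Paris"]}
```

In a supervised setting, a value like this would be one column of the feature vector for each (toponym, interpretation) pair, and a classifier would label the pair correct or incorrect. Larger breadth and depth values examine more context at a higher computation cost, which is the speed/accuracy tradeoff the abstract describes.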
Index Terms
- Adaptive context features for toponym resolution in streaming news