DOI: 10.1145/2348283.2348381
research-article

Adaptive context features for toponym resolution in streaming news

Published: 12 August 2012

ABSTRACT

News sources around the world generate constant streams of information, but effective streaming news retrieval requires an intimate understanding of the geographic content of news. This process of understanding, known as geotagging, consists of first finding words in article text that correspond to location names (toponyms), and second, assigning each toponym its correct lat/long values. The latter step, called toponym resolution, can also be considered a classification problem, where each of the possible interpretations for each toponym is classified as correct or incorrect. Hence, techniques from supervised machine learning can be applied to improve accuracy. New classification features to improve toponym resolution, termed adaptive context features, are introduced that consider a window of context around each toponym, and use geographic attributes of toponyms in the window to aid in their correct resolution. Adaptive parameters controlling the window's breadth and depth afford flexibility in managing a tradeoff between feature computation speed and resolution accuracy, allowing the features to potentially apply to a variety of textual domains. Extensive experiments with three large datasets of streaming news demonstrate the new features' effectiveness over two widely-used competing methods.
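To make the idea of a context window with adjustable breadth and depth concrete, here is a minimal sketch of one possible window-based feature. It is not the paper's actual feature definition: the names `Interpretation`, `haversine_km`, and `window_proximity` are hypothetical, and the sketch assumes that breadth means the number of neighboring toponyms examined on each side and that depth means the number of candidate interpretations considered per neighbor. The feature scores a candidate interpretation by its average distance to the nearest interpretation of each toponym in the window.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Interpretation:
    """One candidate location for a toponym (hypothetical record type)."""
    name: str
    lat: float
    lon: float


def haversine_km(a: Interpretation, b: Interpretation) -> float:
    """Great-circle distance between two interpretations, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def window_proximity(toponyms, index, candidate, breadth, depth):
    """Average distance from `candidate` to the closest interpretation of
    each neighboring toponym inside the window.

    toponyms: list (in document order) of lists of Interpretation; each
              inner list is assumed to be pre-sorted by some prior ranking.
    breadth:  assumed meaning: neighbors examined on each side of `index`.
    depth:    assumed meaning: interpretations examined per neighbor.
    Returns float('inf') when the window contains no usable neighbor.
    """
    lo = max(0, index - breadth)
    hi = min(len(toponyms), index + breadth + 1)
    nearest = []
    for j in range(lo, hi):
        if j == index:
            continue  # skip the toponym being resolved
        neighbors = toponyms[j][:depth]
        if neighbors:
            nearest.append(min(haversine_km(candidate, n) for n in neighbors))
    return sum(nearest) / len(nearest) if nearest else float("inf")
```

In this sketch, larger values of `breadth` and `depth` examine more pairs of interpretations, which costs more distance computations but gives the feature more context. That is the kind of speed and accuracy tradeoff the abstract describes. A supervised classifier would take such a value as one input among other features when labeling each interpretation as correct or incorrect.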

    • Published in

      SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
      August 2012
      1236 pages
      ISBN: 9781450314725
      DOI: 10.1145/2348283

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 792 of 3,983 submissions, 20%
