research-article

Adaptive context features for toponym resolution in streaming news

Authors:
Michael D. Lieberman (University of Maryland, College Park, MD, USA)
Hanan Samet (University of Maryland, College Park, MD, USA)

SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, August 2012, Pages 731–740
https://doi.org/10.1145/2348283.2348381

Published: 12 August 2012

ABSTRACT

News sources around the world generate constant streams of information, but effective streaming news retrieval requires an intimate understanding of the geographic content of news. This process of understanding, known as geotagging, consists of first finding words in article text that correspond to location names (toponyms), and second, assigning each toponym its correct latitude/longitude values. The latter step, called toponym resolution, can also be considered a classification problem, where each of the possible interpretations for each toponym is classified as correct or incorrect. Hence, techniques from supervised machine learning can be applied to improve accuracy. New classification features to improve toponym resolution, termed adaptive context features, are introduced that consider a window of context around each toponym, and use geographic attributes of toponyms in the window to aid in their correct resolution. Adaptive parameters controlling the window's breadth and depth afford flexibility in managing a tradeoff between feature computation speed and resolution accuracy, allowing the features to potentially apply to a variety of textual domains. Extensive experiments with three large datasets of streaming news demonstrate the new features' effectiveness over two widely used competing methods.
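
To make the idea of a context window with breadth and depth concrete, the following is a minimal Python sketch of one plausible proximity-style feature. It is not taken from the paper: the `Interpretation` record, the toy `gazetteer`, the `proximity_feature` function, the 250 km radius, and the reading of breadth as "toponyms on each side" and depth as "interpretations examined per window toponym" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Interpretation:
    """One gazetteer candidate for a toponym (hypothetical record layout)."""
    name: str
    lat: float
    lon: float


def haversine_km(a, b):
    """Great-circle distance in kilometres between two interpretations."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def proximity_feature(toponyms, index, candidate, gazetteer, breadth, depth, radius_km=250.0):
    """Count window toponyms that have at least one nearby interpretation.

    breadth: number of toponyms inspected on each side of position `index`
             (assumed meaning of window breadth).
    depth:   number of interpretations inspected per window toponym
             (assumed meaning of window depth).
    """
    lo = max(0, index - breadth)
    hi = min(len(toponyms), index + breadth + 1)
    count = 0
    for j in range(lo, hi):
        if j == index:
            continue  # the toponym being resolved is not its own context
        neighbours = gazetteer.get(toponyms[j], [])[:depth]
        if any(haversine_km(candidate, n) <= radius_km for n in neighbours):
            count += 1
    return count


# Toy data (illustrative only): toponyms in document order and their candidates.
paris_fr = Interpretation("Paris, France", 48.8566, 2.3522)
paris_tx = Interpretation("Paris, TX, USA", 33.6609, -95.5555)
gazetteer = {
    "Paris": [paris_fr, paris_tx],
    "Dallas": [Interpretation("Dallas, TX, USA", 32.7767, -96.7970)],
    "Fort Worth": [Interpretation("Fort Worth, TX, USA", 32.7555, -97.3308)],
}
doc = ["Dallas", "Paris", "Fort Worth"]

# One feature value per candidate interpretation of "Paris" (position 1).
features = {
    c.name: proximity_feature(doc, 1, c, gazetteer, breadth=1, depth=1)
    for c in gazetteer["Paris"]
}
print(features)
```

In this sketch, larger `breadth` and `depth` values inspect more context and cost more distance computations, which mirrors the speed versus accuracy tradeoff the abstract describes. The resulting counts would be passed, alongside other features, to a supervised classifier that labels each interpretation as correct or incorrect.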

Index Terms

Adaptive context features for toponym resolution in streaming news
1. Information systems
  1. Information retrieval
    1. Document representation

Published in
SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
August 2012
1236 pages
ISBN:9781450314725
DOI:10.1145/2348283
General Chair: William Hersh (Oregon Health & Science University, USA)
Program Chairs: Jamie Callan (Carnegie Mellon University, USA), Yoelle Maarek (Yahoo! Research, Israel), Mark Sanderson (Royal Melbourne Institute of Technology, Australia)
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
adaptive context
geotagging
machine learning
streaming news
toponym resolution
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%