DOI: 10.1145/1463434.1463459
research-article

Mapping geographic coverage of the web

Published: 05 November 2008

ABSTRACT

In this paper, we describe a methodology to estimate the geographic coverage of the web without the need for secondary knowledge or complex geo-tagging. This is achieved by randomly selecting toponyms from the Ordnance Survey 50K gazetteer to create search queries and thus gather document counts from various web sources for Great Britain. The same gazetteer is then used to geo-code the results and enable mapping. To validate our approach, and demonstrate the effects of geo/non-geo and geo/geo ambiguity, we mapped the selected toponyms to Geograph, a community project that contains user generated geo-tagged photographs of the UK. Although success varies with resolution, the proposed approach is likely sufficient to be reliably used by applications exploring the geographic coverage of the web for cases where references to settlements are likely to be common. In our case, we applied the method to produce maps of web coverage for a range of sources at a resolution of 30km.
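
The pipeline outlined in the abstract (sample toponyms from a gazetteer, collect a document count per toponym, geo-code with the same gazetteer, aggregate for mapping) can be illustrated with a minimal sketch. This is not the authors' implementation. The gazetteer layout (name, easting, northing in metres), the function names, and the `count_fn` callback that stands in for a web search query are all assumptions made for illustration. The sketch also ignores the geo/non-geo and geo/geo ambiguity that the abstract discusses.

```python
import random
from collections import defaultdict

# 30 km grid cells, matching the map resolution quoted in the abstract.
CELL_SIZE_M = 30_000


def sample_toponyms(gazetteer, k, seed=0):
    """Randomly select k entries from a list of (name, easting, northing)."""
    rng = random.Random(seed)
    return rng.sample(gazetteer, k)


def grid_cell(easting, northing, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing a point."""
    return (int(easting // cell_size), int(northing // cell_size))


def coverage_by_cell(sampled, count_fn, cell_size=CELL_SIZE_M):
    """Sum document counts per grid cell.

    count_fn is a hypothetical callback that returns the number of
    documents a web source reports for a toponym. No real search API
    is assumed or called here.
    """
    totals = defaultdict(int)
    for name, easting, northing in sampled:
        totals[grid_cell(easting, northing, cell_size)] += count_fn(name)
    return dict(totals)
```

With a toy gazetteer of placeholder entries and a dictionary lookup as `count_fn`, entries that fall within the same 30 km cell have their counts added together, so the result is one total per cell that can be drawn as a coverage map.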


          • Published in

            GIS '08: Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems
            November 2008
            559 pages
            ISBN:9781605583235
            DOI:10.1145/1463434

            Copyright © 2008 ACM


            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Acceptance Rates

            Overall Acceptance Rate: 220 of 1,116 submissions, 20%
