Automatically Geotagging Articles in the Welsh Newspapers Online Collection

  • Conference paper

Abstract

The National Library of Wales’ (NLW) Welsh Newspapers Online collection comprises over 16 million articles from historic newspapers. It is stored in NLW’s institutional repository and is a rich source of historic text. The text of the articles has been extracted from the digitised images using OCR. This project investigates methods of determining which articles can be automatically located to places within Wales. We use machine learning and text mining, with OpenStreetMap data serving as a gazetteer.
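To make the idea of gazetteer-based geotagging concrete, the following is a minimal sketch and not the authors' method. It assumes a tiny hand-written gazetteer (the names `GAZETTEER`, `find_places` and `geotag` are hypothetical, and the coordinates are approximate) in place of OpenStreetMap data, and it assigns an article to whichever listed place name it mentions most often.

```python
import re
from collections import Counter

# Hypothetical miniature gazetteer: lower-cased place name -> (lat, lon).
# Coordinates are approximate and for illustration only; a real system
# would load names and coordinates from a source such as OpenStreetMap.
GAZETTEER = {
    "aberystwyth": (52.41, -4.08),
    "cardiff": (51.48, -3.18),
    "bangor": (53.23, -4.13),
}


def find_places(text, gazetteer=GAZETTEER):
    """Count how often each single-word gazetteer name occurs in the text."""
    # Simplification: only unaccented a-z tokens are considered, so
    # multi-word names and accented Welsh letters are not handled here.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in gazetteer)


def geotag(text, gazetteer=GAZETTEER):
    """Return (name, (lat, lon)) for the most-mentioned place, or None."""
    counts = find_places(text, gazetteer)
    if not counts:
        return None
    name, _ = counts.most_common(1)[0]
    return name, gazetteer[name]
```

Calling `geotag` on an article that mentions Aberystwyth twice and Cardiff once would select Aberystwyth, and an article with no listed names returns `None`. A frequency count like this does nothing about OCR errors or ambiguous place names, which is presumably where the machine learning and text mining mentioned in the abstract come in.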

Acknowledgments

We would like to thank Aberystwyth University and the National Library of Wales for their support.

Author information

Correspondence to Sean Sapstead.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Sapstead, S., Daniel, I., Clare, A. (2015). Automatically Geotagging Articles in the Welsh Newspapers Online Collection. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXXII. SGAI 2015. Springer, Cham. https://doi.org/10.1007/978-3-319-25032-8_28

  • DOI: https://doi.org/10.1007/978-3-319-25032-8_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25030-4

  • Online ISBN: 978-3-319-25032-8

  • eBook Packages: Computer Science, Computer Science (R0)
