Abstract
The National Library of Wales’ Welsh Newspapers Online collection comprises over 16 million articles from historic newspapers. It is stored in NLW’s institutional repository, and is a rich source of historic text. The text of the articles has been extracted from the digitised images using OCR. This project investigates methods of determining which articles can be automatically located to places within Wales. We use machine learning, text mining and the OpenStreetMap data as a gazetteer.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Amitay, E., Har’El, N., Sivan, R., Soffer, A.: Web-a-where: Geotagging web content. In: Proceedings of SIGIR’04, pp. 273–280 (2004)
Bird, S.: Nltk: The natural language toolkit. In: Proceedings of the COLING/ACL on Interactive Presentation Sessions, pp. 69–72 (2006)
Buscaldi, D., Rosso, P.: Map-based versus knowledge-based toponym disambiguation. In: Proceedings of GIR’08, pp. 19–22 (2008)
Haklay, M., Weber, P.: OpenStreetMap: user-generated street maps. IEEE Pervasive Comput. 7(4), 12–18 (2008)
Leidner, J.L., Lieberman, M.D.: Detecting geographical references in the form of place names and associated spatial natural language. SIGSPATIAL Spec. 3(2), 5–11 (2011)
Lieberman, M.D., Samet, H., Sankaranayananan, J.: Geotagging: using proximity, sibling, and prominence clues to understand comma groups. In: GIR’10, pp. 6:1–6:8 (2010)
Pedregosa, F., et al.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Sultanik, E.A., Fink, C.: Rapid geotagging and disambiguation of social media text via an indexed gazetteer. Proc. ISCRAM 12, 1–10 (2012)
Acknowledgments
We would like to thank Aberystwyth University and the National Library of Wales for their support.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Sapstead, S., Daniel, I., Clare, A. (2015). Automatically Geotagging Articles in the Welsh Newspapers Online Collection. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXXII. SGAI 2015. Springer, Cham. https://doi.org/10.1007/978-3-319-25032-8_28
Download citation
DOI: https://doi.org/10.1007/978-3-319-25032-8_28
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25030-4
Online ISBN: 978-3-319-25032-8
eBook Packages: Computer ScienceComputer Science (R0)