MapReader: a computer vision pipeline for the semantic exploration of maps at scale

Authors:
Kasra Hosseini (The Alan Turing Institute, UK)
Daniel C. S. Wilson (The Alan Turing Institute, UK)
Kaspar Beelen (The Alan Turing Institute, UK)
Katherine McDonough (The Alan Turing Institute, UK)

GeoHumanities '22: Proceedings of the 6th ACM SIGSPATIAL International Workshop on Geospatial Humanities, November 2022, Pages 8–19. https://doi.org/10.1145/3557919.3565812

Published: 11 November 2022

ABSTRACT

We present MapReader, a free, open-source software library written in Python for analyzing large map collections. MapReader allows users with little computer vision expertise to i) retrieve maps via web-servers; ii) preprocess and divide them into patches; iii) annotate patches; iv) train, fine-tune, and evaluate deep neural network models; and v) create structured data about map content. We demonstrate how MapReader enables historians to interpret a collection of ≈16K nineteenth-century maps of Britain (≈30.5M patches), foregrounding the challenge of translating visual markers into machine-readable data. We present a case study focusing on rail and buildings. We also show how the outputs from the MapReader pipeline can be linked to other, external datasets. We release ≈62K manually annotated patches used here for training and evaluating the models.
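
Step ii) of the pipeline, dividing a map sheet into patches, can be illustrated with a minimal sketch. This is not MapReader's own API: the function name slice_into_patches, the nested-list image representation, and the 100-pixel patch size are all assumptions made for illustration, and a real pipeline would more likely operate on image arrays loaded from disk.

```python
def slice_into_patches(image, patch_size):
    """Split a 2D grid of pixel values into non-overlapping square patches.

    `image` is a list of rows, each row a list of pixel values (a
    hypothetical stand-in for a scanned map sheet). Returns a list of
    (row_index, col_index, patch) tuples. Patches on the bottom and right
    edges are smaller when the sheet size is not a multiple of patch_size.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    height = len(image)
    width = len(image[0]) if height else 0
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Take patch_size rows, then patch_size columns from each row.
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append((top // patch_size, left // patch_size, patch))
    return patches


# Hypothetical sheet: 250 rows by 300 columns of blank pixels.
sheet = [[0] * 300 for _ in range(250)]
patches = slice_into_patches(sheet, patch_size=100)
print(len(patches))  # number of patches cut from the sheet
```

Each patch keeps its row and column index, so a label predicted for a patch can later be mapped back to a position on the sheet.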

Published in
GeoHumanities '22: Proceedings of the 6th ACM SIGSPATIAL International Workshop on Geospatial Humanities
November 2022
36 pages
ISBN: 9781450395335
DOI: 10.1145/3557919
Editors:
Ludovic Moncla (LIRIS UMR CNRS 5205, INSA Lyon, France)
Bruno Martins (University of Lisbon, Portugal)
Katherine McDonough (The Alan Turing Institute, London, United Kingdom)
Copyright © 2022 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
classification
computer vision
deep learning
digital libraries and archives
historical maps
supervised learning
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 15 of 21 submissions, 71%