ABSTRACT
We present MapReader, a free, open-source software library written in Python for analyzing large map collections. MapReader allows users with little computer vision expertise to i) retrieve maps from web servers; ii) preprocess the maps and divide them into patches; iii) annotate patches; iv) train, fine-tune, and evaluate deep neural network models; and v) create structured data about map content. We demonstrate how MapReader enables historians to interpret a collection of ≈16K nineteenth-century maps of Britain (≈30.5M patches), foregrounding the challenge of translating visual markers into machine-readable data. We present a case study focusing on rail and buildings. We also show how the outputs of the MapReader pipeline can be linked to external datasets. We release the ≈62K manually annotated patches used here for training and evaluating the models.
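To make step ii) concrete, the following is a minimal sketch of dividing a map image into square patches. It is not MapReader's own API: the function name `slice_into_patches`, the `patch_size` parameter, and the synthetic `sheet` array are assumptions introduced for illustration, and the sketch uses only NumPy slicing.

```python
import numpy as np


def slice_into_patches(image, patch_size):
    """Split an H x W (x C) array into non-overlapping square patches.

    Hypothetical helper, not part of MapReader. Returns a list of
    (top, left, patch) tuples, where top and left are pixel offsets.
    Patches on the bottom and right edges may be smaller than patch_size.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    height, width = image.shape[:2]
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # NumPy slicing clips at the array boundary, so edge patches
            # are simply truncated rather than padded.
            patch = image[top:top + patch_size, left:left + patch_size]
            patches.append((top, left, patch))
    return patches


# Stand-in for a scanned map sheet: a 250 x 300 pixel grayscale array.
sheet = np.zeros((250, 300), dtype=np.uint8)
patches = slice_into_patches(sheet, patch_size=100)
print(len(patches))  # 3 rows x 3 columns = 9 patches
```

Keeping the pixel offsets alongside each patch is one way to let a later step map patch-level labels back to positions on the original sheet.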
Index Terms
- MapReader: a computer vision pipeline for the semantic exploration of maps at scale