DOI: 10.1145/3078081.3078106
research-article

Unearthing the Recent Past: Digitising and Understanding Statistical Information from Census Tables

Published: 01 June 2017

ABSTRACT

Censuses comprise a wealth of information at a large (national) scale that gives governments (who commission them) and the public a detailed snapshot of how people live, including their geographical distribution and characteristics. In addition to underpinning socio-economic research, the study of historical Census statistics offers a unique opportunity to understand many aspects of a country and its heritage. This paper presents an overview of the background, the challenges, the implemented preprocessing, recognition and post-processing pipeline, and the information-rich results of a pilot digitisation project on the 1961 Census of England and Wales. This was the first Census for which computers were used to process the data and output very detailed information, a vital part of which is available only as degraded historical computer printouts. The experience gained and the resulting methodology can also be applied to digitising and understanding tabular information in a wide variety of application scenarios.

Published in

DATeCH2017: Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage (ACM Other conferences)
June 2017, 179 pages
ISBN: 9781450352659
DOI: 10.1145/3078081

Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article, Research, Refereed limited

Acceptance Rates: DATeCH2017 paper acceptance rate 29 of 37 submissions (78%); overall acceptance rate 60 of 86 submissions (70%)