DOI: 10.1145/3151509.3151524
research-article

Deep Convolutional Neural Networks for Image Resolution Detection

Published: 10 November 2017

Abstract

In this paper, we present a novel approach based on convolutional neural networks (CNNs) to estimate the paper format (pixels per inch) of digitized document images. This format information is often required by commercial document analysis software. A correct estimate of the format helps high-level tasks such as OCR and layout analysis. The contribution of this work is two-fold: First, it presents an algorithm for the estimation of paper formats. Second, it introduces the first publicly available collection of documents (aggregated from public datasets) useful as a research benchmark. The collection is a mixture of modern and historical documents with pixels-per-inch (PPI) values ranging from 177 to 711. The task is modeled as a regression task, leading to more flexible results than a classification task (one class per format, e.g., A3, A4). For example, if an unknown format is presented to the network, it still returns a useful output. Furthermore, more categories can easily be learned by curriculum learning without modifying the network structure itself. On the proposed dataset, the network estimates the PPI values with an average deviation (from the ground truth) of only 14.8 PPI. On a private dataset, stemming from health insurance companies, the average deviation is 6.8 PPI.
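
The abstract ties PPI to paper format and reports accuracy as an average deviation from the ground truth. The following minimal sketch illustrates both quantities. It is not the authors' code: the function names, the choice of ISO 216 short-edge widths, and the reading of "average deviation" as a mean absolute difference are assumptions made for illustration.

```python
from statistics import mean

# ISO 216 short-edge widths in millimetres (standard values).
PAPER_WIDTH_MM = {"A3": 297.0, "A4": 210.0, "A5": 148.0}
MM_PER_INCH = 25.4


def ppi_from_format(width_px, paper_format):
    """PPI implied by a scan's pixel width when the paper format is known.

    Hypothetical helper: assumes the scan covers the full short edge
    of the sheet in portrait orientation.
    """
    width_in = PAPER_WIDTH_MM[paper_format] / MM_PER_INCH
    return width_px / width_in


def mean_absolute_deviation(predicted, ground_truth):
    """Average absolute difference between predicted and true PPI values.

    Assumption: this is how the "average deviation" is computed.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have equal length")
    return mean(abs(p - t) for p, t in zip(predicted, ground_truth))


if __name__ == "__main__":
    # An A4 page scanned 2480 pixels wide corresponds to roughly 300 PPI.
    print(round(ppi_from_format(2480, "A4")))
    # Illustrative values only, not results from the paper.
    print(mean_absolute_deviation([310.0, 290.0, 200.0], [300.0, 300.0, 200.0]))
```

Because the target is a continuous PPI value, the same error measure applies to scans whose format was never seen in training, which a one-class-per-format classifier could not score in this way.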

Cited By

  • (2022) A Digitization Pipeline for Mixed-Typed Documents Using Machine Learning and Optical Character Recognition. The Transdisciplinary Reach of Design Science Research, 195-207. DOI: 10.1007/978-3-031-06516-3_15. Online publication date: 25-May-2022
  • (2018) Recognizing Challenging Handwritten Annotations with Fully Convolutional Networks. 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), 25-31. DOI: 10.1109/ICFHR-2018.2018.00014. Online publication date: Aug-2018
  • (2018) Automatic Recognition of Pavement Degradation: Case of Rif Chain. Recent Developments in Pavement Design, Modeling and Performance, 135-144. DOI: 10.1007/978-3-030-01908-2_11. Online publication date: 31-Oct-2018

Published In

HIP '17: Proceedings of the 4th International Workshop on Historical Document Imaging and Processing
November 2017
129 pages
ISBN:9781450353908
DOI:10.1145/3151509
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

  • FamilySearch

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Swiss National Science Foundation

Conference

HIP2017

Acceptance Rates

HIP '17 Paper Acceptance Rate 19 of 33 submissions, 58%;
Overall Acceptance Rate 52 of 90 submissions, 58%
