Deep Convolutional Neural Networks for Image Resolution Detection

Authors:

Muhammad Zeshan Afzal,

Markus Ebbecke,

Marcus Liwicki

HIP '17: Proceedings of the 4th International Workshop on Historical Document Imaging and Processing

Pages 77 - 82

https://doi.org/10.1145/3151509.3151524

Published: 10 November 2017

Abstract

In this paper, we present a novel approach based on convolutional neural networks (CNNs) to estimate the paper format (pixels per inch, PPI) of digitized document images. This format information is often required by commercial document analysis software, and a correct estimate helps high-level tasks such as OCR and layout analysis. The contribution of this work is two-fold. First, it presents an algorithm for the estimation of paper formats. Second, it introduces the first publicly available collection of documents (aggregated from public datasets) useful as a research benchmark. The collection is a mixture of modern and historical documents with PPI values ranging from 177 to 711. The task is modeled as regression, which gives more flexible results than classification (one class per format, e.g., A3, A4). For example, if an unknown format is presented to the network, it still returns a useful output. Furthermore, more categories can easily be learned by curriculum learning without modifying the network structure itself. On the proposed dataset, the network estimates the PPI values with an average deviation from the ground truth of only 14.8 PPI. On a private dataset, stemming from health insurance companies, the average deviation is 6.8 PPI.
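The two quantities the abstract relies on can be illustrated with a minimal sketch. It assumes that "average deviation" means the mean absolute difference between predicted and ground-truth PPI (the abstract does not give the formula), and it shows how a regressed PPI value maps back to a paper format through the ISO 216 page sizes. The function names are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

MM_PER_INCH = 25.4

# ISO 216 page sizes in millimetres (short side, long side).
ISO_FORMATS_MM = {
    "A3": (297.0, 420.0),
    "A4": (210.0, 297.0),
    "A5": (148.0, 210.0),
}


def mean_absolute_deviation(predicted: Sequence[float], truth: Sequence[float]) -> float:
    """Average absolute difference between predicted and ground-truth PPI.

    Assumption: this is one plausible reading of "average deviation".
    """
    if len(predicted) != len(truth) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(predicted)


def physical_size_mm(width_px: int, height_px: int, ppi: float) -> Tuple[float, float]:
    """Convert pixel dimensions to millimetres for a given PPI estimate."""
    if ppi <= 0:
        raise ValueError("ppi must be positive")
    return (width_px / ppi * MM_PER_INCH, height_px / ppi * MM_PER_INCH)


def nearest_iso_format(width_px: int, height_px: int, ppi: float) -> str:
    """Name of the ISO format closest to the estimated physical page size."""
    # Sort so that orientation (portrait or landscape) does not matter.
    short_mm, long_mm = sorted(physical_size_mm(width_px, height_px, ppi))

    def distance(name: str) -> float:
        fmt_short, fmt_long = ISO_FORMATS_MM[name]
        return abs(short_mm - fmt_short) + abs(long_mm - fmt_long)

    return min(ISO_FORMATS_MM, key=distance)


if __name__ == "__main__":
    # The same 2480 x 3508 pixel scan is an A4 page at 300 PPI but an A3 page at 212 PPI.
    print(nearest_iso_format(2480, 3508, 300.0))
    print(nearest_iso_format(2480, 3508, 212.0))
    print(mean_absolute_deviation([310.0, 290.0], [300.0, 300.0]))
```

Because the network outputs a continuous PPI value, a page size that is not in the lookup table still yields a usable physical size from `physical_size_mm`, which is the flexibility the abstract attributes to the regression formulation.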

Cited By

Matschak T, Rampold F, Hellmeier M, Prinz C, Trang S (2022). A Digitization Pipeline for Mixed-Typed Documents Using Machine Learning and Optical Character Recognition. The Transdisciplinary Reach of Design Science Research, 195--207. Online publication date: 25-May-2022. https://doi.org/10.1007/978-3-031-06516-3_15

Kolsch A, Mishra A, Varshneya S, Afzal M, Liwicki M (2018). Recognizing Challenging Handwritten Annotations with Fully Convolutional Networks. 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), 25--31. Online publication date: Aug-2018. https://doi.org/10.1109/ICFHR-2018.2018.00014

Meziane S, Bahi L, Ouadif L (2018). Automatic Recognition of Pavement Degradation: Case of Rif Chain. Recent Developments in Pavement Design, Modeling and Performance, 135--144. Online publication date: 31-Oct-2018. https://doi.org/10.1007/978-3-030-01908-2_11

Published In

HIP '17: Proceedings of the 4th International Workshop on Historical Document Imaging and Processing

November 2017

129 pages

ISBN: 9781450353908

DOI: 10.1145/3151509

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

FamilySearch

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Swiss National Science Foundation

Conference

HIP2017: The 4th International Workshop on Historical Document Imaging and Processing

November 10 - 11, 2017

Kyoto, Japan

Acceptance Rates

HIP '17 Paper Acceptance Rate: 19 of 33 submissions, 58%

Overall Acceptance Rate: 52 of 90 submissions, 58%
