iDocChip: A Configurable Hardware Architecture for Historical Document Image Processing: Percentile Based Binarization

Authors:
Vladimir Rybalkin
Microelectronic Systems Design, Research Group, University of Kaiserslautern, Germany

Syed Saqib Bukhari
German Research Center for Artificial Intelligence (DFKI), University of Kaiserslautern, Germany

Muhammad Mohsin Ghaffar
Microelectronic Systems Design, Research Group, University of Kaiserslautern, Germany

Aqib Ghafoor
University of Kaiserslautern, Germany

Norbert Wehn
Microelectronic Systems Design, Research Group, University of Kaiserslautern, Germany

Andreas Dengel
German Research Center for Artificial Intelligence (DFKI), University of Kaiserslautern, Germany

DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018
August 2018, Article No.: 24, Pages 1–8
https://doi.org/10.1145/3209280.3209538

Published: 28 August 2018

ABSTRACT

End-to-end Optical Character Recognition (OCR) systems are widely used to convert document images into machine-readable text. Commercial and open-source OCR systems (such as ABBYY, OCRopus, and Tesseract) have traditionally been optimized for contemporary documents like books, letters, memos, and other end-user documents. However, these systems do not perform equally well on historical document images, which contain degradations such as non-uniform shading, bleed-through, and irregular layout; such degradations are usually absent from contemporary document images.

The open-source anyOCR is an end-to-end OCR pipeline that contains the state-of-the-art techniques required for digitizing degraded historical archives with high accuracy. However, this accuracy comes at the cost of high computational complexity, which results in 1) long runtimes that limit the digitization of large collections of historical archives and 2) high energy consumption, which is the most critical limiting factor for portable devices with a constrained energy budget. We therefore target energy-efficient, high-throughput acceleration of the anyOCR pipeline. General-purpose computing platforms fail to meet these requirements, which makes custom hardware design mandatory. In this paper, we present a new concept named iDocChip. It is a portable hybrid hardware-software FPGA-based accelerator characterized by a small footprint, high power efficiency that allows its use in portable devices, and high throughput that makes it possible to process large collections of historical archives in real time without affecting accuracy.

In this paper, we focus on binarization, which is the second most critical step in the anyOCR pipeline after the text-line recognizer that we presented in our previous publication [21]. The anyOCR system uses a Percentile Based Binarization (PBB) method that is suited to overcoming degradations such as non-uniform shading and bleed-through. To the best of our knowledge, we propose the first hardware architecture for the PBB technique. Based on the new architecture, we present a hybrid hardware-software FPGA-based accelerator that outperforms the existing anyOCR software implementation running on an i7-4790T by a factor of 21 in runtime, while achieving an energy efficiency of 10 Images/J, which is higher than that achieved by low-power embedded processors, with negligible loss of recognition accuracy.
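The abstract does not spell out the Percentile Based Binarization algorithm or the hardware architecture, so the following is only a minimal software sketch of the general idea: estimate the local page background with a percentile filter, flatten the image against that estimate, and threshold the result. It is not the anyOCR code and not the architecture proposed in the paper. The function names (`percentile_filter`, `binarize`, `global_threshold`, `shaded_page`), the 80th percentile, the window radius of 2, the threshold of 160, and the synthetic test page are all assumptions chosen for illustration.

```python
# Illustrative sketch only: names, window size, percentile and threshold are
# assumptions, not values taken from the paper or from anyOCR.

def percentile_filter(img, pct, radius):
    """For each pixel, return the pct-th percentile of its square neighborhood."""
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = sorted(
                img[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            )
            # Nearest-rank index, computed with integers only.
            k = min(len(window) - 1, (pct * len(window)) // 100)
            out[y][x] = window[k]
    return out


def binarize(img, pct=80, radius=2, threshold=160):
    """Return 1 for background and 0 for ink, using a local background estimate."""
    background = percentile_filter(img, pct, radius)
    result = []
    for row, bg_row in zip(img, background):
        out_row = []
        for value, bg in zip(row, bg_row):
            # Flatten: shift each pixel so that its local background maps to 255.
            flat = max(0, min(255, value - bg + 255))
            out_row.append(1 if flat > threshold else 0)
        result.append(out_row)
    return result


def global_threshold(img, threshold):
    """Baseline for comparison: one fixed threshold for the whole page."""
    return [[1 if value > threshold else 0 for value in row] for row in img]


def shaded_page():
    """Synthetic 9x5 page: brightness falls from 230 (left) to 70 (right)."""
    page = [[230 - 20 * x for x in range(9)] for _ in range(5)]
    page[2][1] = 100  # faded ink stroke in the bright region
    page[2][7] = 5    # dark ink stroke in the shaded region
    return page


if __name__ == "__main__":
    page = shaded_page()
    for row in binarize(page):
        print("".join("#" if pixel == 0 else "." for pixel in row))
```

On this synthetic page the faded stroke (gray value 100) is brighter than the darkest background (gray value 70), so no single global threshold can separate ink from paper, whereas the local background estimate marks only the two strokes as ink. The sketch uses integer arithmetic throughout, which is one plausible reason such a method maps well to hardware, but that is an observation about the sketch rather than a claim about the paper's design.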

References

[1] ABBYY. n.d. https://www.abbyy.com/en-eu/.
[2] Digital Multimeter Vilcraft. n.d. http://www.produktinfo.conrad.com/datenblaetter/100000-124999/124608-an-01-ml-VOLTCRAFT_VC_870_DMM__K__de_en_fr_nl.pdf.
[3] Efficiently Implementing Dilate and Erode Image Functions - Stephen Ostermiller. n.d. https://blog.ostermiller.org/dilate-and-erode.
[4] Narrenschif. n.d. http://kallimachos.de/kallimachos/index.php/Narragonien.
[5] OCRopus. n.d. https://github.com/tmbdev/ocropy.
[6] Omnipage. n.d. www.nuance.de/for-individuals/by-product/omnipage/index.htm.
[7] Tesseract. n.d. https://github.com/tesseract-ocr.
[8] M Z Afzal, M. Kramer, Syed Saqib Bukhari, M R Yousefi, Faisal Shafait, and T M Breuel. 2014. Robust Binarization of Stereo and Monocular Document Images Using Percentile Filter. Vol. 1. Springer, 139--149.
[9] Luis Alvarez and Luis Mazorra. 1994. Signal and Image Restoration Using Shock Filters and Anisotropic Diffusion. SIAM J. Numer. Anal. 31, 2 (April 1994), 590--605.
[10] Thomas M Breuel, Adnan Ul-Hasan, Mayce Al-Azawi, and Faisal Shafait. 2013. High-performance OCR for printed English and Fraktur using LSTM networks. In Document Analysis and Recognition (ICDAR), 2013 12th International Conference on. IEEE, 683--687.
[11] Syed Saqib Bukhari, Ahmad Kadi, Mohammad Ayman Jouneh, Fahim Mahmood Mir, and Andreas Dengel. 2017. anyOCR: An Open-Source OCR System for Historical Archives. The 14th IAPR International Conference on Document Analysis and Recognition (ICDAR2017), Kyoto, Japan (2017).
[12] Alex Graves, Santiago Fernández, Faustino Gomez, and Jürgen Schmidhuber. 2006. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd international conference on Machine learning. ACM, 369--376.
[13] J. He, Q. D. M. Do, A. C. Downton, and J. H. Kim. 2005. A comparison of binarization methods for historical archive documents. In Eighth International Conference on Document Analysis and Recognition (ICDAR'05). 538--542 Vol. 1.
[14] E. Kavallieratou and S. Stathis. 2006. Adaptive Binarization of Historical Document Images. In 18th International Conference on Pattern Recognition (ICPR'06), Vol. 3. 742--745.
[15] Kazuya Kawakami. 2008. Supervised Sequence Labelling with Recurrent Neural Networks. Ph.D. Dissertation. Technical University of Munich.
[16] F. Kheiri, S. Samavi, and N. Karimi. 2017. Hardware design for binarization and thinning of fingerprint images. ArXiv e-prints (Oct. 2017). arXiv:cs.CV/1710.05749
[17] Khurram Khurshid, Imran Siddiqi, Claudie Faure, and Nicole Vincent. 2009. Comparison of Niblack inspired binarization methods for ancient documents. (2009), 7247 - 7247 - 9 pages.
[18] M. H. Najafi and M. E. Salehi. 2016. A Fast Fault-Tolerant Architecture for Sauvola Local Image Thresholding Algorithm Using Stochastic Computing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 24, 2 (Feb 2016), 808--812.
[19] N. Otsu. 1979. A Threshold Selection Method from Gray-Level Histograms. IEEE Transactions on Systems, Man, and Cybernetics 9, 1 (Jan 1979), 62--66.
[20] Puneet and Naresh Garg. 2013. Binarization Techniques used for Grey Scale Images. International Journal of Computer Applications 71, 1 (June 2013), 8--11.
[21] Vladimir Rybalkin, Norbert Wehn, Mohammad Reza Yousefi, and Didier Stricker. 2017. Hardware architecture of bidirectional long short-term memory neural network for optical character recognition. In Proceedings of the Conference on Design, Automation & Test in Europe. European Design and Automation Association, 1394--1399.
[22] J. Sauvola, T. Seppanen, S. Haapakoski, and M. Pietikainen. 1997. Adaptive document binarization. In Proceedings of the Fourth International Conference on Document Analysis and Recognition, Vol. 1. 147--152.
[23] Brij Mohan Singh, Rahul Sharma, Ankush Mittal, and Debashish Ghosh. 2011. Parallel Implementation of Otsu's Binarization Approach on GPU. International Journal of Computer Applications 32, 2 (October 2011), 16--21.
[24] Brij Mohan Singh, Rahul Sharma, Ankush Mittal, and Debashish Ghosh. 2011. Parallel Implementation of Souvola's Binarization Approach on GPU. International Journal of Computer Applications 32, 2 (October 2011), 28--33.
[25] Robert A. Wagner and Michael J. Fischer. 1974. The String-to-String Correction Problem. J. ACM 21, 1 (Jan. 1974), 168--173.
[26] Jeng-Daw Yang, Yung-Sheng Chen, and Wen-Hsing Hsu. 1994. Adaptive Thresholding Algorithm and Its Hardware Implementation. Pattern Recogn. Lett. 15, 2 (Feb. 1994), 141--150.
[27] Mohammad Reza Yousefi, Mohammad Reza Soheili, Thomas M Breuel, Ehsanollah Kabir, and Didier Stricker. 2015. Binarization-free OCR for historical documents using LSTM networks. In Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 1121--1125.
[28] Mohammad Reza Yousefi, Mohammad Reza Soheili, Thomas M Breuel, and Didier Stricker. 2015. A comparison of 1D and 2D LSTM architectures for the recognition of handwritten Arabic. In Document Recognition and Retrieval XXII, Vol. 9402. International Society for Optics and Photonics, 94020H.

Published in

DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018
August 2018
311 pages
ISBN: 9781450357692
DOI: 10.1145/3209280

Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Binarization
FPGA
Hardware Architecture
Hardware/Software Co-Design
Machine Learning
Optical Character Recognition
Zynq
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 178 of 537 submissions, 33%