Frame selection for OCR from video stream of book flipping

Chakraborty, Dibyayan; Roy, Partha Pratim; Saini, Rajkumar; Alvarez, Jose M.; Pal, Umapada

doi:10.1007/s11042-016-4292-3

Published: 06 January 2017

Volume 77, pages 985–1008, (2018)

Multimedia Tools and Applications

Dibyayan Chakraborty¹,
Partha Pratim Roy²,
Rajkumar Saini²,
Jose M. Alvarez³ &
…
Umapada Pal¹


Abstract

Optical Character Recognition (OCR) on a video stream of flipping pages is challenging because pages are flipped at an arbitrary speed, which makes it difficult to identify the frames that contain an open page image (OPI). Low resolution, blur, and shadows add further noise to the selection of suitable frames for OCR. In this paper, we focus on identifying a set of representative frames from a video stream of flipping pages without any dedicated hardware, and then perform OCR on those frames. The result is an end-to-end solution for video streams of flipping pages. To select an OPI, we present an efficient algorithm that exploits cues from edge information during the flipping event. These cues, extracted from the region of interest (ROI) of each frame, determine whether a page is in the flipping state or the open state. The open state is classified by an SVM trained on the edge cue information. After a set of frames is selected for each OPI, a representative frame from that set is chosen for OCR. Experiments are performed on videos captured with a standard-resolution camera. The proposed method achieves 88.81% accuracy on representative frame selection, compared with only 51.28% for GIST (Oliva and Torralba, Int J Comput Vis 42(3):145–175 (2001)). To the best of our knowledge, this is the first work in this area. After frame selection, we achieve 83.31% character recognition accuracy and 78.11% word recognition accuracy with a traditional OCR engine on our dataset of flipping books.
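The last stage described above (grouping the frames classified as open into one set per OPI, then choosing one representative per set) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `min_run` noise filter, and the use of a per-frame sharpness score as the selection criterion are all assumptions made for illustration, and the per-frame open/flipping labels are assumed to come from an upstream classifier such as the SVM mentioned in the abstract.

```python
from itertools import groupby


def group_open_runs(is_open):
    """Return one list of frame indices per maximal run of open-page frames."""
    runs = []
    index = 0
    for state, run in groupby(is_open):
        length = len(list(run))
        if state:
            runs.append(list(range(index, index + length)))
        index += length
    return runs


def pick_representatives(is_open, sharpness, min_run=2):
    """Pick one frame per open-page run (hypothetical criterion: max sharpness).

    Runs shorter than min_run are treated as classifier noise and skipped.
    """
    if len(is_open) != len(sharpness):
        raise ValueError("is_open and sharpness must have the same length")
    reps = []
    for run in group_open_runs(is_open):
        if len(run) >= min_run:
            reps.append(max(run, key=lambda i: sharpness[i]))
    return reps


# Toy input: per-frame labels (True = open page) and made-up sharpness scores.
labels = [False, True, True, True, False, False, True, False, True, True]
scores = [0.1, 0.4, 0.9, 0.6, 0.2, 0.1, 0.8, 0.1, 0.5, 0.7]
print(pick_representatives(labels, scores))
```

On the toy input this prints `[2, 9]`: the isolated open frame at index 6 is discarded as noise, and each remaining run contributes its highest-scoring frame. Any other quality measure could replace the sharpness score without changing the grouping logic.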

References

Breuel TM (2008) The OCRopus open source OCR system. In: Proceedings of DRR
Bosamiya JH, Agrawal P, Roy PP, Balasubramanian R (2015) Script independent scene text segmentation using fast stroke width transform and GrabCut. In: 3rd IAPR Asian conference on pattern recognition (ACPR), pp 151–155
Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 8(6):679–698
Chakraborty D, Roy PP, Pal U, Alvarez JM (2013) OCR from video stream of book flipping. In: Proceedings of the 2nd Asian conference on pattern recognition, pp 130–134
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):1–27
Shi C, Wang C, Xiao B, Zhang Y, Gao S, Zhang Z (2013) Scene text recognition using part-based tree-structured character detection. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 2961–2968
Das D, Chen D, Hauptmann AG (2008) Improving multimedia retrieval with a video OCR, pp 68200B-68200B
Fujinami K, Inagawa N (2009) Page-flipping detection and information presentation for implicit interaction with a book. Int J Multimedia Ubiquit Eng 93–112
Hearn D, Baker MP (1994) Computer graphics. Addison-Wesley
Iwamura M, Tsuji T, Horimatsu A, Kise K (2009) Real-time camera-based recognition of characters and pictograms. In: International conference on document analysis and recognition, pp 76–80
Lee CW, Jung K, Kim HJ (2003) Automatic text detection and removal in video sequences. Pattern Recogn Lett 24:2607–2623
Lienhart R, Wernicke A (2002) Localizing and segmenting text in images and videos. IEEE Trans Circuits Syst Video Technol 12(4):256–268
Micusik B, Wildenauer H, Kosecka J (2008) Detection and matching of rectilinear structures. In: Proceedings of computer vision and pattern recognition (CVPR), pp 1–7
Mishra A, Karteek A, Jawahar CV (2012) Top-down and bottom-up cues for scene text recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 2687–2694
Nakashima T, Watanabe Y, Komuro T, Ishikawa M (2009) Book flipping scanning. In: Symposium on user interface software and technology, vol 22, pp 79–80
Neumann L, Matas J (2012) Real-time scene text localization and recognition, pp 3538–3545
Ngo CW, Chan CK (2005) Video text detection and segmentation for optical character recognition. Multimedia Systems 10(3):261–272
Niblack W (1986) An introduction to digital image processing. Prentice Hall, pp 115–116
Ojala T, Pietikäinen M, Mäenpää T (2001) A generalized Local Binary Pattern operator for multiresolution gray scale and rotation invariant texture classification. In: Proceedings of international conference on advances in pattern recognition, pp 399–408
Ojala T, Pietikäinen M, Mäenpää T (2002) Multiresolution gray-scale and rotation invariant texture classification with Local Binary Patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of spatial envelope. Int J Comput Vis 42(3):145–175
onlineocr.net - Free Online OCR service, convert scanned PDF and images to Word, Text
Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66
Roy S, Roy PP, Shivakumara P, Louloudis G, Tan CL, Pal U (2013) HMM-based multi oriented text recognition in natural scene image. In: 2nd IAPR Asian conference on pattern recognition, pp 288–292
Sato T, Kanade T, Hughes EK, Smith MA (1998) Video OCR for digital news archive. In: IEEE international workshop on content-based access of image and video database, pp 52–60
Sauvola J, Pietikäinen M (2000) Adaptive document image binarization. Pattern Recogn 33(2):225–236
Shibayama H, Watanabe Y, Ishikawa M (2012) Reconstruction of 3D surface and restoration of flat document image from monocular image sequence. In: Asian conference on computer vision, pp 350–364
Shivakumara P, Bhowmick S, Su B, Tan CL, Pal U (2011) A new gradient based character segmentation method for video text recognition. In: International conference on document analysis and recognition, pp 126–130
Singh M, Kaur A (2015) An efficient hybrid scheme for key frame extraction and text localization in video. In: International conference on advances in computing, communications and informatics, pp 1250–1254
Smith R (2007) An overview of the Tesseract OCR engine. In: Proceedings of 9th international conference on document analysis and recognition, pp 629–633
Su B, Lu S, Tan CL (2013) Robust document image binarization technique for degraded document images. IEEE Trans Image Process 22(4):1408–1417
Vajda S, Rothacker L, Fink GA (2011) A method for camera-based interactive whiteboard reading
Vapnik V (1995) The nature of statistical learning theory. Springer-Verlag
Wu V, Manmatha R, Riseman EM (1997) Finding text in images. In: Proceedings of ACM international conference on digital libraries, pp 23–26
Yao C, Bai X, Shi B, Liu W (2014) Strokelets: a learned multi-scale representation for scene text recognition. In: IEEE conference on computer vision and pattern recognition, pp 4042–4049
Yi C, Tian Y (2011) Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans Image Process 20(9):2594–2605
Yin XC, Yin X, Huang K, Hao HW (2014) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983
Yin XC, Pei WY, Zhang J, Hao HW (2015) Multi-orientation scene text detection with adaptive clustering. IEEE Trans Pattern Anal Mach Intell 37(9):1930–1937
Zhang J, Kasturi R (2008) Extraction of text objects in video documents: recent progress. In: Proceedings of 8th international association pattern recognition international workshop document analysis systems, pp 5–17
Zhong Y, Karu K, Jain AK (1995) Locating text in complex color images. In: Proceedings of 3rd international conference document analysis and recognition, p 146

Author information

Authors and Affiliations

Computer Vision and Pattern Recognition Unit, ISI Kolkata, Kolkata, India
Dibyayan Chakraborty & Umapada Pal
Department of Computer Science and Engineering, IIT Roorkee, Roorkee, India
Partha Pratim Roy & Rajkumar Saini
Canberra Research Lab, ACT, Canberra, Australia
Jose M. Alvarez

Corresponding author

Correspondence to Partha Pratim Roy.

About this article

Cite this article

Chakraborty, D., Roy, P.P., Saini, R. et al. Frame selection for OCR from video stream of book flipping. Multimed Tools Appl 77, 985–1008 (2018). https://doi.org/10.1007/s11042-016-4292-3

Received: 22 June 2016
Revised: 07 November 2016
Accepted: 20 December 2016
Published: 06 January 2017
Issue Date: January 2018
DOI: https://doi.org/10.1007/s11042-016-4292-3
