Abstract
Optical Character Recognition (OCR) in a video stream of flipping pages is a challenging task because pages flipped at random speeds make it difficult to identify the frames that contain an open page image (OPI). In addition, low resolution, blurring, shadows and similar degradations add significant noise to the selection of suitable frames for OCR. In this paper, we focus on identifying a set of representative frames from the video stream of flipping pages without any dedicated hardware, and then perform OCR on these frames. Thus, an end-to-end solution is proposed for video streams of flipping pages. To select an OPI, we present an efficient algorithm that exploits cues from edge information during the flipping event. These cues, extracted from the region of interest (ROI) of each frame, determine whether a page is in the flipping state or the open state. Open-state classification is performed by an SVM classifier trained on the edge cue information. After a set of frames has been selected for each OPI, a representative frame is chosen from that set for OCR. Experiments are performed on videos captured with a standard-resolution camera. The proposed method achieves 88.81 % accuracy on representative frame selection, whereas GIST (Oliva and Torralba, Int J Comput Vis 42(3):145–175 (2001)) achieves only 51.28 %. To the best of our knowledge, this is the first work in this area. After frame selection, we achieve 83.31 % character recognition accuracy and 78.11 % word recognition accuracy with a conventional OCR engine on our book-flipping dataset.
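The abstract does not state how consecutive open-state frames are grouped into an OPI set or how the representative frame is picked. As a minimal sketch, assuming per-frame open/flipping decisions are already available (for instance from the trained SVM), the Python below groups consecutive open-state frames and keeps the sharpest frame of each group. The `min_len` parameter, the list-of-lists grayscale frame format, and the gradient-energy sharpness score are hypothetical choices for illustration, not the paper's method.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: a grayscale frame as rows of 0-255 intensities.
Frame = List[List[int]]


def group_open_runs(labels: Sequence[bool], min_len: int = 3) -> List[Tuple[int, int]]:
    """Group consecutive frames labelled 'open' into half-open index ranges [start, end).

    Runs shorter than min_len (an assumed parameter) are discarded as spurious
    open-state detections in the middle of a flip.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, is_open in enumerate(labels):
        if is_open and start is None:
            start = i  # a new open-page run begins
        elif not is_open and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None  # a flipping frame ends the run
    # Close a run that lasts until the final frame.
    if start is not None and len(labels) - start >= min_len:
        runs.append((start, len(labels)))
    return runs


def sharpness(frame: Frame) -> int:
    """Gradient energy: sum of squared horizontal and vertical forward differences.

    Blur weakens intensity differences, so a higher score suggests a sharper page.
    This criterion is an assumption, not the rule stated in the paper.
    """
    score = 0
    for y in range(len(frame) - 1):
        for x in range(len(frame[0]) - 1):
            dx = frame[y][x + 1] - frame[y][x]
            dy = frame[y + 1][x] - frame[y][x]
            score += dx * dx + dy * dy
    return score


def select_representatives(frames: Sequence[Frame], labels: Sequence[bool],
                           min_len: int = 3) -> List[int]:
    """Return one frame index per open-page run: the sharpest frame in that run."""
    return [max(range(start, end), key=lambda i: sharpness(frames[i]))
            for start, end in group_open_runs(labels, min_len)]


if __name__ == "__main__":
    flat = [[100] * 3 for _ in range(3)]                # featureless, blur-like frame
    crisp = [[0, 255, 0], [255, 0, 255], [0, 255, 0]]   # high-contrast frame
    frames = [crisp if i in (2, 9) else flat for i in range(11)]
    # Per-frame open (True) / flipping (False) decisions, e.g. from a trained classifier.
    labels = [False, True, True, True, False, True, False,
              True, True, True, True]
    print(group_open_runs(labels))                  # [(1, 4), (7, 11)]
    print(select_representatives(frames, labels))   # [2, 9]
```

The isolated open-state frame at index 5 is dropped because it is shorter than `min_len`, which mimics filtering out a momentary misclassification during a flip. Gradient energy is only one possible focus measure; any score that penalizes blur could be substituted.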
References
Breuel TM (2008) The OCRopus open source OCR system. In: Proceedings of DRR
Bosamiya JH, Agrawal P, Roy PP, Balasubramanian R (2015) Script independent scene text segmentation using fast stroke width transform and GrabCut. In: 3rd IAPR Asian conference on pattern recognition (ACPR), pp 151–155
Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 8(6):679–698
Chakraborty D, Roy PP, Pal U, Alvarez JM (2013) OCR from video stream of book flipping. In: Proceedings of the 2nd Asian conference on pattern recognition, pp 130–134
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):1–27
Cunzhao S, Wang C, Xiao B, Zhang Y, Gao S, Zhang Z (2013) Scene text recognition using part-based tree-structured character detection. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 2961–2968
Das D, Chen D, Hauptmann AG (2008) Improving multimedia retrieval with a video OCR. In: Proceedings of SPIE 6820, p 68200B
Fujinami K, Inagawa N (2009) Page-flipping detection and information presentation for implicit interaction with a book. Int J Multimedia Ubiquit Eng 93–112
Hearn D, Baker MP (1994) Computer graphics. Addison-Wesley
Iwamura M, Tsuji T, Horimatsu A, Kise K (2009) Real-time camera-based recognition of characters and pictograms. In: International conference on document analysis and recognition, pp 76–80
Lee CW, Jung K, Kim HJ (2003) Automatic text detection and removal in video sequences. Pattern Recogn Lett 24:2607–2623
Lienhart R, Wernicke A (2002) Localizing and segmenting text in images and videos. IEEE Trans Circuits Syst Video Technol 12(4):256–268
Micusik B, Wildenauer H, Kosecka J (2008) Detection and matching of rectilinear structures. In: Proceedings of computer vision and pattern recognition (CVPR), pp 1–7
Mishra A, Karteek A, Jawahar CV (2012) Top-down and bottom-up cues for scene text recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 2687–2694
Nakashima T, Watanabe Y, Komuro T, Ishikawa M (2009) Book flipping scanning. In: Symposium on user interface software and technology, vol 22, pp 79–80
Neumann L, Matas J (2012) Real-time scene text localization and recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 3538–3545
Ngo CW, Chan CK (2005) Video text detection and segmentation for optical character recognition. Multimedia Systems 10(3):261–272
Niblack W (1986) An introduction to digital image processing. Prentice Hall, pp 115–116
Ojala T, Pietikäinen M, Mäenpää T (2001) A generalized Local Binary Pattern operator for multiresolution gray scale and rotation invariant texture classification. In: Proceedings of international conference on advances in pattern recognition, pp 399–408
Ojala T, Pietikäinen M, Mäenpää T (2002) Multiresolution gray-scale and rotation invariant texture classification with Local Binary Patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of spatial envelope. Int J Comput Vis 42(3):145–175
onlineocr.net - Free Online OCR service, convert scanned PDF and images to Word, Text
Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66
Roy S, Roy PP, Shivakumara P, Louloudis G, Tan CL, Pal U (2013) HMM-based multi oriented text recognition in natural scene image. In: 2nd IAPR Asian conference on pattern recognition, pp 288–292
Sato T, Kanade T, Hughes EK, Smith MA (1998) Video OCR for digital news archive. In: Proceedings of IEEE international workshop on content-based access of image and video database, pp 52–60
Sauvola J, Pietikäinen M (2000) Adaptive document image binarization. Pattern Recogn 33(2):225–236
Shibayama H, Watanabe Y, Ishikawa M (2012) Reconstruction of 3D surface and restoration of flat document image from monocular image sequence. In: Asian conference on computer vision, pp 350–364
Shivakumara P, Bhowmick S, Su B, Tan CL, Pal U (2011) A new gradient based character segmentation method for video text recognition. In: International conference on document analysis and recognition, pp 126–130
Singh M, Kaur A (2015) An efficient hybrid scheme for key frame extraction and text localization in video. In: International conference on advances in computing, communications and informatics, pp 1250–1254
Smith R (2007) An overview of the Tesseract OCR engine. In: Proceedings of 9th international conference on document analysis and recognition, pp 629–633
Su B, Lu S, Tan CL (2013) Robust document image binarization technique for degraded document images. IEEE Trans Image Processing 22(4):1408–1417
Vajda S, Rothacker L, Fink GA (2011) A method for camera-based interactive whiteboard reading
Vapnik V (1995) The nature of statistical learning theory. Springer-Verlag
Wu V, Manmatha R, Riseman EM (1997) Finding text in images. In: Proceedings of ACM international conference on digital libraries, pp 23–26
Yao C, Bai X, Shi B, Liu W (2014) Strokelets: a learned multi-scale representation for scene text recognition. In: IEEE conference on computer vision and pattern recognition, pp 4042–4049
Yi C, Tian Y (2011) Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans Image Process 20(9):2594–2605
Yin XC, Yin X, Huang K, Hao HW (2014) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983
Yin XC, Pei WY, Zhang J, Hao HW (2015) Multi-orientation scene text detection with adaptive clustering. IEEE Trans Pattern Anal Mach Intell 37(9):1930–1937
Zhang J, Kasturi R (2008) Extraction of text objects in video documents: recent progress. In: Proceedings of 8th international association pattern recognition international workshop document analysis systems, pp 5–17
Zhong Y, Karu K, Jain AK (1995) Locating text in complex color images. In: Proceedings of 3rd international conference document analysis and recognition, p 146
Cite this article
Chakraborty, D., Roy, P.P., Saini, R. et al. Frame selection for OCR from video stream of book flipping. Multimed Tools Appl 77, 985–1008 (2018). https://doi.org/10.1007/s11042-016-4292-3
DOI: https://doi.org/10.1007/s11042-016-4292-3