A novel mutual nearest neighbor based symmetry for text frame classification in video
Research Highlights
► A new wavelet–median moment feature to enhance the gap between text and non-text pixels.
► Probable text block selection (PTBS) using k-means clustering among 16 blocks.
► Max–Min clustering to obtain dominant and high contrast pixels.
► A new mutual nearest neighbor symmetry concept (MNNS) to identify a true text block.
► The combination of PTBS and MNNS for achieving better results.
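The block-selection step named in the highlights can be sketched in isolation. A minimal sketch, assuming a 4×4 grid of 16 blocks and a simple intensity-variance score as a hypothetical stand-in for the paper's wavelet–median moment feature (which is not reproduced here):

```python
# Sketch of probable text block selection (PTBS): divide the frame into a
# 4x4 grid of 16 blocks, score each block, and keep the high-scoring
# cluster from a 1-D 2-means split. The intensity-variance score below is
# a hypothetical stand-in for the paper's wavelet-median moment feature.

def split_into_blocks(frame, rows=4, cols=4):
    """Divide a 2-D grayscale frame (list of rows) into rows*cols blocks."""
    h, w = len(frame), len(frame[0])
    bh, bw = h // rows, w // cols
    return [[row[c * bw:(c + 1) * bw] for row in frame[r * bh:(r + 1) * bh]]
            for r in range(rows) for c in range(cols)]

def block_feature(block):
    """Stand-in score: intensity variance (high for textured/text regions)."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)

def two_means_high(values, iters=20):
    """1-D k-means with k=2; returns indices assigned to the higher mean."""
    lo, hi = min(values), max(values)
    high = []
    for _ in range(iters):
        high = [i for i, v in enumerate(values) if abs(v - hi) < abs(v - lo)]
        low = [i for i in range(len(values)) if i not in high]
        if high:
            hi = sum(values[i] for i in high) / len(high)
        if low:
            lo = sum(values[i] for i in low) / len(low)
    return high

def probable_text_blocks(frame):
    """Indices of blocks selected as probable text blocks."""
    feats = [block_feature(b) for b in split_into_blocks(frame)]
    return two_means_high(feats)
```

The 2-means split plays the role of an unsupervised threshold: blocks in the higher-scoring cluster are retained as probable text blocks for further verification.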
Introduction
Text frame classification aims to classify frames among a large collection of video frames into text and non-text frames. It is useful in applications such as video browsing, event detection, event boundary detection, text tracking, and text detection and extraction. Due to the semantic gap between low-level features and high-level events, it is difficult to come up with a generic Content-based Image Retrieval (CBIR) method or automatic annotation method to achieve a high accuracy of event detection [1]. In addition, the dynamic nature of events such as sports further complicates the analysis and impedes the implementation of such live event detection. In view of this difficulty, event detection is realized by detecting and recognizing the starting texts of the games or events involved. Therefore, to build a computationally efficient and accurate event detection system, accurate text frame classification is required before text detection and recognition [2]. However, no method exists in the literature that solely works on text frame classification.
While text frame classification invariably makes use of text detection techniques, it differs from the usual text detection methods in the following respects: (1) text frame classification is basically a screening process prior to text detection and recognition, (2) text frame classification should be simple and fast in order to quickly identify a frame as text or non-text, (3) text frame classification helps to reduce computational burden by avoiding expensive text detection methods on given unknown video frames, many of which may turn out to be just non-text, and (4) many existing text detection methods assume that the given input is a text frame and hence false positives may occur when a non-text frame is fed as input. In this paper, we propose a text frame classification method by dividing a video frame into small windows, which we call “blocks”, to look for probable text pixels among these blocks using a mutual nearest neighborhood symmetry concept. Any block in which the presence of text is detected serves as an indication that the frame under testing is a text frame. The rest of the paper is outlined as follows: In the next section, we survey related works. We present our proposed method in detail in Section 3, followed by a series of experiments in Section 4. Section 5 concludes this paper with discussions on future works.
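The mutual nearest neighborhood idea mentioned above can be illustrated on its own: two elements are mutual nearest neighbors when each is the other's closest neighbor. The following sketch uses 2-D points and squared Euclidean distance purely as illustrative assumptions; the paper applies the concept to pixel and block representatives:

```python
# Illustration of the mutual nearest neighbor idea behind MNNS: two
# elements are mutual nearest neighbors when each is the other's closest
# neighbor. Points and squared Euclidean distance are illustrative
# assumptions, not the paper's exact feature space.

def nearest(points, i):
    """Index of the point closest to points[i] (squared Euclidean, j != i)."""
    best, best_d = None, float("inf")
    for j, (x, y) in enumerate(points):
        if j == i:
            continue
        d = (points[i][0] - x) ** 2 + (points[i][1] - y) ** 2
        if d < best_d:
            best, best_d = j, d
    return best

def mutual_nn_pairs(points):
    """All index pairs (i, j), i < j, that are mutual nearest neighbors."""
    return [(i, j)
            for i in range(len(points))
            for j in range(i + 1, len(points))
            if nearest(points, i) == j and nearest(points, j) == i]
```

The symmetry of the relation is what makes it useful for verification: a one-sided nearest-neighbor relation can be accidental, whereas a mutual one indicates that two representatives genuinely belong together.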
Section snippets
Related work
The closest related work is that of Li et al. [3] for video text tracking. The system includes a component for text frame classification to find the first text frame in a video stream in order to start text tracking. The method of text frame classification is based on a supervised learning method using a neural network classifier. The method is thus dependent on the training set and requires considerable training time for the use of the neural network classifier. It also serves a different
The proposed method
The text detection methods surveyed in the preceding section cannot be used for text frame classification directly, as video contains a large number of text and non-text frames. Besides, text detection methods generally work at the pixel level to locate text in video images, and hence it is a time-consuming process when non-text frames are fed as input for text detection. Thus there is a necessity for frame screening over a large number of video frames to identify text frames before applying an
Experimental results
As there is no standard dataset for text frame classification available in the literature and this is the first attempt at frame classification, we created our own dataset, which includes 1220 text frames and 800 non-text frames. We ran all experiments on a PC with a P4 3 GHz processor and 1 GB RAM running the Windows XP operating system. This dataset includes a variety of frames, such as scene text, graphic text, various font sizes and scripts, and various resolutions and backgrounds. We
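Performance on such a dataset is typically reported with recall, precision and F-measure; the exact measures used are given in the full text, so the following is only a generic sketch assuming counts of true positives, false positives and false negatives are available for the text-frame class:

```python
# Generic sketch of standard classification measures (recall, precision,
# F-measure) from counts of true positives (tp), false positives (fp) and
# false negatives (fn). This is the textbook formulation, not necessarily
# the exact measures reported in the paper's full text.

def evaluate(tp, fp, fn):
    """Return (recall, precision, F-measure) for one class."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, f_measure
```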
Conclusion and future work
In this work, we have proposed a novel method for the complex problem of classifying text frames in a large database containing key frames of both text and non-text. To the best of our knowledge, this is the first work that attempts to solve this text frame classification problem. The proposed Max–Min clustering approach helps in obtaining dominant and high contrast pixels to form text representatives for identifying text blocks. The main contribution of the work is in introducing the mutual
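The Max–Min clustering step named above can be illustrated with a minimal one-dimensional sketch: the maximum and minimum intensities seed two clusters, each value joins the seed it is closer to, and the max-side cluster is kept as the dominant, high contrast pixel set. The tie-breaking and feature space here are assumptions, not the paper's exact formulation:

```python
# One-dimensional sketch of a Max-Min style clustering: the maximum and
# minimum intensities seed two clusters, and each value joins the seed it
# is closer to. The max-side cluster is kept as the dominant, high
# contrast pixel set. Illustrative assumption, not the paper's exact
# formulation.

def max_min_cluster(values):
    """Return the values assigned to the max-seeded (high contrast) cluster."""
    hi, lo = max(values), min(values)
    return [v for v in values if abs(v - hi) <= abs(v - lo)]
```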
Acknowledgment
This work is done jointly by NUS and ISI, Kolkata, India. This research is also supported in part by the MDA grant R252-000-325-279 and A*STAR grant R252-000-402-305. Our special thanks to the anonymous reviewers for their constructive suggestions to improve the quality of the paper.
References (35)
- et al., Text information extraction in images and video: a survey, Pattern Recognition (2004)
- Neural network-based text location in color images, Pattern Recognition Lett. (2001)
- et al., Automatic text location in images and video frames, Pattern Recognition (1998)
- et al., Fast and robust text detection in images and video frames, Image Vision Comput. (2005)
- et al., Accurate video text detection through classification of low and high contrast images, Pattern Recognition (2010)
- et al., Structuring low quality videotaped lectures for cross-reference browsing by video text analysis, Pattern Recognition (2008)
- et al., A localization/verification scheme for finding text in images and video frames based on contrast independent features and machine learning, Signal Process.: Image Commun. (2004)
- et al., A new robust algorithm for video text extraction, Pattern Recognition (2003)
- et al., Text detection, localization and tracking in compressed video, Signal Process.: Image Commun. (2007)
- C. Xu, J. Wang, K. Wan, Y. Li, L. Duan, Live sports event detection based on broadcast video and web-casting text, in:...
- Automatic text detection and tracking in digital video, IEEE Trans. Image Process.
- Automatic detection and recognition of signs from natural scenes, IEEE Trans. Image Process.
P. Shivakumara is a Research Fellow in the Department of Computer Science, School of Computing, National University of Singapore. He received B.Sc., M.Sc., M.Sc. Technology by research and Ph.D. degrees in computer science, respectively, in 1995, 1999, 2001 and 2005 from University of Mysore, Mysore, Karnataka, India. In addition to this, he has obtained educational degree (B.Ed.) in 1996 from Bangalore University, Bangalore, India.
From 1999 to 2005, he was a Project Associate in the Department of Studies in Computer Science, University of Mysore, where he conducted research on document image analysis, including document image mosaicing, character recognition, skew detection, face detection and face recognition. He worked as a Research Fellow in the field of image processing and multimedia in the Department of Computer Science, School of Computing, National University of Singapore, from 2005 to 2007. He also worked as a Research Consultant at Nanyang Technological University, Singapore, for a period of 6 months on image classification in 2007. He has published around 90 research papers in national and international conferences and journals. He has been a reviewer for several conferences and journals.
His research interests are in the area of image processing, pattern recognition, including text extraction from video, document image processing, biometric applications and automatic writer identification.
Anjan Dutta received the B.Sc. degree in Mathematics from the University of Calcutta, Kolkata, India, in 2006 and the MCA degree in Computer Applications from the West Bengal University of Technology, Kolkata, India, in 2009. He is currently pursuing a Master's degree in Computer Vision and Artificial Intelligence at the Universitat Autònoma de Barcelona, Barcelona, Spain, while working as a Ph.D. student at the Computer Vision Centre, Barcelona, Spain, under the supervision of Dr. Josep Lladós and Dr. Umapada Pal. His main research interests include graphics recognition and structural pattern recognition using graph matching techniques.
Trung Quy Phan is pursuing a graduate degree in the Department of Computer Science, School of Computing, National University of Singapore, Singapore.
He is currently a Research Assistant with the School of Computing, National University of Singapore, Singapore. His current research interests include image and video analysis.
Chew Lim Tan is a Professor in the Department of Computer Science, School of Computing, National University of Singapore. He received his B.Sc. (Hons.) degree in physics in 1971 from the University of Singapore, his M.Sc. degree in radiation studies in 1973 from the University of Surrey, UK, and his Ph.D. degree in computer science in 1986 from the University of Virginia, U.S.A. His research interests include document image analysis, text and natural language processing, neural networks and genetic programming. He has published more than 300 research publications in these areas. He is an associate editor of Pattern Recognition, an associate editor of Pattern Recognition Letters, and an editorial board member of the International Journal on Document Analysis and Recognition. He is a member of the Governing Board of the International Association for Pattern Recognition (IAPR) and a senior member of the IEEE.
Umapada Pal received his Ph.D. from the Indian Statistical Institute; his Ph.D. work was on the development of a printed Bangla OCR system. He did his post-doctoral research on the segmentation of touching English numerals at the Institut National de Recherche en Informatique et en Automatique (INRIA), France. During July 1997–January 1998, he visited GSF-Forschungszentrum für Umwelt und Gesundheit GmbH, Germany, to work as a guest scientist on a project on image analysis. Since January 1997, he has been a faculty member of the Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata. His primary research area is digital document processing. He has published 160 research papers in various international journals, conference proceedings and edited volumes. In 1995, he received the student best paper award from the Chennai Chapter of the Computer Society of India. He received a merit certificate from the Indian Science Congress Association in 1996. Because of his significant impact on document analysis research for Indian languages, the TC-10 and TC-11 committees of the International Association for Pattern Recognition (IAPR) presented the ‘ICDAR Outstanding Young Researcher Award’ to Dr. Pal in 2003. In 2005–2006, Dr. Pal received a JSPS fellowship from the Japanese government. Dr. Pal has served as a program committee member of many conferences, including the International Conference on Document Analysis and Recognition (ICDAR), the International Workshop on Document Image Analysis for Libraries (DIAL), the International Workshop on Frontiers in Handwriting Recognition (IWFHR), and the International Conference on Pattern Recognition (ICPR). He is also the Asian PC-Chair for the 10th ICDAR to be held in Barcelona, Spain, in 2009. He has served as the guest editor of a special issue of the VIVEK journal on document image analysis of Indian scripts, and is currently co-editing a special issue of the journal Electronic Letters on Computer Vision and Image Analysis.
He is a life member of the Indian unit of IAPR (IUPRAI) and a senior life member of the Computer Society of India.