Article (DOI: 10.1145/1027527.1027724)

Incremental detection of text on road signs from video with application to a driving assistant system

Published: 10 October 2004

Abstract

This paper proposes a fast and robust framework for incrementally detecting text on road signs in natural scene video. The framework makes two main contributions. First, it applies a divide-and-conquer strategy that decomposes the original task into two sub-tasks: localization of road signs and detection of text. The algorithms for the two sub-tasks are integrated into a unified framework through a real-time tracking algorithm. Second, it provides a novel approach to text detection in video by integrating 2D features from each video frame (e.g., color, edges, texture) with 3D information available across a video sequence (e.g., object structure). The feasibility of the proposed framework has been evaluated on video sequences captured from a moving vehicle. The framework can be applied to a driving assistant system and to other video text detection tasks.
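The divide-and-conquer structure described in the abstract can be pictured with the short Python sketch below. It is not the authors' implementation: the names `SignTrack`, `run_incremental_detection`, `localize_signs`, `track_box`, and `detect_text`, as well as the `(x, y, width, height)` box format, are hypothetical and chosen only for illustration. The sketch shows only the control flow, in which sign localization and text detection are separate sub-tasks tied together by a tracker; the actual detectors and tracker are passed in as callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # hypothetical region format: (x, y, width, height)
Frame = List[List[int]]          # toy stand-in for an image; real code would use arrays


@dataclass
class SignTrack:
    """A candidate road sign followed across frames, with its latest text regions."""
    box: Box
    text_boxes: List[Box] = field(default_factory=list)
    frames_seen: int = 1


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def run_incremental_detection(
    frames: List[Frame],
    localize_signs: Callable[[Frame], List[Box]],
    track_box: Callable[[Frame, Frame, Box], Optional[Box]],
    detect_text: Callable[[Frame, Box], List[Box]],
) -> List[SignTrack]:
    """Process frames in order and return the sign tracks still alive at the end."""
    tracks: List[SignTrack] = []
    prev: Optional[Frame] = None
    for frame in frames:
        if prev is not None:
            # Glue step: carry known sign regions forward; drop those the tracker loses.
            survivors = []
            for track in tracks:
                new_box = track_box(prev, frame, track.box)
                if new_box is not None:
                    track.box = new_box
                    track.frames_seen += 1
                    survivors.append(track)
            tracks = survivors
        # Sub-task 1: localize sign candidates that are not already being tracked.
        for box in localize_signs(frame):
            if not any(overlaps(box, t.box) for t in tracks):
                tracks.append(SignTrack(box))
        # Sub-task 2: look for text only inside the tracked sign regions.
        for track in tracks:
            track.text_boxes = detect_text(frame, track.box)
        prev = frame
    return tracks
```

Passing the three stages in as functions keeps the sketch independent of any particular detector or tracker. Restricting text detection to tracked sign regions is one plausible reading of why the decomposition would help with speed; the abstract itself does not spell out these details.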


Published In

MULTIMEDIA '04: Proceedings of the 12th annual ACM international conference on Multimedia
October 2004
1028 pages
ISBN: 1581138938
DOI: 10.1145/1027527
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. driving assistant system
  2. incremental text detection
  3. natural scene video
  4. road sign

Qualifiers

  • Article

Conference

MM04

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Cited By

  • (2023) Dual Relation Network for Scene Text Recognition. IEEE Transactions on Multimedia, 25, 4094-4107. DOI: 10.1109/TMM.2022.3171108. Online publication date: 2023
  • (2022) Automated TSR Using DNN Approach for Intelligent Vehicles. The New Advanced Society, 67-90. DOI: 10.1002/9781119884392.ch4. Online publication date: 18-Mar-2022
  • (2020) Robustly detect different types of text in videos. Neural Computing and Applications. DOI: 10.1007/s00521-020-04729-6. Online publication date: 27-Jan-2020
  • (2018) Multiorientation scene text detection via coarse-to-fine supervision-based convolutional networks. Journal of Electronic Imaging, 27:03, 1. DOI: 10.1117/1.JEI.27.3.033032. Online publication date: 7-Jun-2018
  • (2018) Online Video Text Detection with Markov Decision Process. 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), 103-108. DOI: 10.1109/DAS.2018.20. Online publication date: Apr-2018
  • (2017) Real-Time Traffic Sign Detection Based on Multi-Frame Video Images. Computer Science and Application, 07:05, 463-472. DOI: 10.12677/CSA.2017.75057. Online publication date: 2017
  • (2017) Video Text Extraction and Mining. Mining Multimedia Documents, 173-191. DOI: 10.1201/9781315399744-14. Online publication date: 2-May-2017
  • (2017) Tracking Based Multi-Orientation Scene Text Detection. IEEE Transactions on Image Processing, 26:7, 3235-3248. DOI: 10.1109/TIP.2017.2695104. Online publication date: 1-Jul-2017
  • (2017) A Unified Video Text Detection Method with Network Flow. 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 331-336. DOI: 10.1109/ICDAR.2017.62. Online publication date: Nov-2017
  • (2016) Scene text detection in video by learning locally and globally. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2647-2652. DOI: 10.5555/3060832.3060991. Online publication date: 9-Jul-2016
