Abstract.
Detection and recognition of textual information in an image or video sequence is important for many applications. The increased resolution and capabilities of digital cameras and faster mobile processing allow for the development of interesting systems. We present an application based on the capture of information presented at a slide-show presentation or at a poster session. We describe the development of a system to process the textual and graphical information in such presentations. The application integrates video and image processing, document layout understanding, optical character recognition (OCR), and pattern recognition. The digital imaging device captures slides/poster images, and the computing module preprocesses and annotates the content. Various problems related to metric rectification, key-frame extraction, text detection, enhancement, and system integration are addressed. The results are promising for applications such as a mobile text reader for the visually impaired. By using powerful text-processing algorithms, we can extend this framework to other applications, e.g., document and conference archiving, camera-based semantics extraction, and ontology creation.
Additional information
Received: 18 December 2003, Revised: 1 November 2004, Published online: 2 February 2005
Cite this article
Zandifar, A., Duraiswami, R. & Davis, L.S. A video-based framework for the analysis of presentations/posters. IJDAR 7, 178–187 (2005). https://doi.org/10.1007/s10032-004-0137-0
DOI: https://doi.org/10.1007/s10032-004-0137-0