A video-based framework for the analysis of presentations/posters

Zandifar, A.; Duraiswami, R.; Davis, L. S.

doi:10.1007/s10032-004-0137-0

A. Zandifar¹,
R. Duraiswami¹ &
L. S. Davis¹

Abstract.

Detection and recognition of textual information in an image or video sequence is important for many applications. The increased resolution and capabilities of digital cameras, together with faster mobile processors, make it practical to build camera-based systems that capture and read such information. We present an application based on the capture of information presented at a slide-show presentation or at a poster session, and we describe the development of a system to process the textual and graphical information in such presentations. The application integrates video and image processing, document layout understanding, optical character recognition (OCR), and pattern recognition. A digital imaging device captures the slide/poster images, and a computing module preprocesses and annotates their content. We address problems related to metric rectification, key-frame extraction, text detection, enhancement, and system integration. The results are promising for applications such as a mobile text reader for the visually impaired. With powerful text-processing algorithms, the framework can be extended to other applications, e.g., document and conference archiving, camera-based semantics extraction, and ontology creation.
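One preprocessing step the abstract names is metric rectification: removing the perspective distortion of a slide or poster photographed at an angle. Full metric rectification usually recovers the plane's geometry from scene constraints such as known angles or length ratios. The sketch below is a simpler stand-in and not the authors' method. It assumes the four corners of the slide have already been located in the image and that the target aspect ratio is known (both are assumptions). It then computes the plane-to-plane homography that maps those corners onto an upright rectangle, using the direct linear transform with h33 fixed to 1. The function names solve_linear, homography_from_corners and apply_homography are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corners (three or more collinear?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_corners(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Direct linear transform for exactly four correspondences, with h33 fixed to 1."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("need exactly four corner correspondences")
    a: Matrix = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31*x + h32*y + 1) = h11*x + h12*y + h13, and likewise for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h: Matrix, p: Point) -> Point:
    """Map an image point into the rectified frame (divide by the homogeneous w)."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

Consider slide corners detected at (102, 87), (530, 120), (515, 410) and (95, 380), which are made-up values, together with a 640×480 target rectangle. In that case apply_homography(H, (515, 410)) returns a point equal to (640, 480) up to rounding error. In practice, every output pixel of the rectified slide would be filled by mapping it back through the inverse homography and resampling the camera image.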

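Key-frame extraction reduces the captured video to one representative frame per slide, so that later stages such as text detection and OCR run once per slide rather than on every frame. The abstract does not say how this is done. The sketch below assumes a fixed camera and uses a simple frame-differencing rule chosen for illustration. Frames are flattened grayscale pixel lists. A mean absolute difference above change_thresh between consecutive frames marks a transition, and the first frame of each run that then stays stable for at least min_stable frames is kept. The names mean_abs_diff and extract_key_frames and both threshold values are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[int]  # one grayscale frame, flattened to a list of 0..255 values


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute intensity difference between two frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def extract_key_frames(frames: Sequence[Frame], change_thresh: float = 10.0,
                       min_stable: int = 3) -> List[int]:
    """Return the index of the first frame of every stable run of at least
    min_stable frames; a run ends when consecutive frames differ by more
    than change_thresh (e.g. at a slide transition)."""
    keys: List[int] = []
    run_start = 0    # index of the first frame in the current run
    emitted = False  # True once the current run has produced a key frame
    for i in range(len(frames)):
        if i > 0 and mean_abs_diff(frames[i - 1], frames[i]) > change_thresh:
            run_start, emitted = i, False  # content changed: start a new run
        if not emitted and i - run_start + 1 >= min_stable:
            keys.append(run_start)  # run has been stable long enough
            emitted = True
    return keys
```

Consider four frames of slide A (all pixels 50, one frame slightly noisy at 52), then four frames of slide B (all pixels 200), then a single frame of A. For this input the function returns [0, 4]. The one-frame flash at the end is treated as a transient, which also suppresses brief occlusions such as a presenter walking past. A gradual fade whose per-frame difference never exceeds the threshold would be missed by this rule.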
Author information

Authors and Affiliations

Perceptual Interfaces and Reality Lab (PIRL), University of Maryland, College Park, MD 20742, USA
A. Zandifar, R. Duraiswami & L. S. Davis

Corresponding author

Correspondence to A. Zandifar.

Additional information

Received: 18 December 2003, Revised: 1 November 2004, Published online: 2 February 2005

About this article

Cite this article

Zandifar, A., Duraiswami, R. & Davis, L.S. A video-based framework for the analysis of presentations/posters. IJDAR 7, 178–187 (2005). https://doi.org/10.1007/s10032-004-0137-0

Issue Date: July 2005
DOI: https://doi.org/10.1007/s10032-004-0137-0

Similar content being viewed by others

Guided Search 6.0: An updated model of visual search

Cognitive Impairment Detection Based on Frontal Camera Scene While Performing Handwriting Tasks

The Pascal Visual Object Classes Challenge: A Retrospective
