A video-based framework for the analysis of presentations/posters

International Journal of Document Analysis and Recognition (IJDAR)

Abstract.

Detection and recognition of textual information in an image or video sequence is important for many applications. The increased resolution and capabilities of digital cameras and faster mobile processing allow for the development of interesting systems. We present an application based on the capture of information presented at a slide-show presentation or at a poster session. We describe the development of a system to process the textual and graphical information in such presentations. The application integrates video and image processing, document layout understanding, optical character recognition (OCR), and pattern recognition. The digital imaging device captures slides/poster images, and the computing module preprocesses and annotates the content. Various problems related to metric rectification, key-frame extraction, text detection, enhancement, and system integration are addressed. The results are promising for applications such as a mobile text reader for the visually impaired. By using powerful text-processing algorithms, we can extend this framework to other applications, e.g., document and conference archiving, camera-based semantics extraction, and ontology creation.
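
As an illustration of the key-frame extraction step mentioned above, the following is a minimal sketch, not the authors' method. It assumes grayscale frames given as nested lists of pixel intensities and keeps a frame whenever its mean absolute difference from the last kept frame exceeds a threshold. The names `mean_abs_diff` and `select_key_frames` and the default `threshold` value are hypothetical choices made for this example.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image stored as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def select_key_frames(frames: Sequence[Frame], threshold: float = 10.0) -> List[int]:
    """Return indices of frames that differ from the last kept key frame
    by more than `threshold` (hypothetical default, tune per camera)."""
    if not frames:
        return []
    keys = [0]  # the first frame is always kept
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[keys[-1]], frames[i]) > threshold:
            keys.append(i)
    return keys


if __name__ == "__main__":
    slide_a = [[0, 0], [0, 0]]
    slide_b = [[200, 200], [200, 200]]
    video = [slide_a, slide_a, slide_b, slide_b, slide_a]
    print(select_key_frames(video))
```

Comparing against the last kept frame, rather than the immediately preceding one, keeps slow drift such as gradual lighting changes from hiding a real slide transition. A practical system would likely add camera-motion compensation before differencing.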

Author information

Correspondence to A. Zandifar.

Additional information

Received: 18 December 2003, Revised: 1 November 2004, Published online: 2 February 2005

About this article

Cite this article

Zandifar, A., Duraiswami, R. & Davis, L.S. A video-based framework for the analysis of presentations/posters. IJDAR 7, 178–187 (2005). https://doi.org/10.1007/s10032-004-0137-0

  • DOI: https://doi.org/10.1007/s10032-004-0137-0
