
Towards computer-vision software tools to increase production and accessibility of video description for people with vision loss

  • Long Paper
  • Universal Access in the Information Society

Abstract

This paper presents the status of an R&D project targeting the development of computer-vision tools to assist humans in generating and rendering video description for people with vision loss. Three principal issues are discussed: (1) production practices, (2) the needs of people with vision loss, and (3) the current system design, core technologies and implementation. The paper provides the main conclusions of consultations with producers of video description regarding their practices and with end users regarding their needs, as well as an analysis of described productions that led to a proposed video description typology. The current status of a prototype software application, the audio-vision manager, is also presented; it uses several computer-vision technologies (shot transition detection, key-frame identification, key-face recognition, key-text spotting, visual motion, gait/gesture characterization, key-place identification, key-object spotting and image categorization) to automatically extract visual content, associate textual descriptions with it and add them to the audio track with a synthetic voice. A proof of concept is also briefly described for a first adaptive video description player, which allows end users to select among various levels of video description.
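As a concrete illustration of the first technology in this list, the sketch below shows a generic shot transition detector based on frame-to-frame histogram comparison. It is a minimal sketch, not the project's actual implementation: the use of OpenCV, the hue-saturation histogram, the Bhattacharyya distance and the fixed threshold are all assumptions made for this example.

```python
# Minimal sketch of shot transition detection via histogram differencing.
# Illustrative only: OpenCV, the HS histogram, the Bhattacharyya distance
# and the 0.5 threshold are assumptions, not the method used in the paper.
import cv2

def detect_shot_transitions(video_path, threshold=0.5):
    """Return indices of frames where a hard cut is likely."""
    cap = cv2.VideoCapture(video_path)
    cuts, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # A hue-saturation histogram is less sensitive to lighting than raw RGB.
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Distance is near 0 for similar frames, approaching 1 across a cut.
            dist = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if dist > threshold:
                cuts.append(idx)
        prev_hist = hist
        idx += 1
    cap.release()
    return cuts
```

In a pipeline like the one described, the detected shot boundaries would delimit the segments from which key frames are drawn and handed to the downstream face, text, place and object analyses.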



Acknowledgments

This work is supported in part by the Department of Canadian Heritage (http://www.pch.gc.ca) through Canadian Culture Online, and the Ministère du développement économique, de l’innovation et de l’exportation (MDEIE) of the Gouvernement du Québec (http://www.mdeie.gouv.qc.ca). The authors are very grateful to the reviewers for their constructive comments, which helped improve the quality of the paper.

Author information

Correspondence to Langis Gagnon.


Cite this article

Gagnon, L., Foucher, S., Heritier, M. et al. Towards computer-vision software tools to increase production and accessibility of video description for people with vision loss. Univ Access Inf Soc 8, 199–218 (2009). https://doi.org/10.1007/s10209-008-0141-0
