The ToCAI Description Scheme for Indexing and Retrieval of Multimedia Documents

Multimedia Tools and Applications

Abstract

A framework called the Table of Content-Analytical Index (ToCAI) is presented for describing the content of multimedia material. The idea for this description scheme (DS) comes from the structures used to index technical books: a table of contents, typically placed at the beginning of the book, where the list of topics is organized hierarchically into chapters and sections, and an analytical index, typically placed at the end, where keywords are listed alphabetically. Similarly, the ToCAI description scheme provides a hierarchical description of the time-sequential structure of a multimedia document (ToC), suitable for browsing, and an “Analytical Index” (AI) of audio-visual key items for the document, suitable for effective retrieval. In addition, two other sub-description schemes are proposed within the general DS: one to specify the program category and one to describe other metadata associated with the multimedia document. The detailed structure of the DS is presented by means of a UML diagram. Automatic extraction methods for identifying the values of the descriptors that compose the ToCAI are also presented and discussed. Finally, an example browsing application is proposed.
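To make the two-part idea concrete, the following is a minimal Python sketch of a hierarchical, time-ordered ToC paired with an alphabetical index of key items. It is an illustration only, not the DS defined in the article: the class names `Segment` and `ToCAI`, their fields, and the sample programme are all assumptions introduced here, and the program-category and metadata sub-schemes are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Segment:
    """One node of the hierarchical ToC: a labelled time interval (hypothetical layout)."""
    label: str
    start: float  # seconds from the beginning of the document
    end: float
    children: List["Segment"] = field(default_factory=list)


@dataclass
class ToCAI:
    """Toy container pairing a ToC tree with an index of key items (assumed structure)."""
    toc: Segment
    index: Dict[str, List[Tuple[float, float]]] = field(default_factory=dict)

    def add_key_item(self, name: str, start: float, end: float) -> None:
        # Record one occurrence (time interval) of a key item.
        self.index.setdefault(name, []).append((start, end))

    def outline(self) -> List[str]:
        # Depth-first walk of the ToC, indenting each level (browsing view).
        lines: List[str] = []

        def walk(node: Segment, depth: int) -> None:
            lines.append("  " * depth + f"{node.label} [{node.start:g}-{node.end:g}s]")
            for child in node.children:
                walk(child, depth + 1)

        walk(self.toc, 0)
        return lines

    def lookup(self, name: str) -> List[Tuple[float, float]]:
        # Retrieval view: all occurrences of a key item, in time order.
        return sorted(self.index.get(name, []))

    def sorted_keys(self) -> List[str]:
        # Index entries listed alphabetically, as in a book's index.
        return sorted(self.index, key=str.lower)


# Hypothetical sample document, invented for illustration.
doc = ToCAI(Segment("News programme", 0, 600, [
    Segment("Headlines", 0, 60),
    Segment("Report", 60, 600, [Segment("Interview", 120, 300)]),
]))
doc.add_key_item("anchor", 130, 150)
doc.add_key_item("anchor", 0, 20)
doc.add_key_item("Studio", 0, 60)
```

In this sketch, `doc.outline()` supports browsing by walking the tree top-down, while `doc.lookup("anchor")` supports retrieval by jumping straight to the intervals where a key item occurs. Keeping the two structures separate mirrors the book analogy: the ToC follows the order of the document, the index follows the order of the keys.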

Cite this article

Adami, N., Bugatti, A., Leonardi, R. et al. The ToCAI Description Scheme for Indexing and Retrieval of Multimedia Documents. Multimedia Tools and Applications 14, 153–173 (2001). https://doi.org/10.1023/A:1011347200133

  • DOI: https://doi.org/10.1023/A:1011347200133