A Survey of MPEG-1 Audio, Video and Semantic Analysis Techniques

Multimedia Tools and Applications

Abstract

Digital audio and video data have become an integral part of multimedia information systems. To reduce storage and bandwidth requirements, they are commonly stored in a compressed format such as MPEG-1. Increasing numbers of MPEG-encoded audio and video documents are available online and in proprietary collections. To use them effectively, we need tools and techniques to automatically analyse, segment, and classify MPEG video content. Several techniques have been developed in both the audio and visual domains to analyse videos. This paper presents a survey of audio and visual analysis techniques on MPEG-1 encoded media that are useful in supporting a variety of video applications. Although audio and visual feature analyses have been carried out extensively, they become useful to applications only when they convey the semantic meaning of the video content. Therefore, we also present a survey of works that provide semantic analysis of MPEG-1 encoded videos.




Cite this article

Srinivasan, U., Pfeiffer, S., Nepal, S. et al. A Survey of MPEG-1 Audio, Video and Semantic Analysis Techniques. Multimed Tools Appl 27, 105–141 (2005). https://doi.org/10.1007/s11042-005-2716-6
