Saying What it Means: Semi-Automated (News) Media Annotation

Multimedia Tools and Applications

Abstract

This paper considers the automated and semi-automated annotation of audiovisual media in a new type of production framework, A4SM (Authoring System for Syntactic, Semantic and Semiotic Modelling). We present the architecture of the framework and describe a prototypical camera, a handheld device for basic semantic annotation, and an editing suite, to demonstrate how video material can be annotated in real time and how this information can be used not only for retrieval but also during the different phases of the production process itself. We then outline the underlying XML-Schema-based content description structures of A4SM and discuss the pros and cons of our approach of evolving semantic networks as the basis for audiovisual content description.
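To make the abstract's notion of an XML-Schema-based shot annotation concrete, the following fragment sketches what such a description might look like. This is a purely hypothetical illustration: every element and attribute name here is invented for this sketch and does not reflect the actual A4SM schema, which is defined in the paper itself.

```xml
<!-- Hypothetical sketch only: element and attribute names are invented
     and are NOT the actual A4SM content description schema. -->
<Shot id="shot-042" start="00:01:10:05" end="00:01:18:12">
  <!-- Syntactic layer: how the material was shot -->
  <Syntactic camera="handheld" movement="pan-left" framing="medium"/>
  <!-- Semantic layer: what the shot depicts -->
  <Semantic>
    <Person ref="person-07" role="interviewee"/>
    <Location>newsroom</Location>
  </Semantic>
  <!-- Links between shots form an evolving semantic network -->
  <Relation type="follows" target="shot-041"/>
</Shot>
```

The point of such a layered structure is that annotations captured at shooting time (the syntactic and basic semantic layers) can later be enriched and cross-linked, so the same description serves both retrieval and the editing phases of production.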



Cite this article

Nack, F., Putz, W. Saying What it Means: Semi-Automated (News) Media Annotation. Multimedia Tools and Applications 22, 263–302 (2004). https://doi.org/10.1023/B:MTAP.0000017031.26875.f7
