Saying What it Means: Semi-Automated (News) Media Annotation

Multimedia Tools and Applications

Abstract

This paper considers the automated and semi-automated annotation of audiovisual media in a new type of production framework, A4SM (Authoring System for Syntactic, Semantic and Semiotic Modelling). We present the architecture of the framework and describe a prototypical camera, a handheld device for basic semantic annotation, and an editing suite, to demonstrate how video material can be annotated in real time and how this information can be used not only for retrieval but also during the different phases of the production process itself. We then outline the underlying XML-Schema-based content description structures of A4SM and discuss the pros and cons of our approach of evolving semantic networks as the basis for audiovisual content description.
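To make the abstract's notion of an XML-Schema-based shot annotation concrete, the following fragment sketches what such a description might look like. This is a purely hypothetical illustration: every element and attribute name here is invented for this sketch and does not reflect the actual A4SM schema, which is defined in the paper itself.

```xml
<!-- Hypothetical sketch only: element and attribute names are invented
     and are NOT the actual A4SM content description schema. -->
<Shot id="shot-042" start="00:01:10:05" end="00:01:18:12">
  <!-- Syntactic layer: how the material was shot -->
  <Syntactic camera="handheld" movement="pan-left" framing="medium"/>
  <!-- Semantic layer: what the shot depicts -->
  <Semantic>
    <Person ref="person-07" role="interviewee"/>
    <Location>newsroom</Location>
  </Semantic>
  <!-- Links between shots form an evolving semantic network -->
  <Relation type="follows" target="shot-041"/>
</Shot>
```

The point of such a layered structure is that annotations captured at shooting time (the syntactic and basic semantic layers) can later be enriched and cross-linked, so the same description serves both retrieval and the editing phases of production.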



Cite this article

Nack, F., Putz, W. Saying What it Means: Semi-Automated (News) Media Annotation. Multimedia Tools and Applications 22, 263–302 (2004). https://doi.org/10.1023/B:MTAP.0000017031.26875.f7
