Narrative theme navigation for sitcoms supported by fan-generated scripts

Friedland, Gerald; Gottlieb, Luke; Janin, Adam

doi:10.1007/s11042-011-0877-z

Video navigation based on acoustic detection of actors and narrative elements

Published: 20 September 2011

Volume 63, pages 387–406, (2013)

Multimedia Tools and Applications

Gerald Friedland¹,
Luke Gottlieb¹ &
Adam Janin¹

Abstract

This article provides the definitive description of the complete Joke-O-Mat system for navigating sitcoms, which was presented briefly in Friedland et al. (2009), extended in Janin et al. (2010), and augmented with fan-generated scripts in Friedland et al. (2010). The extended system allows a user to browse a sitcom by scene, punchline, and dialog segment, and to filter these themes by actor and by keyword. For example, the user can choose to watch only punchlines by the character “Kramer” that contain the word “armoire”. The system infers the narrative themes and provides word-level search by automatically aligning the output of a speaker identification system and a speech recognizer to both closed captions and scripts generated by fans on the Internet. The segmentations produced by this system have proven to be indistinguishable from expert-generated segmentations and require significantly less time to produce. The article describes the original and the extended Joke-O-Mat system (http://www.icsi.berkeley.edu/jokeomat/), discusses problems with the use of fan-generated content, and presents results on episodes from the sitcom Seinfeld with regard to segmentation accuracy and overall user satisfaction, as determined by a human-subject study.
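
The browsing behaviour described above (pick a narrative theme, then narrow it by actor and keyword) can be illustrated with a minimal Python sketch. This is not the Joke-O-Mat implementation: the `Segment` record, the `filter_segments` function, and the sample lines in `EPISODE` are all hypothetical and invented for illustration. The sketch assumes that the alignment step has already produced time-stamped segments labelled with a theme, a speaker, and the aligned words.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Segment:
    """One navigable unit of an episode (hypothetical structure)."""
    theme: str    # "scene", "punchline", or "dialog"
    speaker: str  # character name attributed to the segment
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # words aligned to this segment


def words(text: str) -> List[str]:
    """Lower-case word tokens, so keyword search works at word level."""
    return re.findall(r"[a-z0-9']+", text.lower())


def filter_segments(
    segments: Iterable[Segment],
    theme: Optional[str] = None,
    speaker: Optional[str] = None,
    keyword: Optional[str] = None,
) -> List[Segment]:
    """Keep segments that match every filter that was supplied."""
    hits = []
    for seg in segments:
        if theme is not None and seg.theme != theme:
            continue
        if speaker is not None and seg.speaker.lower() != speaker.lower():
            continue
        if keyword is not None and keyword.lower() not in words(seg.text):
            continue
        hits.append(seg)
    return hits


# Invented sample data; not taken from any real episode or transcript.
EPISODE = [
    Segment("dialog", "Jerry", 0.0, 4.2, "So what happened to the armoire?"),
    Segment("punchline", "Kramer", 4.2, 7.9, "The armoire? It was stolen!"),
    Segment("punchline", "Kramer", 31.0, 33.5, "I have no idea where it went."),
    Segment("punchline", "Elaine", 40.0, 42.0, "Nice armoire."),
]

for seg in filter_segments(EPISODE, theme="punchline",
                           speaker="Kramer", keyword="armoire"):
    print(f"{seg.start:.1f}-{seg.end:.1f} {seg.speaker}: {seg.text}")
```

Matching on whole tokens rather than substrings mirrors the word-level search mentioned in the abstract: a query for "arm" does not match "armoire" in this sketch.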

Notes

http://nist.gov/speech/tests/rt/rt2004/fall.
In previous work on transcribing multiparty meetings, we found that a one-hour meeting could take upwards of 20 hours for a human to transcribe, and there is no reason to think that a sitcom would be qualitatively different.

References

Adcock J, Cooper M, Pickens J (2008) Experiments in interactive video search by addition and subtraction. In: CIVR ’08: proceedings of the 2008 international conference on content-based image and video retrieval. ACM, New York, pp 465–474. doi:10.1145/1386352.1386412
Ayache S, Quénot G (2007) Evaluation of active learning strategies for video indexing. Signal Process Image Commun 22(7–8):692–704. doi:10.1016/j.image.2007.05.010
Benitez AB, Smith JR, Chang SF (2000) MediaNet: a multimedia information network for knowledge representation
Berrani S, Manson G, Lechat P (2008) A non-supervised approach for repeated sequence detection in TV broadcast streams. Signal Process Image Commun 23(7):525–537
Bertini M, Del Bimbo A, Torniai C (2005) Automatic video annotation using ontologies extended with visual information. In: MULTIMEDIA ’05: proceedings of the 13th annual ACM international conference on multimedia. ACM, New York, pp 395–398. doi:10.1145/1101149.1101235
Brown MG, Foote JT, Jones GJF, Sparck Jones K, Young SJ (1995) Automatic content-based retrieval of broadcast news. In: MULTIMEDIA ’95: proceedings of the third ACM international conference on multimedia. ACM, New York, pp 35–43. doi:10.1145/217279.215080
Brunelli R, Mich O, Modena CM (1999) A survey on the automatic indexing of video data. J Vis Commun Image Represent 10(2):78–112. doi:10.1006/jvci.1997.0404
Bruno E, Moenne-Loccoz N, Marchand-Maillet S (2008) Design of multimodal dissimilarity spaces for retrieval of video documents. IEEE Trans Pattern Anal Mach Intell 30(9):1520–1533. doi:10.1109/TPAMI.2007.70801
Chang SF, Chen W, Meng HJ, Sundaram H, Zhong D (1998) A fully automated content-based video search engine supporting spatiotemporal queries. IEEE Trans Circuits Syst Video Technol 8:602–615
Christel MG, Hauptmann AG, Wactlar HD, Ng TD (2002) Collages as dynamic summaries for news video. In: MULTIMEDIA ’02: proceedings of the tenth ACM international conference on multimedia. ACM, New York, pp 561–569. doi:10.1145/641007.641120
Chua TS (2007) Towards the next plateau: innovative multimedia research beyond TRECVID. In: MULTIMEDIA ’07: proceedings of the 15th international conference on multimedia. ACM, New York, pp 1054–1054. doi:10.1145/1291233.1291463
Friedland G, Vinyals O (2008) Live speaker identification in conversations. In: Proceedings of ACM multimedia. ACM, pp 1017–1018
Friedland G, Gottlieb L, Janin A (2009) Joke-o-Mat: browsing sitcoms punchline-by-punchline. In: Proceedings of ACM multimedia. ACM, pp 1115–1116
Friedland G, Yeo C, Hung H (2009) Visual speaker localization aided by acoustic models. In: Proceedings of ACM multimedia. ACM, pp 195–202
Friedland G, Gottlieb L, Janin A (2010) Narrative theme navigation for sitcoms supported by fan-generated scripts. In: Proceedings of the 3rd international workshop on automated information extraction in media production. ACM, New York, pp 3–8. doi:10.1145/1877850.1877854
Gauvain JL, Lamel L, Adda G (2002) The LIMSI broadcast news transcription system. Speech Commun 37(1–2):89–108. doi:10.1016/S0167-6393(01)00061-9
Goh KS, Chang EY, Lai WC (2004) Multimodal concept-dependent active learning for image retrieval. In: MULTIMEDIA ’04: proceedings of the 12th annual ACM international conference on multimedia. ACM, New York, pp 564–571. doi:10.1145/1027527.1027664
Gupta A, Jain R (1997) Visual information retrieval. Commun ACM 40(5):70–79. doi:10.1145/253769.253798
Haubold A, Kender JR (2007) Vast mm: multimedia browser for presentation video. In: CIVR ’07: proceedings of the 6th ACM international conference on image and video retrieval. ACM, New York, pp 41–48. doi:10.1145/1282280.1282286
Hollink L, Worring M (2005) Building a visual ontology for video retrieval. In: MULTIMEDIA ’05: proceedings of the 13th annual ACM international conference on multimedia. ACM, New York, pp 479–482. doi:10.1145/1101149.1101256
Hoogs A, Rittscher J, Stein G, Schmiederer J (2003) Video content annotation using visual analysis and a large semantic knowledgebase. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 327–334
Huijbregts M, Ordelman R, de Jong F (2007) Annotation of heterogeneous multimedia content using automatic speech recognition. In: Proceedings of the second international conference on semantic and digital media technologies, SAMT 2007. Lecture notes in computer science, vol 4816. Springer, Berlin, pp 78–90. http://doc.utwente.nl/62090/
Janin A, Gottlieb L, Friedland G (2010) Joke-o-Mat HD: browsing sitcoms with human derived transcripts. In: Proceedings of the ACM international conference on multimedia 2010. ACM, New York, pp 1591–1594. doi:10.1145/1873951.1874295
de Jong F, Gauvain JL, den Hartog J, Netter K (1998) Olive: speech based video retrieval
Larson M, Newman E, Jones G (2008) Overview of VideoCLEF 2008: automatic generation of topic-based feeds for dual language audio-visual content. In: Working notes for the CLEF 2008 workshop, Aarhus
Natsev A, Tešić J, Xie L, Yan R, Smith JR (2007) IBM multimedia search and retrieval system. In: CIVR ’07: proceedings of the 6th ACM international conference on image and video retrieval. ACM, New York, pp 645–645. doi:10.1145/1282280.1282373
NIST Rich Transcription Evaluation. http://www.itl.nist.gov/iad/mig/tests/rt/
NIST TRECVid Evaluation. http://www-nlpir.nist.gov/projects/trecvid/
Niu F, Goela N, Divakaran A, Abdel-Mottaleb M (2008) Audio scene segmentation for video with generic content. In: Proceedings of SPIE, vol 6820, p 68200S
Reynolds DA (1995) Speaker identification and verification using Gaussian mixture speaker models. Speech Commun 17(1–2):91–108. doi:10.1016/0167-6393(95)00009-D
Reynolds DA, Torres-Carrasquillo P (2005) Approaches and applications of audio diarization. In: Proceedings of the IEEE ICASSP
de Rooij O, Snoek CGM, Worring M (2007) Query on demand video browsing. In: MULTIMEDIA ’07: proceedings of the 15th international conference on multimedia. ACM, New York, pp 811–814. doi:10.1145/1291233.1291417
Satoh S, Nakamura Y, Kanade T (1999) Name-it: naming and detecting faces in news videos. IEEE Multimed 6(1):22–35
Snoek CGM, Worring M (2009) Concept-based video retrieval. Found Trends Inf Retr 2(4):215–322. doi:10.1561/1500000014
Sun Q, Hürst W (2008) Video browsing on handheld devices: interface designs for the next generation of mobile video players. IEEE Multimed 15(3):76–83. doi:10.1109/MMUL.2008.66
Vinyals O, Friedland G (2008) Towards semantic analysis of conversations: a system for the live identification of speakers in meetings. In: Proceedings of IEEE international conference on semantic computing, pp 456–459
Wactlar H, Kanade T, Smith M, Stevens S (1996) Intelligent access to digital video: informedia project. Computer 29(5):46–52
Wooters C, Huijbregts M (2008) The ICSI RT07s speaker diarization system. In: Multimodal technologies for perception of humans: international evaluation workshops CLEAR 2007 and RT 2007, Baltimore, MD, USA, 8–11 May 2007, revised selected papers. Springer, Berlin, Heidelberg, pp 509–519. doi:10.1007/978-3-540-68585-2_47
Xu C, Wang J, Wan K, Li Y, Duan L (2006) Live sports event detection based on broadcast video and web-casting text. In: MULTIMEDIA ’06: proceedings of the 14th annual ACM international conference on multimedia. ACM, New York, pp 221–230. doi:10.1145/1180639.1180699

Author information

Authors and Affiliations

International Computer Science Institute, Berkeley, CA, USA
Gerald Friedland, Luke Gottlieb & Adam Janin

Corresponding author

Correspondence to Luke Gottlieb.

About this article

Cite this article

Friedland, G., Gottlieb, L. & Janin, A. Narrative theme navigation for sitcoms supported by fan-generated scripts. Multimed Tools Appl 63, 387–406 (2013). https://doi.org/10.1007/s11042-011-0877-z

Published: 20 September 2011
Issue Date: March 2013
DOI: https://doi.org/10.1007/s11042-011-0877-z
