
Narrative theme navigation for sitcoms supported by fan-generated scripts

Video navigation based on acoustic detection of actors and narrative elements

Multimedia Tools and Applications

Abstract

This article provides the definitive description of the complete Joke-O-Mat system for navigating sitcoms, which was presented briefly in Friedland et al. (2009), extended in Janin et al. (2010), and augmented with fan-generated scripts in Friedland et al. (2010). The extended system allows a user to browse a sitcom by scene, punchline, and dialog segment, and to filter these themes by actor and by keyword. For example, the user can choose to watch only punchlines by the character “Kramer” that contain the word “armoire”. The system infers the narrative themes and provides word-level search by automatically aligning the output of a speaker identification system and a speech recognizer to both closed captions and scripts generated by fans on the Internet. The segmentations produced by this system have proven to be indistinguishable from expert-generated segmentations and require significantly less time to produce. The article describes the original and the extended Joke-O-Mat system (http://www.icsi.berkeley.edu/jokeomat/), discusses problems with the use of fan-generated content, and presents results on episodes from the sitcom Seinfeld with regard to segmentation accuracy and overall user satisfaction, as determined by a human-subject study.
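The browsing behavior described above (filtering narrative themes by actor and keyword) can be illustrated with a minimal Python sketch. This is not the Joke-O-Mat implementation: the `Segment` record, the `filter_segments` function, the timestamps, and the placeholder texts are all hypothetical, and the sketch assumes that segments already carry a theme type, a speaker label, and aligned text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical record for one narrative segment (not the system's format)."""
    kind: str      # assumed theme label: "scene", "punchline", or "dialog"
    speaker: str   # character name from speaker identification
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # words aligned to this segment


def filter_segments(segments, kind=None, speaker=None, keyword=None):
    """Return the segments that match every criterion that is not None."""
    result = []
    for seg in segments:
        if kind is not None and seg.kind != kind:
            continue
        if speaker is not None and seg.speaker.lower() != speaker.lower():
            continue
        if keyword is not None:
            # Whole-word, case-insensitive match after stripping punctuation.
            words = {w.strip(".,!?\"'").lower() for w in seg.text.split()}
            if keyword.lower() not in words:
                continue
        result.append(seg)
    return result


# Placeholder data invented for illustration; not taken from any episode.
segments = [
    Segment("dialog", "Jerry", 0.0, 4.2, "placeholder line about the armoire?"),
    Segment("punchline", "Kramer", 4.2, 6.0, "placeholder punchline with armoire!"),
    Segment("punchline", "Kramer", 9.5, 11.0, "placeholder punchline without it"),
    Segment("punchline", "George", 12.0, 14.5, "another armoire placeholder"),
]

hits = filter_segments(segments, kind="punchline", speaker="Kramer", keyword="armoire")
for seg in hits:
    print(f"{seg.start:.1f}-{seg.end:.1f} {seg.speaker}: {seg.text}")
```

In this sketch, the query from the abstract (punchlines by “Kramer” containing “armoire”) reduces to three independent filters applied to each segment; a real system would additionally need the alignment step that produces the segments.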



Notes

  1. http://nist.gov/speech/tests/rt/rt2004/fall.

  2. In previous work on transcribing multiparty meetings, we found that a one-hour meeting could take upwards of 20 hours for a human to transcribe, and there is no reason to think that a sitcom would be qualitatively different.

References

  1. Adcock J, Cooper M, Pickens J (2008) Experiments in interactive video search by addition and subtraction. In: CIVR ’08: proceedings of the 2008 international conference on content-based image and video retrieval. ACM, New York, pp 465–474. doi:10.1145/1386352.1386412


  2. Ayache S, Quénot G (2007) Evaluation of active learning strategies for video indexing. Signal Process Image Commun 22(7–8):692–704. doi:10.1016/j.image.2007.05.010


  3. Benitez AB, Smith JR, Chang SF (2000) MediaNet: a multimedia information network for knowledge representation

  4. Berrani S, Manson G, Lechat P (2008) A non-supervised approach for repeated sequence detection in TV broadcast streams. Signal Process Image Commun 23(7):525–537


  5. Bertini M, Del Bimbo A, Torniai C (2005) Automatic video annotation using ontologies extended with visual information. In: MULTIMEDIA ’05: proceedings of the 13th annual ACM international conference on multimedia. ACM, New York, pp 395–398. doi:10.1145/1101149.1101235


  6. Brown MG, Foote JT, Jones GJF, Sparck Jones K, Young SJ (1995) Automatic content-based retrieval of broadcast news. In: MULTIMEDIA ’95: proceedings of the third ACM international conference on multimedia. ACM, New York, pp 35–43. doi:10.1145/217279.215080


  7. Brunelli R, Mich O, Modena CM (1999) A survey on the automatic indexing of video data. J Vis Commun Image Represent 10(2):78–112. doi:10.1006/jvci.1997.0404


  8. Bruno E, Moenne-Loccoz N, Marchand-Maillet S (2008) Design of multimodal dissimilarity spaces for retrieval of video documents. IEEE Trans Pattern Anal Mach Intell 30(9):1520–1533. doi:10.1109/TPAMI.2007.70801


  9. Chang SF, Chen W, Meng HJ, Sundaram H, Zhong D (1998) A fully automated content-based video search engine supporting spatiotemporal queries. IEEE Trans Circuits Syst Video Technol 8:602–615


  10. Christel MG, Hauptmann AG, Wactlar HD, Ng TD (2002) Collages as dynamic summaries for news video. In: MULTIMEDIA ’02: proceedings of the tenth ACM international conference on multimedia. ACM, New York, pp 561–569. doi:10.1145/641007.641120


  11. Chua TS (2007) Towards the next plateau: innovative multimedia research beyond TRECVID. In: MULTIMEDIA ’07: proceedings of the 15th international conference on multimedia. ACM, New York, pp 1054–1054. doi:10.1145/1291233.1291463


  12. Friedland G, Vinyals O (2008) Live speaker identification in conversations. In: Proceedings of ACM multimedia. ACM, pp 1017–1018

  13. Friedland G, Gottlieb L, Janin A (2009) Joke-o-Mat: browsing sitcoms punchline-by-punchline. In: Proceedings of ACM multimedia. ACM, pp 1115–1116

  14. Friedland G, Yeo C, Hung H (2009) Visual speaker localization aided by acoustic models. In: Proceedings of ACM multimedia. ACM, pp 195–202

  15. Friedland G, Gottlieb L, Janin A (2010) Narrative theme navigation for sitcoms supported by fan-generated scripts. In: Proceedings of the 3rd international workshop on automated information extraction in media production. ACM, New York, pp 3–8. doi:10.1145/1877850.1877854


  16. Gauvain JL, Lamel L, Adda G (2002) The LIMSI broadcast news transcription system. Speech Commun 37(1–2):89–108. doi:10.1016/S0167-6393(01)00061-9


  17. Goh KS, Chang EY, Lai WC (2004) Multimodal concept-dependent active learning for image retrieval. In: MULTIMEDIA ’04: proceedings of the 12th annual ACM international conference on multimedia. ACM, New York, pp 564–571. doi:10.1145/1027527.1027664


  18. Gupta A, Jain R (1997) Visual information retrieval. Commun ACM 40(5):70–79. doi:10.1145/253769.253798


  19. Haubold A, Kender JR (2007) VAST MM: multimedia browser for presentation video. In: CIVR ’07: proceedings of the 6th ACM international conference on image and video retrieval. ACM, New York, pp 41–48. doi:10.1145/1282280.1282286


  20. Hollink L, Worring M (2005) Building a visual ontology for video retrieval. In: MULTIMEDIA ’05: proceedings of the 13th annual ACM international conference on multimedia. ACM, New York, pp 479–482. doi:10.1145/1101149.1101256


  21. Hoogs A, Rittscher J, Stein G, Schmiederer J (2003) Video content annotation using visual analysis and a large semantic knowledgebase. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 327–334

  22. Huijbregts M, Ordelman R, de Jong F (2007) Annotation of heterogeneous multimedia content using automatic speech recognition. In: Proceedings of the second international conference on semantic and digital media technologies, SAMT 2007. Lecture notes in computer science, vol 4816. Springer, Berlin, pp 78–90. http://doc.utwente.nl/62090/


  23. Janin A, Gottlieb L, Friedland G (2010) Joke-o-Mat HD: browsing sitcoms with human derived transcripts. In: Proceedings of the ACM international conference on multimedia 2010. ACM, New York, pp 1591–1594. doi:10.1145/1873951.1874295


  24. de Jong F, Gauvain JL, den Hartog J, Netter K (1998) Olive: speech based video retrieval

  25. Larson M, Newman E, Jones G (2008) Overview of VideoCLEF 2008: automatic generation of topic-based feeds for dual language audio-visual content. In: Working notes for the CLEF 2008 workshop, Aarhus

  26. Natsev A, Tešić J, Xie L, Yan R, Smith JR (2007) IBM multimedia search and retrieval system. In: CIVR ’07: proceedings of the 6th ACM international conference on image and video retrieval. ACM, New York, pp 645–645. doi:10.1145/1282280.1282373


  27. NIST Rich Transcription Evaluation. http://www.itl.nist.gov/iad/mig/tests/rt/

  28. NIST TRECVid Evaluation. http://www-nlpir.nist.gov/projects/trecvid/

  29. Niu F, Goela N, Divakaran A, Abdel-Mottaleb M (2008) Audio scene segmentation for video with generic content. In: Proceedings of SPIE, vol 6820, p 68200S

  30. Reynolds DA (1995) Speaker identification and verification using Gaussian mixture speaker models. Speech Commun 17(1–2):91–108. doi:10.1016/0167-6393(95)00009-D


  31. Reynolds DA, Torres-Carrasquillo P (2005) Approaches and applications of audio diarization. In: Proceedings of the IEEE ICASSP

  32. de Rooij O, Snoek CGM, Worring M (2007) Query on demand video browsing. In: MULTIMEDIA ’07: proceedings of the 15th international conference on multimedia. ACM, New York, pp 811–814. doi:10.1145/1291233.1291417


  33. Satoh S, Nakamura Y, Kanade T (1999) Name-it: naming and detecting faces in news videos. IEEE Multimed 6(1):22–35


  34. Snoek CGM, Worring M (2009) Concept-based video retrieval. Found Trends Inf Retr 2(4):215–322. doi:10.1561/1500000014


  35. Sun Q, Hürst W (2008) Video browsing on handheld devices: interface designs for the next generation of mobile video players. IEEE Multimed 15(3):76–83. doi:10.1109/MMUL.2008.66


  36. Vinyals O, Friedland G (2008) Towards semantic analysis of conversations: a system for the live identification of speakers in meetings. In: Proceedings of IEEE international conference on semantic computing, pp 456–459

  37. Wactlar H, Kanade T, Smith M, Stevens S (1996) Intelligent access to digital video: informedia project. Computer 29(5):46–52


  38. Wooters C, Huijbregts M (2008) The ICSI RT07s speaker diarization system. In: Multimodal technologies for perception of humans: international evaluation workshops CLEAR 2007 and RT 2007, Baltimore, MD, USA, 8–11 May 2007, revised selected papers. Springer, Berlin, Heidelberg, pp 509–519. doi:10.1007/978-3-540-68585-2_47


  39. Xu C, Wang J, Wan K, Li Y, Duan L (2006) Live sports event detection based on broadcast video and web-casting text. In: MULTIMEDIA ’06: proceedings of the 14th annual ACM international conference on multimedia. ACM, New York, pp 221–230. doi:10.1145/1180639.1180699



Author information


Corresponding author

Correspondence to Luke Gottlieb.


About this article

Cite this article

Friedland, G., Gottlieb, L. & Janin, A. Narrative theme navigation for sitcoms supported by fan-generated scripts. Multimed Tools Appl 63, 387–406 (2013). https://doi.org/10.1007/s11042-011-0877-z


  • DOI: https://doi.org/10.1007/s11042-011-0877-z

Keywords

Navigation