Abstract
This article provides the definitive description of the complete Joke-O-Mat system for navigating sitcoms, which was presented briefly in Friedland et al. (2009), extended in Janin et al. (2010), and augmented with fan-generated scripts as described in Friedland et al. (2010). The extended system allows a user to browse a sitcom by scene, punchline, and dialog segment, and to filter these narrative themes by actor and by keyword. For example, the user can choose to watch only punchlines by the character “Kramer” that contain the word “armoire”. The system infers the narrative themes and provides word-level search by automatically aligning the output of a speaker identification system and a speech recognizer to both closed captions and scripts generated by fans on the Internet. The segmentations produced by this system have proven to be indistinguishable from expert-generated segmentations and require significantly less time to produce. The article describes the original and the extended Joke-O-Mat system (http://www.icsi.berkeley.edu/jokeomat/), discusses problems with the use of fan-generated content, and presents results on episodes from the sitcom Seinfeld with regard to segmentation accuracy and overall user satisfaction as determined by a human-subject study.
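The kind of query described in the abstract (punchlines by one character that contain a given word) can be illustrated with a minimal Python sketch. It is not the Joke-O-Mat implementation: the `Segment` record, its field names, the `filter_segments` helper, and the toy episode data are all hypothetical and only assume that each segment already carries a theme label, a speaker, time stamps, and aligned words.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One navigable unit of an episode (hypothetical record layout)."""
    kind: str               # "scene", "punchline", or "dialog"
    speaker: str            # character name attributed to the segment
    start: float            # start time in seconds
    end: float              # end time in seconds
    words: Tuple[str, ...]  # aligned words, reduced here to plain tokens


def filter_segments(
    segments: Iterable[Segment],
    kind: Optional[str] = None,
    speaker: Optional[str] = None,
    keyword: Optional[str] = None,
) -> List[Segment]:
    """Return segments matching every criterion that is not None."""
    hits = []
    for seg in segments:
        if kind is not None and seg.kind != kind:
            continue
        if speaker is not None and seg.speaker.lower() != speaker.lower():
            continue
        # Keyword match is case-insensitive and on whole words only.
        if keyword is not None and keyword.lower() not in (
            w.lower() for w in seg.words
        ):
            continue
        hits.append(seg)
    return hits


# Invented toy data, not taken from any actual transcript.
episode = [
    Segment("dialog", "Jerry", 3.0, 9.5, ("what", "is", "that")),
    Segment("punchline", "Kramer", 12.0, 14.2, ("it", "is", "an", "armoire")),
    Segment("punchline", "Kramer", 20.0, 21.0, ("no", "thanks")),
    Segment("punchline", "Elaine", 30.0, 31.5, ("nice", "armoire")),
]

kramer_armoire = filter_segments(
    episode, kind="punchline", speaker="Kramer", keyword="armoire"
)
```

In this sketch each filter is optional, so the same function covers browsing by theme alone, by actor alone, or by any combination of the three criteria.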
Notes
In previous work on transcribing multiparty meetings, we found that a one-hour meeting could take upwards of 20 hours for a human to transcribe, and there is no reason to think that a sitcom would be qualitatively different.
References
Adcock J, Cooper M, Pickens J (2008) Experiments in interactive video search by addition and subtraction. In: CIVR ’08: proceedings of the 2008 international conference on content-based image and video retrieval. ACM, New York, pp 465–474. doi:10.1145/1386352.1386412
Ayache S, Quénot G (2007) Evaluation of active learning strategies for video indexing. Signal Process Image Commun 22(7–8):692–704. doi:10.1016/j.image.2007.05.010
Benitez AB, Smith JR, Chang SF (2000) MediaNet: a multimedia information network for knowledge representation
Berrani S, Manson G, Lechat P (2008) A non-supervised approach for repeated sequence detection in TV broadcast streams. Signal Process Image Commun 23(7):525–537
Bertini M, Del Bimbo A, Torniai C (2005) Automatic video annotation using ontologies extended with visual information. In: MULTIMEDIA ’05: proceedings of the 13th annual ACM international conference on multimedia. ACM, New York, pp 395–398. doi:10.1145/1101149.1101235
Brown MG, Foote JT, Jones GJF, Sparck Jones K, Young SJ (1995) Automatic content-based retrieval of broadcast news. In: MULTIMEDIA ’95: proceedings of the third ACM international conference on multimedia. ACM, New York, pp 35–43. doi:10.1145/217279.215080
Brunelli R, Mich O, Modena CM (1999) A survey on the automatic indexing of video data. J Vis Commun Image Represent 10(2):78–112. doi:10.1006/jvci.1997.0404
Bruno E, Moenne-Loccoz N, Marchand-Maillet S (2008) Design of multimodal dissimilarity spaces for retrieval of video documents. IEEE Trans Pattern Anal Mach Intell 30(9):1520–1533. doi:10.1109/TPAMI.2007.70801
Chang SF, Chen W, Meng HJ, Sundaram H, Zhong D (1998) A fully automated content-based video search engine supporting spatiotemporal queries. IEEE Trans Circuits Syst Video Technol 8:602–615
Christel MG, Hauptmann AG, Wactlar HD, Ng TD (2002) Collages as dynamic summaries for news video. In: MULTIMEDIA ’02: proceedings of the tenth ACM international conference on multimedia. ACM, New York, pp 561–569. doi:10.1145/641007.641120
Chua TS (2007) Towards the next plateau: innovative multimedia research beyond TRECVID. In: MULTIMEDIA ’07: proceedings of the 15th international conference on multimedia. ACM, New York, pp 1054–1054. doi:10.1145/1291233.1291463
Friedland G, Vinyals O (2008) Live speaker identification in conversations. In: Proceedings of ACM multimedia. ACM, pp 1017–1018
Friedland G, Gottlieb L, Janin A (2009) Joke-o-Mat: browsing sitcoms punchline-by-punchline. In: Proceedings of ACM multimedia. ACM, pp 1115–1116
Friedland G, Yeo C, Hung H (2009) Visual speaker localization aided by acoustic models. In: Proceedings of ACM multimedia. ACM, pp 195–202
Friedland G, Gottlieb L, Janin A (2010) Narrative theme navigation for sitcoms supported by fan-generated scripts. In: Proceedings of the 3rd international workshop on automated information extraction in media production. ACM, New York, pp 3–8. doi:10.1145/1877850.1877854
Gauvain JL, Lamel L, Adda G (2002) The LIMSI broadcast news transcription system. Speech Commun 37(1–2):89–108. doi:10.1016/S0167-6393(01)00061-9
Goh KS, Chang EY, Lai WC (2004) Multimodal concept-dependent active learning for image retrieval. In: MULTIMEDIA ’04: proceedings of the 12th annual ACM international conference on multimedia. ACM, New York, pp 564–571. doi:10.1145/1027527.1027664
Gupta A, Jain R (1997) Visual information retrieval. Commun ACM 40(5):70–79. doi:10.1145/253769.253798
Haubold A, Kender JR (2007) VAST MM: multimedia browser for presentation video. In: CIVR ’07: proceedings of the 6th ACM international conference on image and video retrieval. ACM, New York, pp 41–48. doi:10.1145/1282280.1282286
Hollink L, Worring M (2005) Building a visual ontology for video retrieval. In: MULTIMEDIA ’05: proceedings of the 13th annual ACM international conference on multimedia. ACM, New York, pp 479–482. doi:10.1145/1101149.1101256
Hoogs A, Rittscher J, Stein G, Schmiederer J (2003) Video content annotation using visual analysis and a large semantic knowledgebase. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 327–334
Huijbregts M, Ordelman R, de Jong F (2007) Annotation of heterogeneous multimedia content using automatic speech recognition. In: Proceedings of the second international conference on semantic and digital media technologies, SAMT 2007. Lecture notes in computer science, vol 4816. Springer, Berlin, pp 78–90. http://doc.utwente.nl/62090/
Janin A, Gottlieb L, Friedland G (2010) Joke-o-Mat HD: browsing sitcoms with human derived transcripts. In: Proceedings of the ACM international conference on multimedia 2010. ACM, New York, pp 1591–1594. doi:10.1145/1873951.1874295
de Jong F, Gauvain JL, den Hartog J, Netter K (1998) OLIVE: speech based video retrieval
Larson M, Newman E, Jones G (2008) Overview of VideoCLEF 2008: automatic generation of topic-based feeds for dual language audio-visual content. In: Working notes for the CLEF 2008 workshop, Aarhus
Natsev A, Tešić J, Xie L, Yan R, Smith JR (2007) IBM multimedia search and retrieval system. In: CIVR ’07: proceedings of the 6th ACM international conference on image and video retrieval. ACM, New York, pp 645–645. doi:10.1145/1282280.1282373
NIST Rich Transcription Evaluation. http://www.itl.nist.gov/iad/mig/tests/rt/
NIST TRECVid Evaluation. http://www-nlpir.nist.gov/projects/trecvid/
Niu F, Goela N, Divakaran A, Abdel-Mottaleb M (2008) Audio scene segmentation for video with generic content. In: Proceedings of SPIE, vol 6820, p 68200S
Reynolds DA (1995) Speaker identification and verification using Gaussian mixture speaker models. Speech Commun 17(1–2):91–108. doi:10.1016/0167-6393(95)00009-D
Reynolds DA, Torres-Carrasquillo P (2005) Approaches and applications of audio diarization. In: Proceedings of the IEEE ICASSP
de Rooij O, Snoek CGM, Worring M (2007) Query on demand video browsing. In: MULTIMEDIA ’07: proceedings of the 15th international conference on multimedia. ACM, New York, pp 811–814. doi:10.1145/1291233.1291417
Satoh S, Nakamura Y, Kanade T (1999) Name-it: naming and detecting faces in news videos. IEEE Multimed 6(1):22–35
Snoek CGM, Worring M (2009) Concept-based video retrieval. Found Trends Inf Retr 2(4):215–322. doi:10.1561/1500000014
Sun Q, Hürst W (2008) Video browsing on handheld devices: interface designs for the next generation of mobile video players. IEEE Multimed 15(3):76–83. doi:10.1109/MMUL.2008.66
Vinyals O, Friedland G (2008) Towards semantic analysis of conversations: a system for the live identification of speakers in meetings. In: Proceedings of IEEE international conference on semantic computing, pp 456–459
Wactlar H, Kanade T, Smith M, Stevens S (1996) Intelligent access to digital video: informedia project. Computer 29(5):46–52
Wooters C, Huijbregts M (2008) The ICSI RT07s speaker diarization system. In: Multimodal technologies for perception of humans: international evaluation workshops CLEAR 2007 and RT 2007, Baltimore, MD, USA, 8–11 May 2007, revised selected papers. Springer, Berlin, Heidelberg, pp 509–519. doi:10.1007/978-3-540-68585-2_47
Xu C, Wang J, Wan K, Li Y, Duan L (2006) Live sports event detection based on broadcast video and web-casting text. In: MULTIMEDIA ’06: proceedings of the 14th annual ACM international conference on multimedia. ACM, New York, pp 221–230. doi:10.1145/1180639.1180699
Cite this article
Friedland, G., Gottlieb, L. & Janin, A. Narrative theme navigation for sitcoms supported by fan-generated scripts. Multimed Tools Appl 63, 387–406 (2013). https://doi.org/10.1007/s11042-011-0877-z
DOI: https://doi.org/10.1007/s11042-011-0877-z