Multimodal concept detection in broadcast media: KavTan

Soysal, Medeni; Loğoğlu, K. Berker; Tekin, Mashar; Esen, Ersin; Saracoğlu, Ahmet; Oskay Acar, Banu; Ozan, Ezgi Can; Ateş, Tuğrul K.; Sevimli, Hakan; Sevinç, Müge; Atıl, İlkay; Özkan, Savaş; Arabacı, Mehmet Ali; Tankız, Seda; Karadeniz, Talha; Önür, Duygu; Selçuk, Sezin; Alatan, A. Aydın; Çiloğlu, Tolga

doi:10.1007/s11042-013-1564-z

Published: 10 July 2013

Multimedia Tools and Applications, Volume 72, pages 2787–2832 (2014)

Abstract

Concept detection is an important problem for efficient indexing and retrieval in large video archives. This work presents the KavTan System, which performs high-level semantic classification in one of the largest TV archives in Turkey. In this system, concept detection is performed by generalized visual and audio concept detection modules, supported by video text detection, audio keyword spotting, and specialized audio-visual semantic detection components. The performance of the framework was assessed objectively over a wide range of semantic concepts (5 high-level, 14 visual, 9 audio, and 2 supplementary) using a substantial amount of precisely labeled ground-truth data. The KavTan System achieves successful high-level concept detection in unconstrained TV broadcasts by efficiently exploiting multimodal information systematically extracted from both the spatial and temporal extent of the multimedia data.
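The abstract states that outputs from several modality-specific modules are combined, but it does not say how. A minimal sketch of one common option, score-level (late) fusion, is shown below, assuming each module already emits a per-shot confidence in [0, 1]. The modality keys, weights, threshold, and helper names are hypothetical illustrations and are not taken from the KavTan paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ShotScores:
    """Hypothetical per-shot outputs of the modality-specific detectors."""
    shot_id: int
    scores: Dict[str, float]  # modality name -> confidence in [0, 1]


def late_fuse(shot: ShotScores, weights: Dict[str, float]) -> float:
    """Weighted average over the modalities that produced a score for this shot.

    Modalities with no output (e.g. no video text detected) are skipped and
    the remaining weights are renormalized, so the shot is not penalized.
    """
    num = 0.0
    den = 0.0
    for modality, w in weights.items():
        if modality in shot.scores:
            num += w * shot.scores[modality]
            den += w
    return num / den if den > 0 else 0.0


def detect_concept(shots: List[ShotScores],
                   weights: Dict[str, float],
                   threshold: float = 0.5) -> List[int]:
    """Return the ids of shots whose fused score reaches the threshold."""
    return [s.shot_id for s in shots if late_fuse(s, weights) >= threshold]


# Illustrative weights and scores only; these are not values from the paper.
WEIGHTS = {"visual": 0.5, "audio": 0.3, "text": 0.2}
SHOTS = [
    ShotScores(0, {"visual": 0.9, "audio": 0.7}),
    ShotScores(1, {"visual": 0.2, "audio": 0.4, "text": 0.1}),
    ShotScores(2, {"text": 0.8}),
]

if __name__ == "__main__":
    print(detect_concept(SHOTS, WEIGHTS))  # prints [0, 2]
```

Renormalizing over the available modalities is a design choice for sparse cues such as video text or spoken keywords; a learned fusion (for example, a classifier trained on the stacked module outputs) would be an equally plausible alternative.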


Author information

Authors and Affiliations

TUBITAK - UZAY, METU Campus, Ankara, Turkey
Medeni Soysal, K. Berker Loğoğlu, Mashar Tekin, Ersin Esen, Ahmet Saracoğlu, Banu Oskay Acar, Ezgi Can Ozan, Tuğrul K. Ateş, Hakan Sevimli, Müge Sevinç, İlkay Atıl, Savaş Özkan, Mehmet Ali Arabacı, Seda Tankız, Talha Karadeniz, Duygu Önür, Sezin Selçuk, A. Aydın Alatan & Tolga Çiloğlu

Corresponding author

Correspondence to Medeni Soysal.

About this article

Cite this article

Soysal, M., Berker Loğoğlu, K., Tekin, M. et al. Multimodal concept detection in broadcast media: KavTan. Multimed Tools Appl 72, 2787–2832 (2014). https://doi.org/10.1007/s11042-013-1564-z

Issue Date: October 2014
DOI: https://doi.org/10.1007/s11042-013-1564-z

Similar content being viewed by others

Multimodal Fusion: Combining Visual and Textual Cues for Concept Detection in Video

Content-Based Video Retrieval in Historical Collections of the German Broadcasting Archive

VERGE: A Multimodal Interactive Search Engine for Video Browsing and Retrieval