Multimodal concept detection in broadcast media: KavTan

Multimedia Tools and Applications

Abstract

Concept detection is an important problem for efficient indexing and retrieval in large video archives. This work presents the KavTan System, which performs high-level semantic classification in one of the largest TV archives in Turkey. In this system, concept detection is performed by generalized visual and audio concept detection modules, supported by video text detection, audio keyword spotting, and specialized audio-visual semantic detection components. The performance of the framework was assessed objectively over a wide range of semantic concepts (5 high-level, 14 visual, 9 audio, 2 supplementary) using a significant amount of precisely labeled ground truth data. The KavTan System achieves successful high-level concept detection in unconstrained TV broadcasts by efficiently utilizing multimodal information that is systematically extracted from both the spatial and temporal extent of the multimedia data.

Author information

Corresponding author

Correspondence to Medeni Soysal.

About this article

Cite this article

Soysal, M., Berker Loğoğlu, K., Tekin, M. et al. Multimodal concept detection in broadcast media: KavTan. Multimed Tools Appl 72, 2787–2832 (2014). https://doi.org/10.1007/s11042-013-1564-z

  • DOI: https://doi.org/10.1007/s11042-013-1564-z
