Visual Cue Cluster Construction via Information Bottleneck Principle and Kernel Density Estimation

  • Conference paper
Image and Video Retrieval (CIVR 2005)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 3568))

Abstract

Recent research in video analysis has shown a promising direction in which mid-level features (e.g., people, anchor, indoor) are abstracted from low-level features (e.g., color, texture, motion) and used for discriminative classification of semantic labels. However, in most systems, such mid-level features are selected manually. In this paper, we propose an information-theoretic framework, visual cue cluster construction (VC3), to automatically discover adequate mid-level features. The problem is posed as mutual information maximization, through which optimal cue clusters are discovered to preserve the highest information about the semantic labels. We extend the Information Bottleneck framework to high-dimensional continuous features and further propose a projection method to map each video into probabilistic memberships over all the cue clusters. The biggest advantage of the proposed approach is that it removes the dependence on manual selection of mid-level features and the huge labor cost of annotating a training corpus to train a detector for each mid-level feature. The proposed VC3 framework is general and effective, with potential for solving other problems in semantic video analysis. When tested on news video story segmentation, the proposed approach achieves a promising performance gain over representations derived from conventional clustering techniques and even over manually selected mid-level features.
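As a rough illustration of the quantity the abstract says is maximized, the sketch below computes the mutual information I(C;Y) between a hard cluster assignment C and semantic labels Y from their empirical joint distribution. It is not the VC3 algorithm: the function names, the toy data, and the use of hard assignments are assumptions made here for illustration, and the paper's handling of continuous features via kernel density estimation is not reproduced.

```python
import numpy as np


def cluster_label_joint(assignments, labels, n_clusters, n_labels):
    """Empirical joint distribution p(c, y) from paired cluster ids and labels.

    Hypothetical helper: assumes hard cluster assignments and integer labels.
    """
    assignments = np.asarray(assignments, dtype=int)
    labels = np.asarray(labels, dtype=int)
    joint = np.zeros((n_clusters, n_labels))
    # Count co-occurrences of (cluster, label) pairs.
    np.add.at(joint, (assignments, labels), 1.0)
    return joint / joint.sum()


def mutual_information(joint):
    """I(C;Y) in bits for a joint probability table p(c, y)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    pc = joint.sum(axis=1, keepdims=True)  # marginal p(c), shape (k, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal p(y), shape (1, m)
    mask = joint > 0                       # skip zero cells (0 * log 0 = 0)
    ratio = joint[mask] / (pc @ py)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))


# Toy data (assumed): four samples with binary labels.
labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]      # clusters align with the labels
uninformative = [0, 1, 0, 1]    # clusters are independent of the labels

mi_informative = mutual_information(
    cluster_label_joint(informative, labels, 2, 2))
mi_uninformative = mutual_information(
    cluster_label_joint(uninformative, labels, 2, 2))
print(mi_informative, mi_uninformative)
```

In this toy case, a clustering that aligns with the labels scores higher than one that is independent of them, which is the sense in which a cue clustering can be said to preserve information about the semantic labels.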


Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hsu, W.H., Chang, SF. (2005). Visual Cue Cluster Construction via Information Bottleneck Principle and Kernel Density Estimation. In: Leow, WK., Lew, M.S., Chua, TS., Ma, WY., Chaisorn, L., Bakker, E.M. (eds) Image and Video Retrieval. CIVR 2005. Lecture Notes in Computer Science, vol 3568. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11526346_12

  • DOI: https://doi.org/10.1007/11526346_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27858-0

  • Online ISBN: 978-3-540-31678-7

  • eBook Packages: Computer Science, Computer Science (R0)
