Multi-modal Solution for Unconstrained News Story Retrieval

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7131)

Abstract

We propose a multi-modal approach to retrieving associated news stories that share the same main topic. In the textual domain, we utilize Automatic Speech Recognition (ASR) and refined Optical Character Recognition (OCR) transcripts, while in the visual domain we employ a Near-Duplicate Keyframe detection method to identify stories with common visual clues. In addition, we adopt another visual representation, namely the semantic signature, which indicates the pre-defined semantic concepts included in a news story, to improve the discriminativeness of the visual modality. We propose a query-class weighting scheme to integrate the retrieval outcomes gained from visual modalities. Experimental results show the distinguishing power of the enhanced representation in individual modalities and the superiority of our fusion approach over existing strategies.
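
As a rough illustration of what a query-class weighting scheme can look like, the sketch below performs a weighted late fusion of per-modality retrieval scores, with the weights chosen by the class of the query. It is not the authors' method: the class names (`text_rich`, `visual_rich`), the modality keys (`text`, `ndk`, `semantic`), the weight values, and the min-max normalization are all assumptions made for the example, and the abstract does not say how the weights are obtained or exactly which modalities are combined.

```python
from typing import Dict, List

Scores = Dict[str, float]  # story id -> retrieval score from one modality

# Hypothetical weights per query class; the values are illustrative only.
CLASS_WEIGHTS: Dict[str, Dict[str, float]] = {
    "text_rich": {"text": 0.6, "ndk": 0.2, "semantic": 0.2},
    "visual_rich": {"text": 0.2, "ndk": 0.5, "semantic": 0.3},
}


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale scores to [0, 1] so that modalities are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # A constant score list carries no ranking information.
        return {story: 0.0 for story in scores}
    return {story: (s - lo) / (hi - lo) for story, s in scores.items()}


def fuse(per_modality: Dict[str, Scores], query_class: str) -> Scores:
    """Weighted sum of normalized scores, with weights picked by query class."""
    weights = CLASS_WEIGHTS[query_class]
    fused: Scores = {}
    for modality, scores in per_modality.items():
        w = weights.get(modality, 0.0)
        for story, s in min_max_normalize(scores).items():
            fused[story] = fused.get(story, 0.0) + w * s
    return fused


def rank(fused: Scores) -> List[str]:
    """Story ids sorted by descending fused score; ties broken by id."""
    return sorted(fused, key=lambda story: (-fused[story], story))
```

With the same per-modality scores, a story that is strong only in the near-duplicate keyframe list can move to the top of the ranking when the query is assigned to the visually oriented class, which is the behaviour a class-dependent weighting is meant to allow.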

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Younessian, E., Rajan, D. (2012). Multi-modal Solution for Unconstrained News Story Retrieval. In: Schoeffmann, K., Merialdo, B., Hauptmann, A.G., Ngo, CW., Andreopoulos, Y., Breiteneder, C. (eds) Advances in Multimedia Modeling. MMM 2012. Lecture Notes in Computer Science, vol 7131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27355-1_19

  • DOI: https://doi.org/10.1007/978-3-642-27355-1_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27354-4

  • Online ISBN: 978-3-642-27355-1

  • eBook Packages: Computer Science, Computer Science (R0)
