Multimedia Resource Discovery

Rüger, Stefan

doi:10.1007/978-3-642-20946-8_7


Part of the book series: The Information Retrieval Series (INRE, volume 33)


Abstract

This chapter examines the challenges and opportunities of Multimedia Information Retrieval and corresponding search engine applications. Computer technology has changed our access to information tremendously: we used to look up authors or titles (which we had to know) in library card catalogues in order to locate relevant books; now we can issue keyword searches over the full text of whole book repositories in order to identify authors, titles and locations of relevant books. What about the corresponding challenge of finding multimedia by fragments, examples and excerpts? Rather than asking for a piece of music by artist and title, can we hum its tune to find it? Can doctors submit scans of a patient to identify medically similar images of diagnosed cases in a database? Can your mobile phone take a picture of a statue and, by sending the picture to a service, tell you about its artist and significance?

In an attempt to answer some of these questions, the chapter introduces basic concepts of multimedia resource discovery technologies for a number of different query and document types: piggy-back text search, i.e., reducing the multimedia to pseudo text documents; automated annotation of visual components; content-based retrieval where the query is an image; and fingerprinting to match near duplicates.
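As a rough illustration of content-based retrieval where the query is an image, the following minimal sketch ranks a toy collection by histogram intersection against a query histogram. The three-bin "colour histograms", the image names and the function names are all hypothetical and chosen only for this example; the chapter's own systems are not reproduced here, and a real engine would use much richer features.

```python
from typing import Dict, List, Sequence, Tuple


def normalise(hist: Sequence[float]) -> List[float]:
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram has no mass")
    return [h / total for h in hist]


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1] for two normalised histograms of equal length."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(x, y) for x, y in zip(a, b))


def rank_by_example(
    query: Sequence[float], collection: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs, most similar first."""
    q = normalise(query)
    scored = [
        (name, histogram_intersection(q, normalise(hist)))
        for name, hist in collection.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-bin histograms (say red, green, blue pixel counts).
collection = {
    "sunset": [8, 1, 1],
    "forest": [1, 8, 1],
    "ocean": [1, 1, 8],
}
ranking = rank_by_example([6, 2, 2], collection)
```

Here the reddish query is ranked closest to the "sunset" entry. Histogram intersection is only one of many possible similarity functions and is used purely for illustration.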

Some of the research challenges arise from the semantic gap between the simple pixel properties that computers can readily index and high-level human concepts; related to this is an inherent technological limitation of automated annotation of images from pixels alone. Other challenges arise from polysemy, i.e., the many meanings and interpretations that are inherent in visual material, and the correspondingly wide range of a user’s information need.

This chapter demonstrates how these challenges can be tackled by automated processing and machine learning, and by utilising the skills of the user, for example through browsing or through a process called relevance feedback, thus putting the user at centre stage. The latter is made easier by “added value” technologies, exemplified here by summaries of complex multimedia objects such as TV news, information visualisation techniques for document clusters, visual search by example, and methods to create browsable structures within the collection.
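To make the idea of relevance feedback more concrete, here is a minimal sketch of a Rocchio-style query update, a classic textbook formulation that is not necessarily the method used in this chapter. The feature vectors, the weights alpha, beta and gamma, and the clipping of negative components to zero are assumptions made for this example.

```python
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rocchio_update(
    query: Sequence[float],
    relevant: Sequence[Sequence[float]],
    non_relevant: Sequence[Sequence[float]],
    alpha: float = 1.0,   # weight of the original query (assumed value)
    beta: float = 0.75,   # pull towards examples marked relevant (assumed value)
    gamma: float = 0.15,  # push away from examples marked non-relevant (assumed value)
) -> List[float]:
    """Move the query towards relevant and away from non-relevant examples."""
    updated = [alpha * q for q in query]
    if relevant:
        for i, m in enumerate(mean_vector(relevant)):
            updated[i] += beta * m
    if non_relevant:
        for i, m in enumerate(mean_vector(non_relevant)):
            updated[i] -= gamma * m
    # Clipping negative weights to zero is a common variant, assumed here.
    return [max(0.0, value) for value in updated]


# The user marks two results as relevant and one as non-relevant.
new_query = rocchio_update(
    query=[1.0, 0.0],
    relevant=[[0.0, 1.0], [0.0, 1.0]],
    non_relevant=[[1.0, 0.0]],
)
```

After the update the query places substantial weight on the second feature, which only the relevant examples exhibit, so a second retrieval round would favour items resembling what the user selected.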

This book chapter is an updated reprint of Rüger (2009), Multimedia resource discovery, in Göker and Davies (eds), Information Retrieval: Searching in the 21st Century, pp. 39–62, Wiley, with excerpts from Rüger (2010), Multimedia information retrieval, lecture notes in the series Synthesis Lectures on Information Concepts, Retrieval, and Services, Morgan and Claypool Publishers, http://dx.doi.org/10.2200/S00244ED1V01Y200912ICR010.


Notes

1. See http://dir.yahoo.com/.
2. See http://www.mkweb.co.uk/places_to_visit/displayarticle.asp?id=411, accessed Aug 2010.
3. See http://www.oclc.org/worldcat/statistics, accessed Aug 2010.
4. See http://flickr.com.
5. See http://www.youtube.com.
6. See http://www.apple.com/itunes.
7. See http://del.icio.us.
8. See http://www.behold.cc.
9. Topic 124 of TRECVid 2003, see http://www-nlpir.nist.gov/projects/tv2003.
10. See http://www.chlt.org.
11. See http://www.flickr.com.
12. See http://del.icio.us.
13. See http://plasma.nationalgeographic.com/map-machine/ as of January 2011.

References

Aggarwal C, Yu P (2000) The IGrid index: Reversing the dimensionality curse for similarity indexing in high dimensional space. In: Proceedings of the ACM Conference on Knowledge Discovery and Data Mining, pp 119–129
Ankerst M, Keim D, Kriegel H (1996) Circle segments: A technique for visually exploring large multidimensional data sets. http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-70761, visited on February, 2011
Aslam J, Montague M (2001) Models for metasearch. In: Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 276–284
Au P, Carey M, Sewraz S, Guo Y, Rüger S (2000) New paradigms in information visualisation. In: Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 307–309
Baillie M, Jose JM (2004) An audio-based sports video segmentation and event detection algorithm. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pp 110–110
Bainbridge D, Browne P, Cairns P, Rüger S, Xu LQ (2005) Managing the growth of multimedia digital content. ERCIM News: special theme on Multimedia Informatics 16–17
Bartell B, Cottrell G, Belew R (1994) Automatic combination of multiple ranked retrieval systems. In: Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 173–181
Beis J, Lowe D (1997) Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pp 1000–1006
Birmingham W, Dannenberg R, Pardo B (2006) Query by humming with the VocalSearch system. Communications of the ACM 49:49–52
Blei DM, Jordan M (2003) Modeling annotated data. In: Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 127–134
Börner K (2000) Visible threads: A smart VR interface to digital libraries. In: Proceedings of the International Symposium on Electronic Imaging: Visual Data Exploration and Analysis, pp 228–237
Campbell I (2000) Interactive evaluation of the ostensive model using a new test collection of images with multiple relevance assessments. Journal of Information Retrieval 2:89–114. http://portal.acm.org/citation.cfm?id=593954.593979, visited on December, 2010
Cano P, Batlle E, Kalker T, Haitsma J (2005) A review of audio fingerprinting. Journal of VLSI Signal Processing 41:271–284
Card S (1996) Visualizing retrieved information: A survey. IEEE Computer Graphics and Applications 16:63–67
Carey M, Heesch D, Rüger S (2003) Info navigator: A visualization interface for document searching and browsing. In: Proceedings of the International Conference on Distributed Multimedia Systems, pp 23–28
Cavallaro A, Ebrahimi T (2004) Interaction between high-level and low-level image analysis for semantic video object extraction. Journal on Applied Signal Processing 786–797
Chawda B, Craft B, Cairns P, Rüger S, Heesch D (2005) Do “attractive things work better”? An exploration of search tool visualisations. In: Proceedings of the Australasian Database Interaction Conference, vol 2, pp 46–51
Christel M, Warmack A (2001) The effect of text in storyboards for video navigation. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp 1409–1412
Christel M, Hauptmann A, Warmack A, Crosby S (1999) Adjustable filmstrips and skims as abstractions for a digital video library. In: Proceedings of the IEEE Forum on Research and Technology Advances in Digital Libraries, pp 98–104
Cockburn A, Savage J (2003) Comparing speed-dependent automatic zooming with traditional scroll and pan and zoom methods. In: Proceedings of the Australasian Database Interaction Conference, pp 87–102
Cox K (1992) Information retrieval by browsing. In: Proceedings of the International Conference on New Information Technology, pp 69–80
Cox K (1995) Searching through browsing. PhD thesis, University of Canberra
Cox I, Miller M, Minka T, Papathomas T, Yianilos P (2000) The Bayesian image retrieval system and PicHunter. IEEE Transactions on Image Processing 9:20–38
Crane G (2005) Perseus digital library project. Tech rep, Tufts University. http://www.perseus.tufts.edu, visited on December, 2010
Croft B, Parenty T (1985) A comparison of a network structure and a database system used for document retrieval. Information Systems 10:377–390
Cunningham H (2002) GATE: a general architecture for text engineering. Computers and the Humanities 36:223–254
Datta R, Joshi D, Li J, Wang J (2008) Image retrieval: Ideas, influences and trends of the new age. ACM Computing Surveys 40:1–60
de Vries A, Mamoulis N, Nes N, Kersten M (2002) Efficient k-nn search on vertically decomposed data. In: Proceedings of the ACM Conference on Management of Data, pp 322–333
Dearholt D, Schvaneveldt R (1990) Properties of Pathfinder networks. In: Schvaneveldt R (ed) Pathfinder Associative Networks: Studies in Knowledge Organization. Norwood, pp 1–30
Doraisamy S, Rüger S (2003) Robust polyphonic music retrieval with n-grams. Journal of Intelligent Information Systems 21:53–70
Duygulu P, Barnard K, de Freitas N, Forsyth D (2002) Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In: Proceedings of the European Conference on Computer Vision. Lecture Notes in Computer Science, vol 2353. Springer, Berlin, pp 349–354
Enser P, Sandom C (2002) Retrieval of archival moving imagery—CBIR outside the frame? In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 2383. Springer, Berlin, pp 85–106
Enser P, Sandom C (2003) Towards a comprehensive survey of the semantic gap in visual image retrieval. In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 2728. Springer, Berlin, pp 163–168
Feng S, Manmatha R, Lavrenko V (2004) Multiple Bernoulli relevance models for image and video annotation. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pp 1002–1009
Fowler R, Wilson B, Fowler W (1992) Information navigator: An information system using associative networks for display and retrieval. Tech Rep NAG9-551, 92-1, Department of Computer Science and University of Texas
Hare J, Lewis P (2004) Salient regions for query by image content. In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 3115. Springer, Berlin, pp 264–268
Hare J, Lewis P (2005) Saliency-based models of image content and their application to auto-annotation by semantic propagation. http://eprints.ecs.soton.ac.uk/id/eprint/10954, visited on February, 2011
Hare J, Lewis P, Enser P, Sandom C (2006) Mind the gap: Another look at the problem of the semantic gap in image retrieval. In: Multimedia Content Analysis and Management and Retrieval. Lecture Notes in Computer Science, vol 6073, Springer, Berlin, pp 1–12
Heesch D (2005) The NN^k technique for image searching and browsing. PhD thesis, Imperial College London
Heesch D, Rüger S (2003) Relevance feedback for content-based image retrieval: What can three mouse clicks achieve? In: Proceedings of the European Conference on Information Retrieval. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, pp 363–376
Heesch D, Rüger S (2004) Approaching the problem of multi-lingual information retrieval and visualization in Greek and Latin and Old Norse texts. In: Proceedings of the European Conference on Information Retrieval. Lecture Notes in Computer Science, vol 2997. Springer, Berlin, pp 253–266
Heesch D, Pickering M, Rüger S, Yavlinsky A (2003) Video retrieval using search and browsing with key frames. http://www-nlpir.nist.gov/projects/tvpubs/tvpapers04/imperial.pdf, visited on February, 2011
Hemmje M, Kunkel C, Willet A (1994) LyberWorld—a visualization user interface supporting fulltext retrieval. In: Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 249–259
Hoffman P, Grinstein G, Pinkney D (1999) Dimensional anchors: A graphic primitive for multidimensional multivariate information visualizations. In: Proceedings of the New Paradigms in Information Visualisation and Manipulation Workshop at ACM Conference on Information and Knowledge Management, pp 9–16
Howarth P, Rüger S (2005) Trading precision for speed: Localised similarity functions. In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 3568. Springer, Berlin, pp 415–424
Ishikawa Y, Subramanya R, Faloutsos C (1998) MindReader: Querying databases through multiple examples. In: Proceedings of the International Conference on Very Large Databases, pp 218–227
Jeon J, Lavrenko V, Manmatha R (2003) Automatic image annotation and retrieval using cross-media relevance models. In: Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 119–126
Korfhage R (1991) To see or not to see—is that the query? In: Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 134–141
Lavrenko V, Manmatha R, Jeon J (2003) A model for learning the semantics of pictures. In: Neural Information Processing Systems, pp 553–560
Lew M, Sebe N, Djeraba C, Jain R (2006) Content-based multimedia information retrieval: State of the art and challenges. ACM Transactions on Multimedia Computing, Communications, and Applications 2:1–19
Liu C, Yuen J, Torralba A (2009c) Nonparametric scene parsing: Label transfer via dense scene alignment. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pp 1972–1979
Magalhães J, Rüger S (2006) Logistic regression of semantic codebooks for semantic image retrieval. In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 4071. Springer, Berlin, pp 41–50
Magalhães J, Rüger S (2007) Information-theoretic semantic multimedia indexing. In: Proceedings of the International Conference on Image and Video Retrieval, pp 619–626
Makadia A, Pavlovic V, Kumar S (2008) A new baseline for image annotation. In: Proceedings of the European Conference on Computer Vision. Lecture Notes in Computer Science, vol 5304. Springer, Berlin, pp 316–329
Metzler D, Manmatha R (2004) An inference network approach to image retrieval. In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 3115. Springer, Berlin, pp 42–50
Müller W, Henrich A (2004) Faster exact histogram intersection on large data collections using inverted VA-files. In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 3115. Springer, Berlin, pp 455–463
Nene S, Nayar S (1997) A simple algorithm for nearest neighbor search in high dimensions. IEEE Transactions on Pattern Analysis and Machine Intelligence 19:989–1003
Pickering M (2004) Video retrieval and summarisation. PhD thesis, Imperial College London
Pickering M, Wong L, Rüger S (2003) ANSES: Summarisation of news video. In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 2728. Springer, Berlin, pp 481–486
Rodden K, Basalaj W, Sinclair D, Wood K (1999) Evaluating a visualization of image similarity. In: Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 36–43
Rüger S (2009) Multimedia resource discovery. In: Göker A, Davies J (eds) Information Retrieval: Searching in the 21st Century. Wiley, New York, NY, pp 39–62
Rüger S (2010) Multimedia Information Retrieval. Morgan & Claypool
Rui Y, Huang T, Mehrotra S (1998) Relevance feedback techniques in interactive content-based image retrieval. In: Multimedia Content Analysis and Management and Retrieval, pp 25–36
Rydberg-Cox J, Vetter L, Rüger S, Heesch D (2004) Approaching the problem of multi-lingual information retrieval and visualization in Greek and Latin and Old Norse texts. In: Proceedings of the European Conference on Digital Libraries. Lecture Notes in Computer Science, vol 3232. Springer, Berlin, pp 168–178
Salway A, Graham M (2003) Extracting information about emotions in films. In: Proceedings of the ACM Conference on Multimedia, pp 299–302
Salway A, Vassiliou A, Ahmad K (2005) What happens in films? http://doi.ieeecomputersociety.org/10.1109/ICME.2005.1521357, visited on February, 2011
Sammon J (1969) A nonlinear mapping for data structure analysis. IEEE Transactions on Computers 18:401–409
Santini S, Jain R (2000) Integrated browsing and querying for image databases. IEEE Multimedia 7:26–39
Seo J, Haitsma J, Kalker T, Yoo C (2004) A robust image fingerprinting system using the radon transform. Signal Processing: Image Communication 19:325–339
Shaw J, Fox E (1994) Combination of multiple searches. In: Proceedings of the Text Retrieval Conference, pp 243–252
Shneiderman B, Feldman D, Rose A, Ferré Grau X (2000) Visualizing digital library search results with categorical and hierarchical axes. In: Proceedings of the ACM Conference on Digital Libraries, pp 57–66
Smeaton A, Gurrin C, Lee H, Mc Donald K, Murphy N, O’Connor N, O’Sullivan D, Smyth B, Wilson D (2004) The Físchlár-news-stories system: Personalised access to an archive of TV news. In: Acte de la Conférence sur la Recherche d’Information Assistée par Ordinateur, pp 3–17
Squire D, Müller W, Müller H, Pun T (2000) Content-based query of image databases: Inspirations from text retrieval. Pattern Recognition Letters 21:1193–1198
Tolonen T, Karjalainen M (2000) A computationally efficient multi-pitch analysis model. IEEE Transactions on Speech and Audio Processing 8:708–716
Torralba A, Oliva A (2003) Statistics of natural image categories. Network: Computation in Neural Systems 14:391–412
TRECVid (2003) TREC video retrieval evaluation. http://trecvid.nist.gov/, visited on December, 2010
Tzanetakis G, Cook P (2002) Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing 10:293–302
van Dongen S (2000) A cluster algorithm for graphs. Tech Rep INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands
von Ahn L, Dabbish L (2004) Labeling images with a computer game. In: Proceedings of the ACM Conference on Human Factors in Computing Systems, pp 319–326
Voss J (2007) Tagging, folksonomy & Co—Renaissance of manual indexing? Computing Research Repository abs/cs/0701072:1–12
Watts D, Strogatz S (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442
Weber R, Stock H, Blott S (1998) A quantitative analysis and performance study for similarity search methods in high-dimensional space. In: Proceedings of the International Conference on Very Large Databases, pp 194–205
Wood M, Campbell N, Thomas B (1998) Iterative refinement by relevance feedback in content-based digital image retrieval. ACM Multimedia 13–20
Yavlinsky A, Pickering M, Heesch D, Rüger S (2004) A comparative study of evidence combination strategies. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp 1040–1043
Yavlinsky A, Schofield E, Rüger S (2005) Automated image annotation using global features and robust nonparametric density estimation. In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 3568. Springer, Berlin, pp 507–517


Acknowledgements

Outlining the paradigms in this chapter and their implementations would not have been possible without the ingenuity, imagination and hard work of Paul Browne, Matthew Carey, Shyamala Doraisamy, Daniel Heesch, Peter Howarth, Suzanne Little, Haiming Liu, Ainhoa Llorente, João Magalhães, Alexander May, Simon Overell, Marcus Pickering, Adam Rae, Edward Schofield, Shalini Sewraz, Dawei Song, Lawrence Wong and Alexei Yavlinsky.

Credits

The photograph in Fig. 7.1 (Milton Keynes Peace pagoda) by Stefan Rüger, July 2007, was first published in Rüger (2010). Figure 7.2 is a mock-up based on the existing üBase search engine (see Fig. 7.14), with modifications by Peter Devine, and was previously published in Rüger (2010). Figure 7.3 (new search engine types) was designed by Peter Devine and published in Rüger (2010). Figures 7.5, 7.6 and 7.9 use royalty-free images from Corel Gallery 380,000, © Corel Corporation, all rights reserved. Figure 7.7 (Behold) shows screenshots by Alexei Yavlinsky from http://photo.beholdsearch.com, 19 July 2007, now http://www.behold.cc, with thumbnails of creative-commons Flickr images. The photograph in Fig. 7.8, © Stefan Rüger, was taken in May 1996 in the Nord Jyllands Kunstmuseum, Ålborg. Figures 7.8 and 7.9 were published in Rüger (2010). The screenshots in Figs. 7.10–7.14 and 7.16–7.18 are reproduced courtesy of © Imperial College London. The ANSES system in Fig. 7.10 was originally designed by Marcus Pickering and later modified by Lawrence Wong; the images and part of the text displayed in the screenshot of Fig. 7.10 were recorded from the British Broadcasting Corporation (BBC), http://www.bbc.co.uk. The Sammon map in Fig. 7.11 and the radial visualisation in Fig. 7.13 were designed by Matthew Carey. The Dendro map in Fig. 7.12 was designed by Daniel Heesch. The üBase system depicted in the screenshots of Figs. 7.14(a), 7.14(b) and 7.16(a) was designed by Alexander May. The images used within the screenshot of Fig. 7.14 and within the illustration of Fig. 7.15 were reproduced from Corel Gallery 380,000, © Corel Corporation, all rights reserved. The images in the (partial) screenshots of Figs. 7.16 and 7.17 were reproduced from TREC Video Retrieval Evaluation 2003 (TRECVid), http://www-nlpir.nist.gov/projects. The geotemporal browsing screenshot in Fig. 7.18 was created by Simon Overell.

Author information

Authors and Affiliations

Knowledge Media Institute, The Open University, Milton Keynes, MK7 6AA, United Kingdom
Stefan Rüger

Authors

Stefan Rüger

Corresponding author

Correspondence to Stefan Rüger.

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Via G. Gradenigo, 6, Padua, 35131, Italy
Massimo Melucci
Yahoo! Research Barcelona, Ocata 1, Barcelona, 08003, Spain
Ricardo Baeza-Yates


About this chapter

Cite this chapter

Rüger, S. (2011). Multimedia Resource Discovery. In: Melucci, M., Baeza-Yates, R. (eds) Advanced Topics in Information Retrieval. The Information Retrieval Series, vol 33. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20946-8_7


DOI: https://doi.org/10.1007/978-3-642-20946-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20945-1
Online ISBN: 978-3-642-20946-8
eBook Packages: Computer Science, Computer Science (R0)
