Abstract
This chapter examines the challenges and opportunities of Multimedia Information Retrieval and corresponding search engine applications. Computer technology has changed our access to information tremendously: we used to look up authors or titles (which we had to know) in library card catalogues in order to locate relevant books; now we can issue keyword searches within the full text of whole book repositories in order to identify authors, titles and locations of relevant books. What about the corresponding challenge of finding multimedia by fragments, examples and excerpts? Rather than asking for a music piece by artist and title, can we hum its tune to find it? Can doctors submit scans of a patient to identify medically similar images of diagnosed cases in a database? Can your mobile phone take a picture of a statue, send it to a service, and tell you about the statue's artist and significance?
To answer some of these questions, the chapter introduces basic concepts of multimedia resource discovery technologies for a number of different query and document types: piggy-back text search, i.e., reducing the multimedia to pseudo text documents; automated annotation of visual components; content-based retrieval, where the query is an image; and fingerprinting to match near duplicates.
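As a rough illustration of content-based retrieval where the query is an image, the following minimal sketch ranks a toy collection by colour-histogram intersection. It is an assumption-laden example, not the chapter's implementation: the function names, the 4-bins-per-channel quantisation and the tiny pixel lists are all hypothetical, and real systems use richer features and index structures.

```python
from collections import Counter

def colour_histogram(pixels, bins_per_channel=4):
    """Normalised histogram of quantised RGB pixels (each channel in 0..255)."""
    if not pixels:
        raise ValueError("an image needs at least one pixel")
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    return {bin_: n / len(pixels) for bin_, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of bin-wise minima of two normalised histograms."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())

def rank_by_example(query_pixels, collection):
    """Return (score, name) pairs, most similar image first."""
    q = colour_histogram(query_pixels)
    scored = [(histogram_intersection(q, colour_histogram(p)), name)
              for name, p in collection.items()]
    return sorted(scored, reverse=True)

# Hypothetical toy data: each "image" is just a short list of RGB pixels.
collection = {
    "sunset": [(250, 10, 10)] * 8 + [(10, 10, 250)] * 2,   # mostly red
    "forest": [(10, 200, 10)] * 9 + [(250, 10, 10)] * 1,   # mostly green
}
query = [(240, 20, 20)] * 10                                 # a reddish example image

ranking = rank_by_example(query, collection)
print(ranking)
```

The query image here shares most of its colour mass with the reddish image, so that image is ranked first; the same scheme says nothing about what the images depict, which is one face of the semantic gap discussed next.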
Some of the research challenges arise from the semantic gap between the simple pixel properties that computers can readily index and high-level human concepts; related to this is an inherent technological limitation of automated annotation of images from pixels alone. Other challenges arise from polysemy, i.e., the many meanings and interpretations that are inherent in visual material, and the correspondingly wide range of a user’s information needs.
This chapter demonstrates how these challenges can be tackled by automated processing and machine learning and by utilising the skills of the user, for example through browsing or through a process called relevance feedback, thus putting the user at centre stage. The latter is made easier by “added value” technologies, exemplified here by summaries of complex multimedia objects such as TV news, information visualisation techniques for document clusters, visual search by example, and methods to create browsable structures within the collection.
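To make the idea of relevance feedback concrete, here is a minimal sketch using the classic Rocchio update on feature vectors. This is a swapped-in textbook technique chosen for illustration, not necessarily the method used in the chapter; the function name, the weights `alpha`, `beta` and `gamma`, and the two-dimensional vectors are assumptions.

```python
def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector towards results marked relevant and away from the rest."""
    dims = len(query)

    def centroid(vectors):
        # Mean vector; an empty set of judgements contributes nothing.
        if not vectors:
            return [0.0] * dims
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

    pos = centroid(relevant)
    neg = centroid(non_relevant)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]

# Hypothetical feature vectors: the user marks two results as relevant, one as not.
query = [1.0, 0.0]
marked_relevant = [[0.0, 1.0], [0.0, 1.0]]
marked_non_relevant = [[1.0, 0.0]]

new_query = rocchio_update(query, marked_relevant, marked_non_relevant)
print(new_query)
```

The updated query would then be re-submitted to the search engine, so that each round of user judgements steers the next result list.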
This book chapter is an updated re-print of Rüger (2009), Multimedia resource discovery, in Göker and Davies (eds), Information Retrieval: Searching in the 21st Century, pp. 39–62, Wiley, with excerpts from Rüger (2010), Multimedia information retrieval, Lecture notes in the series Synthesis Lectures on Information Concepts, Retrieval, and Services, Morgan and Claypool Publishers, http://dx.doi.org/10.2200/S00244ED1V01Y200912ICR010.
Notes
- 1.
- 2. See http://www.mkweb.co.uk/places_to_visit/displayarticle.asp?id=411, accessed Aug 2010.
- 3. See http://www.oclc.org/worldcat/statistics, accessed Aug 2010.
- 4. See http://flickr.com.
- 5.
- 6.
- 7. See http://del.icio.us.
- 8. See http://www.behold.cc.
- 9. Topic 124 of TRECVid 2003, see http://www-nlpir.nist.gov/projects/tv2003.
- 10. See http://www.chlt.org.
- 11.
- 12. See http://del.icio.us.
- 13. See http://plasma.nationalgeographic.com/map-machine/ as of January 2011.
References
Aggarwal C, Yu P (2000) The IGrid index: Reversing the dimensionality curse for similarity indexing in high dimensional space. In: Proceedings of the ACM Conference on Knowledge Discovery and Data Mining, pp 119–129
Ankerst M, Keim D, Kriegel H (1996) Circle segments: A technique for visually exploring large multidimensional data sets. http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-70761, visited on February, 2011
Aslam J, Montague M (2001) Models for metasearch. In: Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 276–284
Au P, Carey M, Sewraz S, Guo Y, Rüger S (2000) New paradigms in information visualisation. In: Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 307–309
Baillie M, Jose JM (2004) An audio-based sports video segmentation and event detection algorithm. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pp 110–110
Bainbridge D, Browne P, Cairns P, Rüger S, Xu LQ (2005) Managing the growth of multimedia digital content. ERCIM News: special theme on Multimedia Informatics 16–17
Bartell B, Cottrell G, Belew R (1994) Automatic combination of multiple ranked retrieval systems. In: Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 173–181
Beis J, Lowe D (1997) Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pp 1000–1006
Birmingham W, Dannenberg R, Pardo B (2006) Query by humming with the VocalSearch system. Communications of the ACM 49:49–52
Blei DM, Jordan M (2003) Modeling annotated data. In: Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 127–134
Börner K (2000) Visible threads: A smart VR interface to digital libraries. In: Proceedings of the International Symposium on Electronic Imaging: Visual Data Exploration and Analysis, pp 228–237
Campbell I (2000) Interactive evaluation of the ostensive model using a new test collection of images with multiple relevance assessments. Journal of Information Retrieval 2:89–114. http://portal.acm.org/citation.cfm?id=593954.593979, visited on December, 2010
Cano P, Batlle E, Kalker T, Haitsma J (2005) A review of audio fingerprinting. Journal of VLSI Signal Processing 41:271–284
Card S (1996) Visualizing retrieved information: A survey. IEEE Computer Graphics and Applications 16:63–67
Carey M, Heesch D, Rüger S (2003) Info navigator: A visualization interface for document searching and browsing. In: Proceedings of the International Conference on Distributed Multimedia Systems, pp 23–28
Cavallaro A, Ebrahimi T (2004) Interaction between high-level and low-level image analysis for semantic video object extraction. Journal on Applied Signal Processing 786–797
Chawda B, Craft B, Cairns P, Rüger S, Heesch D (2005) Do “attractive things work better”? An exploration of search tool visualisations. In: Proceedings of the Australasian Database Interaction Conference, vol 2, pp 46–51
Christel M, Warmack A (2001) The effect of text in storyboards for video navigation. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp 1409–1412
Christel M, Hauptmann A, Warmack A, Crosby S (1999) Adjustable filmstrips and skims as abstractions for a digital video library. In: Proceedings of the IEEE Forum on Research and Technology Advances in Digital Libraries, pp 98–104
Cockburn A, Savage J (2003) Comparing speed-dependent automatic zooming with traditional scroll and pan and zoom methods. In: Proceedings of the Australasian Database Interaction Conference, pp 87–102
Cox K (1992) Information retrieval by browsing. In: Proceedings of the International Conference on New Information Technology, pp 69–80
Cox K (1995) Searching through browsing. PhD thesis, University of Canberra
Cox I, Miller M, Minka T, Papathomas T, Yianilos P (2000) The Bayesian image retrieval system and PicHunter. IEEE Transactions on Image Processing 9:20–38
Crane G (2005) Perseus digital library project. Tech rep, Tufts University. http://www.perseus.tufts.edu, visited on December, 2010
Croft B, Parenty T (1985) A comparison of a network structure and a database system used for document retrieval. Information Systems 10:377–390
Cunningham H (2002) GATE: a general architecture for text engineering. Computers and the Humanities 36:223–254
Datta R, Joshi D, Li J, Wang J (2008) Image retrieval: Ideas, influences and trends of the new age. ACM Computing Surveys 40:1–60
de Vries A, Mamoulis N, Nes N, Kersten M (2002) Efficient k-nn search on vertically decomposed data. In: Proceedings of the ACM Conference on Management of Data, pp 322–333
Dearholt D, Schvaneveldt R (1990) Properties of Pathfinder networks. In: Schvaneveldt R (ed) Pathfinder Associative Networks: Studies in Knowledge Organization. Norwood, pp 1–30
Doraisamy S, Rüger S (2003) Robust polyphonic music retrieval with n-grams. Journal of Intelligent Information Systems 21:53–70
Duygulu P, Barnard K, de Freitas N, Forsyth D (2002) Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In: Proceedings of the European Conference on Computer Vision. Lecture Notes in Computer Science, vol 2353. Springer, Berlin, pp 349–354
Enser P, Sandom C (2002) Retrieval of archival moving imagery—CBIR outside the frame? In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 2383. Springer, Berlin, pp 85–106
Enser P, Sandom C (2003) Towards a comprehensive survey of the semantic gap in visual image retrieval. In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 2728. Springer, Berlin, pp 163–168
Feng S, Manmatha R, Lavrenko V (2004) Multiple Bernoulli relevance models for image and video annotation. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pp 1002–1009
Fowler R, Wilson B, Fowler W (1992) Information navigator: An information system using associative networks for display and retrieval. Tech Rep NAG9-551, 92-1, Department of Computer Science, University of Texas
Hare J, Lewis P (2004) Salient regions for query by image content. In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 3115. Springer, Berlin, pp 264–268
Hare J, Lewis P (2005) Saliency-based models of image content and their application to auto-annotation by semantic propagation. http://eprints.ecs.soton.ac.uk/id/eprint/10954, visited on February, 2011
Hare J, Lewis P, Enser P, Sandom C (2006) Mind the gap: Another look at the problem of the semantic gap in image retrieval. In: Multimedia Content Analysis and Management and Retrieval. Lecture Notes in Computer Science, vol 6073, Springer, Berlin, pp 1–12
Heesch D (2005) The NNk technique for image searching and browsing. PhD thesis, Imperial College London
Heesch D, Rüger S (2003) Relevance feedback for content-based image retrieval: What can three mouse clicks achieve? In: Proceedings of the European Conference on Information Retrieval. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, pp 363–376
Heesch D, Rüger S (2004) NNk networks for content-based image retrieval. In: Proceedings of the European Conference on Information Retrieval. Lecture Notes in Computer Science, vol 2997. Springer, Berlin, pp 253–266
Heesch D, Pickering M, Rüger S, Yavlinsky A (2003) Video retrieval using search and browsing with key frames. http://www-nlpir.nist.gov/projects/tvpubs/tvpapers04/imperial.pdf, visited on February, 2011
Hemmje M, Kunkel C, Willet A (1994) LyberWorld—a visualization user interface supporting fulltext retrieval. In: Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 249–259
Hoffman P, Grinstein G, Pinkney D (1999) Dimensional anchors: A graphic primitive for multidimensional multivariate information visualizations. In: Proceedings of the New Paradigms in Information Visualisation and Manipulation Workshop at ACM Conference on Information and Knowledge Management, pp 9–16
Howarth P, Rüger S (2005) Trading precision for speed: Localised similarity functions. In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 3568. Springer, Berlin, pp 415–424
Ishikawa Y, Subramanya R, Faloutsos C (1998) MindReader: Querying databases through multiple examples. In: Proceedings of the International Conference on Very Large Databases, pp 218–227
Jeon J, Lavrenko V, Manmatha R (2003) Automatic image annotation and retrieval using cross-media relevance models. In: Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 119–126
Korfhage R (1991) To see or not to see—is that the query? In: Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 134–141
Lavrenko V, Manmatha R, Jeon J (2003) A model for learning the semantics of pictures. In: Neural Information Processing Systems, pp 553–560
Lew M, Sebe N, Djeraba C, Jain R (2006) Content-based multimedia information retrieval: State of the art and challenges. ACM Transactions on Multimedia Computing, Communications, and Applications 2:1–19
Liu C, Yuen J, Torralba A (2009c) Nonparametric scene parsing: Label transfer via dense scene alignment. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pp 1972–1979
Magalhães J, Rüger S (2006) Logistic regression of semantic codebooks for semantic image retrieval. In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 4071. Springer, Berlin, pp 41–50
Magalhães J, Rüger S (2007) Information-theoretic semantic multimedia indexing. In: Proceedings of the International Conference on Image and Video Retrieval, pp 619–626
Makadia A, Pavlovic V, Kumar S (2008) A new baseline for image annotation. In: Proceedings of the European Conference on Computer Vision. Lecture Notes in Computer Science, vol 5304. Springer, Berlin, pp 316–329
Metzler D, Manmatha R (2004) An inference network approach to image retrieval. In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 3115. Springer, Berlin, pp 42–50
Müller W, Henrich A (2004) Faster exact histogram intersection on large data collections using inverted VA-files. In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 3115. Springer, Berlin, pp 455–463
Nene S, Nayar S (1997) A simple algorithm for nearest neighbor search in high dimensions. IEEE Transactions on Pattern Analysis and Machine Intelligence 19:989–1003
Pickering M (2004) Video retrieval and summarisation. PhD thesis, Imperial College London
Pickering M, Wong L, Rüger S (2003) ANSES: Summarisation of news video. In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 2728. Springer, Berlin, pp 481–486
Rodden K, Basalaj W, Sinclair D, Wood K (1999) Evaluating a visualization of image similarity. In: Proceedings of the ACM Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, pp 36–43
Rüger S (2009) Multimedia resource discovery. In: Göker A, Davies J (eds) Information Retrieval: Searching in the 21st Century. Wiley, New York, NY, pp 39–62
Rüger S (2010) Multimedia Information Retrieval. Morgan & Claypool
Rui Y, Huang T, Mehrotra S (1998) Relevance feedback techniques in interactive content-based image retrieval. In: Multimedia Content Analysis and Management and Retrieval, pp 25–36
Rydberg-Cox J, Vetter L, Rüger S, Heesch D (2004) Approaching the problem of multi-lingual information retrieval and visualization in Greek and Latin and Old Norse texts. In: Proceedings of the European Conference on Digital Libraries. Lecture Notes in Computer Science, vol 3232. Springer, Berlin, pp 168–178
Salway A, Graham M (2003) Extracting information about emotions in films. In: Proceedings of the ACM Conference on Multimedia, pp 299–302
Salway A, Vassiliou A, Ahmad K (2005) What happens in films? http://doi.ieeecomputersociety.org/10.1109/ICME.2005.1521357, visited on February, 2011
Sammon J (1969) A nonlinear mapping for data structure analysis. IEEE Transactions on Computers 18:401–409
Santini S, Jain R (2000) Integrated browsing and querying for image databases. IEEE Multimedia 7:26–39
Seo J, Haitsma J, Kalker T, Yoo C (2004) A robust image fingerprinting system using the radon transform. Signal Processing: Image Communication 19:325–339
Shaw J, Fox E (1994) Combination of multiple searches. In: Proceedings of the Text Retrieval Conference, pp 243–252
Shneiderman B, Feldman D, Rose A, Ferré Grau X (2000) Visualizing digital library search results with categorical and hierarchical axes. In: Proceedings of the ACM Conference on Digital Libraries, pp 57–66
Smeaton A, Gurrin C, Lee H, Mc Donald K, Murphy N, O’Connor N, O’Sullivan D, Smyth B, Wilson D (2004) The Físchlár-news-stories system: Personalised access to an archive of TV news. In: Acte de la Conférence sur la Recherche d’Information Assistée par Ordinateur, pp 3–17
Squire D, Müller W, Müller H, Pun T (2000) Content-based query of image databases: Inspirations from text retrieval. Pattern Recognition Letters 21:1193–1198
Tolonen T, Karjalainen M (2000) A computationally efficient multi-pitch analysis model. IEEE Transactions on Speech and Audio Processing 8:708–716
Torralba A, Oliva A (2003) Statistics of natural image categories. Network: Computation in Neural Systems 14:391–412
TRECVid (2003) TREC video retrieval evaluation. http://trecvid.nist.gov/, visited on December, 2010
Tzanetakis G, Cook P (2002) Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing 10:293–302
van Dongen S (2000) A cluster algorithm for graphs. Tech Rep INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands
von Ahn L, Dabbish L (2004) Labeling images with a computer game. In: Proceedings of the ACM Conference on Human Factors in Computing Systems, pp 319–326
Voss J (2007) Tagging, folksonomy & Co—Renaissance of manual indexing? Computing Research Repository abs/cs/0701072:1–12
Watts D, Strogatz S (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442
Weber R, Stock H, Blott S (1998) A quantitative analysis and performance study for similarity search methods in high-dimensional space. In: Proceedings of the International Conference on Very Large Databases, pp 194–205
Wood M, Campbell N, Thomas B (1998) Iterative refinement by relevance feedback in content-based digital image retrieval. ACM Multimedia 13–20
Yavlinsky A, Pickering M, Heesch D, Rüger S (2004) A comparative study of evidence combination strategies. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp 1040–1043
Yavlinsky A, Schofield E, Rüger S (2005) Automated image annotation using global features and robust nonparametric density estimation. In: Proceedings of the International Conference on Image and Video Retrieval. Lecture Notes in Computer Science, vol 3568. Springer, Berlin, pp 507–517
Acknowledgements
Outlining the paradigms in this chapter and their implementations would not have been possible without the ingenuity, imagination and hard work of Paul Browne, Matthew Carey, Shyamala Doraisamy, Daniel Heesch, Peter Howarth, Suzanne Little, Haiming Liu, Ainhoa Llorente, João Magalhães, Alexander May, Simon Overell, Marcus Pickering, Adam Rae, Edward Schofield, Shalini Sewraz, Dawei Song, Lawrence Wong and Alexei Yavlinsky.
Credits
The photograph in Fig. 7.1 (Milton Keynes Peace pagoda) by Stefan Rüger, July 2007, was first published in Rüger (2010). Figure 7.2 is a mock-up based on the existing üBase search engine (see Fig. 7.14), with modifications by Peter Devine, and was previously published in Rüger (2010). Figure 7.3 (new search engine types) was designed by Peter Devine and published in Rüger (2010). Figures 7.5, 7.6 and 7.9 use royalty-free images from Corel Gallery 380,000, © Corel Corporation, all rights reserved. Figure 7.7 (Behold), by Alexei Yavlinsky, shows screenshots from http://photo.beholdsearch.com, 19 July 2007, now http://www.behold.cc, with thumbnails of creative-commons Flickr images. The photograph in Fig. 7.8, © Stefan Rüger, was taken in May 1996 in the Nord Jyllands Kunstmuseum, Ålborg. Figures 7.8 and 7.9 were published in Rüger (2010). The screenshots in Figs. 7.10–7.14 and 7.16–7.18 are reproduced courtesy of © Imperial College London. The ANSES system in Fig. 7.10 was originally designed by Marcus Pickering and later modified by Lawrence Wong; the images and part of the text displayed in the screenshot of Fig. 7.10 were recorded from the British Broadcasting Corporation (BBC), http://www.bbc.co.uk. The Sammon map in Fig. 7.11 and the radial visualisation in Fig. 7.13 were designed by Matthew Carey. The Dendro map in Fig. 7.12 was designed by Daniel Heesch. The üBase system depicted in the screenshots of Figs. 7.14(a), 7.14(b) and 7.16(a) was designed by Alexander May. The images used within the screenshot of Fig. 7.14 and within the illustration of Fig. 7.15 were reproduced from Corel Gallery 380,000, © Corel Corporation, all rights reserved. The images in the (partial) screenshots of Figs. 7.16 and 7.17 were reproduced from TREC Video Retrieval Evaluation 2003 (TRECVid), http://www-nlpir.nist.gov/projects. The geotemporal browsing screenshot in Fig. 7.18 was created by Simon Overell.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Rüger, S. (2011). Multimedia Resource Discovery. In: Melucci, M., Baeza-Yates, R. (eds) Advanced Topics in Information Retrieval. The Information Retrieval Series, vol 33. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20946-8_7
DOI: https://doi.org/10.1007/978-3-642-20946-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20945-1
Online ISBN: 978-3-642-20946-8
eBook Packages: Computer Science, Computer Science (R0)