Abstract
This paper presents a framework for multimodal retrieval with relevance feedback based on genetic programming. In this supervised learning-to-rank framework, genetic programming is used to discover effective functions for combining (multimodal) similarity measures, using the information obtained over the user's relevance feedback iterations. With these functions, several similarity measures, including those extracted from different modalities (e.g., text and content), are combined into a single measure that properly encodes the user's preferences. The framework was instantiated for multimodal image retrieval using visual and textual features and was validated on two image collections, one from the University of Washington and another from the ImageCLEF Photographic Retrieval Task. For this image retrieval instance, several multimodal relevance feedback techniques were implemented and evaluated. The proposed approach produced statistically significantly better results for multimodal retrieval than single-modality approaches, and superior effectiveness compared to the best submissions of the ImageCLEF Photographic Retrieval Task 2008.
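The core idea of the abstract, evolving a function that merges per-modality similarity scores so that items the user marked relevant rank first, can be illustrated with a small sketch. This is not the authors' implementation: the operator set, the use of average precision as the fitness, the mutation-only search loop, and all names (`FEEDBACK`, `random_tree`, `evolve`, and so on) are assumptions made for illustration.

```python
import operator
import random

# A candidate combination function is an expression tree: a leaf is the name
# of a similarity measure, an inner node is (op, left, right).
# The operator set is an illustrative assumption.
OPS = {"+": operator.add, "*": operator.mul, "max": max}

# Hypothetical feedback: (similarity scores to the query, marked relevant?).
FEEDBACK = [
    ({"text": 0.9, "visual": 0.1}, True),
    ({"text": 0.2, "visual": 0.3}, False),
    ({"text": 0.1, "visual": 0.8}, True),
]

def random_tree(measures, depth, rng):
    """Grow a random expression tree with at most `depth` operator levels."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(measures)
    op = rng.choice(sorted(OPS))
    return (op,
            random_tree(measures, depth - 1, rng),
            random_tree(measures, depth - 1, rng))

def evaluate(tree, sims):
    """Compute the combined similarity for one item's per-measure scores."""
    if isinstance(tree, str):
        return sims[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, sims), evaluate(right, sims))

def fitness(tree, labeled):
    """Average precision of the ranking induced by the combined score."""
    ranked = sorted(labeled, key=lambda item: evaluate(tree, item[0]), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mutate(tree, measures, rng, depth=2):
    """Replace one randomly chosen subtree with a freshly grown one."""
    if isinstance(tree, str) or rng.random() < 0.3:
        return random_tree(measures, depth, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, measures, rng, depth), right)
    return (op, left, mutate(right, measures, rng, depth))

def evolve(labeled, measures, generations=20, pop_size=30, seed=0):
    """Keep the fitter half each generation and refill it with mutants.

    Simplification: full genetic programming also uses crossover.
    """
    rng = random.Random(seed)
    population = [random_tree(measures, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, labeled), reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), measures, rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=lambda t: fitness(t, labeled))
```

On the toy `FEEDBACK` data, ranking by `"text"` alone gives an average precision of 5/6, while the tree `("+", "text", "visual")` ranks both relevant items first and reaches 1.0. Calling `evolve(FEEDBACK, ["text", "visual"])` searches for such a tree automatically; in a feedback loop, the labeled set would grow with each iteration and the search would be rerun.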
Notes
http://www.cs.washington.edu/research/imagedatabase/groundtruth (as of 11/16/2011).
http://snowball.tartarus.org/algorithms/english/stemmer.html (as of 11/16/2011).
http://trec.nist.gov/trec_eval/index.html (as of 11/16/2011).
We also did not compute the C20 and F1 measures because the information about the subtopics for each image was not available for this collection.
http://www.imageclef.org/2008/results-photo (as of 11/16/2011).
Acknowledgements
We would like to thank all partners from LIS (Laboratory of Information Systems - IC/UNICAMP), RECOD (Reasoning for Complex Data - IC/UNICAMP), and LDB (Databases Lab - DCC/UFMG). This work was supported by the National Council for Scientific and Technological Development (CNPq), the Coordination for the Improvement of Higher Level Personnel (CAPES), the São Paulo Research Foundation (FAPESP), and the Minas Gerais Agency for Research and Development (FAPEMIG).
Cite this article
Calumby, R.T., da Silva Torres, R. & Gonçalves, M.A. Multimodal retrieval with relevance feedback based on genetic programming. Multimed Tools Appl 69, 991–1019 (2014). https://doi.org/10.1007/s11042-012-1152-7
DOI: https://doi.org/10.1007/s11042-012-1152-7