Personalization in multimedia retrieval: A survey

Lu, Yijuan; Sebe, Nicu; Hytnen, Ross; Tian, Qi

doi:10.1007/s11042-010-0621-0

Published: 03 November 2010

Volume 51, pages 247–277, (2011)

Multimedia Tools and Applications

Yijuan Lu¹,
Nicu Sebe²,
Ross Hytnen¹ &
Qi Tian³

Abstract

With the explosive spread of multimedia (text documents, images, video, etc.) in our lives, annotating, searching, indexing, browsing, and relating various forms of information efficiently is becoming increasingly important. Relating these challenges to user preference and customization complicates the matter further. The goal of this survey is to give an overview of the current state of the research areas involved in annotating multimedia, relating it, and presenting it to a user according to preference. The paper presents current models and techniques for modeling ontology, preference, context, and presentation, and brings them together in a chain of ideas that leads from raw, uninformed data to a usable user interface that adapts to user preference and customization.

Notes

Courtesy of the FACS Consortium.
The idea of I-Objects and part of the description come from discussions and documents created during the FACS consortium interactions.

Acknowledgements

We would like to thank Dick Bulterman, Stavros Christodoulakis, Chabane Djeraba, Daniel Gatica-Perez, Thomas Huang, Alex Jaimes, Ramesh Jain, Mike Lew, Andy Rauber, Pasquale Savino, Arnold Smeulders, and the whole FACS consortium for excellent suggestions and discussions. The work of Nicu Sebe was supported by the FP7 IP GLOCAL European project and by the FIRB S-PATTERN project. The work of Yijuan Lu was supported in part by the Research Enhancement Program (REP) and start-up funding from Texas State University.

Author information

Authors and Affiliations

Department of Computer Science, Texas State University, San Marcos, TX, 78666, USA
Yijuan Lu & Ross Hytnen
Department of Information Engineering and Computer Science, University of Trento, Via Sommarive 14-38100 Povo, Trento, Italy
Nicu Sebe
Computer Science Department, University of Texas at San Antonio, San Antonio, TX, 78249, USA
Qi Tian

Corresponding author

Correspondence to Qi Tian.

About this article

Cite this article

Lu, Y., Sebe, N., Hytnen, R. et al. Personalization in multimedia retrieval: A survey. Multimed Tools Appl 51, 247–277 (2011). https://doi.org/10.1007/s11042-010-0621-0

Issue Date: January 2011
