Abstract
Exoticism is the charm of the unfamiliar or remote. It has attracted significant interest across the arts. Although visual concept classification in images and videos for semantic multimedia retrieval has been researched for years, the visual concept of exoticism has not yet been investigated from a computational perspective. In this paper, we present the first approach to automatically classify images as exotic or non-exotic. We have gathered two large datasets that cover exoticism in a general as well as a concept-specific way. The datasets have been annotated through crowdsourcing. To limit the effect of cultural differences on the annotation, only North American crowdworkers were employed for this task. We evaluate two deep learning architectures for learning the concept of exoticism. Besides deep learning features, we also investigate the usefulness of hand-crafted features, which are combined with deep features in our proposed fusion-based approach. We compare different machine learning models with the fusion-based approach, which performs best, reaching accuracies above 83% and 91% on the two datasets. Comprehensive experimental results provide insights into which features contribute most to recognizing exoticism. The estimation of image exoticism could be applied in fields like advertising and travel suggestions, as well as to increase the serendipity and diversity of recommendations and search results.
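The abstract does not spell out how deep and hand-crafted features are fused, so the following is only a minimal sketch of one common option, early fusion by concatenation, and should not be read as the authors' architecture. It assumes both feature groups are already extracted as fixed-length vectors; the feature dimensions (16 and 4), the synthetic data, and the logistic-regression classifier are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm so neither feature group dominates."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def fuse_features(deep, handcrafted):
    """Early fusion (an assumption): normalize each group, then concatenate."""
    return np.hstack([l2_normalize(deep), l2_normalize(handcrafted)])

def train_logistic(x, y, lr=0.5, epochs=500):
    """Batch gradient descent for binary logistic regression."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(exotic)
        grad = p - y                            # gradient of log loss w.r.t. logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(x, w, b):
    """Label 1 (exotic) when the logit is positive, else 0 (non-exotic)."""
    return (x @ w + b > 0).astype(int)

# Synthetic stand-in data: real features would come from a CNN layer and
# from hand-crafted descriptors (e.g. color or texture statistics).
rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 2, size=n)
deep = rng.normal(size=(n, 16)) + labels[:, None]  # hypothetical deep features
hand = rng.normal(size=(n, 4)) + labels[:, None]   # hypothetical hand-crafted features

fused = fuse_features(deep, hand)
w, b = train_logistic(fused, labels)
accuracy = (predict(fused, w, b) == labels).mean()
print(f"fused dimension: {fused.shape[1]}, training accuracy: {accuracy:.2f}")
```

Normalizing each group before concatenation is a design choice that keeps a high-dimensional deep descriptor from swamping a handful of hand-crafted values. The accuracy printed here is measured on the synthetic training data and says nothing about the results reported in the paper.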
About this article
Cite this article
Ceroni, A., Ma, C. & Ewerth, R. Mining exoticism from visual content with fusion-based deep neural networks. Int J Multimed Info Retr 8, 19–33 (2019). https://doi.org/10.1007/s13735-018-00165-4
DOI: https://doi.org/10.1007/s13735-018-00165-4