Abstract
In this paper, we propose a robust visual object clustering approach, based on bounding box ranking, that discovers the characteristics of objects in real-world datasets containing a large number of noisy images, and we apply it to sightseeing spot assessment. The goal is to uncover diverse sightseeing resources from images available on social network services (SNS). Objects that appear frequently in images captured in a given city may represent one of its characteristics (local culture, architecture, and so on). Such knowledge can be used to discover sightseeing resources from the perspective of the user rather than that of the provider (e.g., a travel agency). However, because the quality of images on SNS varies widely, conventional object discovery methods struggle to identify objects common to several images; the proposed approach is designed for this setting. Extensive experiments on standard and extended benchmarks verified its effectiveness. We also tested the proposed method in an application that discovers the characteristics of a city (i.e., its cultural elements) from a set of images taken there. Moreover, using the objects discovered from SNS images, we propose an object-level assessment framework that ranks sightseeing spots by assigning them scores, and we verify its performance.

This work was partly supported by JSPS KAKENHI (16K12532) and MIC SCOPE (172307001).
Cite this article
Ge, M., Zhuang, C. & Ma, Q. Robust visual object clustering and its application to sightseeing spot assessment. Multimed Tools Appl 78, 17135–17164 (2019). https://doi.org/10.1007/s11042-018-7066-2
DOI: https://doi.org/10.1007/s11042-018-7066-2