Abstract
Discovering the visual representation of an image category is challenging because the representation must not only be discriminative but also appear frequently across the images of that category. Previous studies have proposed many solutions, but they all optimize discrimination and frequency separately, which makes the solutions sub-optimal. To address this issue, we propose a method that discovers jointly discriminating and frequent visual representations, named JDFR. To ensure discrimination, JDFR employs a classification task with a cross-entropy loss. To achieve frequency, JDFR uses a triplet loss to optimize within-class and between-class distances and then mines frequent visual representations in the feature space. Moreover, we propose an attention module to locate the representative region in each image. Extensive experiments on four benchmark datasets (i.e., CIFAR10, CIFAR100-20, VOC2012-10, and Travel) show that the discovered visual representations are more discriminative and more frequent than those mined by five state-of-the-art methods, with average improvements of 7.51% in accuracy and 1.88% in frequency.
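The abstract describes a training objective that combines a cross-entropy loss for discrimination with a triplet loss that tightens within-class and widens between-class distances, plus an attention module that highlights the representative region of an image. The following minimal PyTorch sketch illustrates how such a joint objective can be wired together under assumed choices (a ResNet-18 backbone, a 1x1-convolution attention map, margin 0.2, and equal loss weighting); the class `JDFRSketch` and the function `joint_loss` are illustrative names, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class JDFRSketch(nn.Module):
    """Backbone + spatial attention + embedding head (illustrative sketch)."""

    def __init__(self, num_classes: int, feat_dim: int = 128):
        super().__init__()
        backbone = models.resnet18(weights=None)            # assumed backbone, not the paper's choice
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.attention = nn.Conv2d(512, 1, kernel_size=1)   # 1x1 conv producing a spatial attention map
        self.embed = nn.Linear(512, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        fmap = self.features(x)                              # B x 512 x H x W feature map
        attn = torch.sigmoid(self.attention(fmap))           # B x 1 x H x W, highlights representative regions
        pooled = (fmap * attn).mean(dim=(2, 3))              # attention-weighted pooling -> B x 512
        emb = F.normalize(self.embed(pooled), dim=1)         # unit-norm feature used for frequency mining
        return self.classifier(emb), emb, attn


def joint_loss(logits, labels, anchor, positive, negative, alpha=1.0, margin=0.2):
    """Cross-entropy (discrimination) plus triplet loss (within/between-class distances)."""
    ce = F.cross_entropy(logits, labels)
    triplet = F.triplet_margin_loss(anchor, positive, negative, margin=margin)
    return ce + alpha * triplet


# Toy usage: a batch of three images forming one (anchor, positive, negative) triplet.
model = JDFRSketch(num_classes=10)
images = torch.randn(3, 3, 224, 224)
labels = torch.tensor([0, 0, 1])
logits, emb, attn = model(images)
loss = joint_loss(logits, labels, emb[0:1], emb[1:2], emb[2:3])
loss.backward()
```

In this sketch the cross-entropy term keeps the embedding class-separable, while the triplet term pulls same-class samples together so that frequent representations form dense clusters in feature space; the relative weight `alpha` is an assumption and would need tuning in practice.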
Acknowledgments
This work is supported by the Science and Technology Plan of Xi'an (20191122015KYPT011JC013), the Fundamental Research Funds of the Central Universities of China (No. JX18001), and the Science Basis Research Program in Shaanxi Province of China (Nos. 2020JQ-321 and 2019JQ-663).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Q., Zhou, Y., Zhu, Z., Liang, X., Gu, Y. (2021). Jointly Discriminating and Frequent Visual Representation Mining. In: Ishikawa, H., Liu, C.-L., Pajdla, T., Shi, J. (eds.) Computer Vision – ACCV 2020. Lecture Notes in Computer Science, vol. 12624. Springer, Cham. https://doi.org/10.1007/978-3-030-69535-4_22
Print ISBN: 978-3-030-69534-7
Online ISBN: 978-3-030-69535-4