Abstract
The Triplet network is an effective approach to metric learning, but as the number of fine-grained images and sample categories grows, training it becomes increasingly challenging. To address this problem, this paper proposes an algorithm that combines a Concept Ontology Structure with a Triplet network trained with a Two-layer Ontology Loss. The algorithm not only uses semantic knowledge to guide the Concept Ontology Structure of the network, but also exploits the relationship between the layers so that the network sees triplets more effectively, which enhances the separability of the learned features. In addition, a bilinear function is trained jointly with the Triplet network to enhance image details, further improving the performance of the network. Finally, classification experiments on the fine-grained image databases Orchid and Fashion60 demonstrate the effectiveness of the proposed algorithm.
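To make the idea of a triplet loss applied at two levels of a concept hierarchy concrete, the following is a minimal sketch and not the loss defined in this paper. It assumes the standard hinge triplet loss and a hypothetical two-level variant in which a negative from a sibling fine-grained class (same parent concept) uses a small margin and a negative from a different parent concept uses a larger one. The function names, the margin values, and the plain summation of the two terms are all illustrative assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin):
    """Standard hinge triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def two_level_triplet_loss(anchor, positive, sibling_negative, distant_negative,
                           fine_margin=0.2, coarse_margin=0.5):
    """Hypothetical two-level loss (an assumption, not the paper's formula).

    sibling_negative: sample from another fine-grained class under the same
                      parent concept (fine level, smaller margin).
    distant_negative: sample from a class under a different parent concept
                      (coarse level, larger margin).
    """
    fine = triplet_loss(anchor, positive, sibling_negative, fine_margin)
    coarse = triplet_loss(anchor, positive, distant_negative, coarse_margin)
    return fine + coarse


if __name__ == "__main__":
    a, p = [0.0, 0.0], [0.0, 1.0]
    sibling, distant = [1.0, 0.0], [3.0, 4.0]
    print(two_level_triplet_loss(a, p, sibling, distant))
```

In this toy example the distant negative already lies well beyond the coarse margin, so only the sibling negative contributes to the loss. This illustrates why a hierarchy can help the network concentrate on informative triplets.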











Acknowledgments
This research was funded by the National Natural Science Foundation of China (No. 61402368), the Aerospace Science and Technology Innovation Foundation of China (No. 2017ZD53047 and No. 20175896), the Common Technology Foundation for Pre-research and Development of Equipment in the 13th Five-Year Plan (No. 41412010402), and the Seed Foundation of Innovation and Creation for Graduate Students in Northwestern Polytechnical University (No. ZZ2019166).
Ethics declarations
Conflict of interests
No conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
He, G., Zhang, Q., Zhang, H. et al. A concept ontology triplet network for learning discriminative representations of fine-grained classes. Multimed Tools Appl 79, 25189–25214 (2020). https://doi.org/10.1007/s11042-020-09090-3
DOI: https://doi.org/10.1007/s11042-020-09090-3