A concept ontology triplet network for learning discriminative representations of fine-grained classes

Multimedia Tools and Applications

Abstract

The Triplet network is an efficient method for metric learning, but as the number of fine-grained images and sample categories grows, training a Triplet network becomes increasingly challenging. To address this problem, this paper proposes an algorithm that combines a Concept Ontology Structure with a Triplet network trained with a Two-layer Ontology Loss. The algorithm uses semantic knowledge to guide the Concept Ontology Structure of the network, and it also exploits the relationship between the layers so that the network makes more effective use of the triplets, which enhances the separability of the learned features. In addition, a bilinear function is trained jointly with the Triplet network to enhance image details, which further improves the performance of the network. Finally, classification experiments on two fine-grained image databases, Orchid and Fashion60, demonstrate the effectiveness of the proposed algorithm.
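The abstract does not give the form of the Two-layer Ontology Loss, so the following is only a minimal sketch of the general idea under stated assumptions: a standard triplet hinge loss on squared Euclidean distances, applied once with a negative from a sibling fine-grained class and once with a negative from a different coarse concept. The function names, margins, and the weighted sum are hypothetical and are not taken from the paper.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet hinge loss: the anchor should be closer to the
    positive than to the negative by at least `margin`."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


def two_level_triplet_loss(anchor, positive, fine_negative, coarse_negative,
                           fine_margin=0.2, coarse_margin=0.5, weight=1.0):
    """Hypothetical two-level combination (an assumption, not the paper's loss).

    fine_negative:   sample from a different fine-grained class under the
                     same coarse concept as the anchor.
    coarse_negative: sample from a different coarse concept; it is pushed
                     away with a larger margin.
    """
    fine_term = triplet_loss(anchor, positive, fine_negative, fine_margin)
    coarse_term = triplet_loss(anchor, positive, coarse_negative, coarse_margin)
    return fine_term + weight * coarse_term


# Toy 2-D embeddings, invented purely for illustration.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]
fine_negative = [0.2, 0.0]
coarse_negative = [0.5, 0.0]

loss = two_level_triplet_loss(anchor, positive, fine_negative, coarse_negative)
```

In this toy setting both terms are active, because neither negative is yet farther from the anchor than the positive by its margin. Using a larger margin for the coarse-level negative is one plausible way to encode the layer relationship; the paper may do this differently.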

Acknowledgments

This research was funded by the National Natural Science Foundation of China (No. 61402368), the Aerospace Science and Technology Innovation Foundation of China (No. 2017ZD53047 and No. 20175896), the Common Technology Foundation for Pre-research and Development of Equipment in the 13th Five-Year Plan (No. 41412010402), and the Seed Foundation of Innovation and Creation for Graduate Students in Northwestern Polytechnical University (No. ZZ2019166).

Author information

Corresponding author

Correspondence to Guiqing He.

Ethics declarations

Conflict of interests

No conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

He, G., Zhang, Q., Zhang, H. et al. A concept ontology triplet network for learning discriminative representations of fine-grained classes. Multimed Tools Appl 79, 25189–25214 (2020). https://doi.org/10.1007/s11042-020-09090-3

  • DOI: https://doi.org/10.1007/s11042-020-09090-3
