Adaptive Open Set Recognition with Multi-modal Joint Metric Learning

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13534)

Abstract

Open set recognition (OSR) aims to identify known classes and reject unknown classes simultaneously. However, existing research on open set recognition is usually based on single-modal data. Single-modal perception is susceptible to external interference, which may cause incorrect recognition. Multi-modal perception can improve OSR performance thanks to the complementarity between different modalities, so in this paper we propose a new multi-modal open set recognition (MMOSR) method. The MMOSR network is constructed with joint metric learning in logit space. This avoids the feature representation gap between different modalities and effectively estimates the decision boundaries. Moreover, an entropy-based adaptive weight fusion method is developed to combine the multi-modal perception information. The weight of each modality is determined automatically from its entropy in logit space: a larger entropy leads to a smaller weight for the corresponding modality, which effectively mitigates the influence of disturbances. Scaling the fused logits by the single-modal relative reachability further enhances the ability to detect unknowns. Experiments show that, with multi-modal input, our method achieves more robust open set recognition performance than other methods.
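The abstract states only that modality weights come from the entropy of each modality's logits, with larger entropy giving a smaller weight. The exact weighting formula, the relative-reachability scaling and the rejection rule are not given here, so the following is a minimal sketch under stated assumptions: the weights are taken as a softmax over negative entropies (normalised across modalities per sample), and unknown rejection uses a plain maximum-softmax-probability threshold as a stand-in for the paper's reachability-based scoring. The names `entropy_weighted_fusion`, `predict_open_set`, `temperature` and `threshold` are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Softmax along `axis`, shifted by the max for numerical stability."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def logit_entropy(logits: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy (in nats) of softmax(logits) over the class axis."""
    p = softmax(logits, axis=-1)
    return -(p * np.log(p + eps)).sum(axis=-1)


def entropy_weighted_fusion(modal_logits, temperature: float = 1.0):
    """Fuse per-modality logits with weights that shrink as entropy grows.

    modal_logits: list of M arrays, each shaped (batch, num_classes).
    Assumption (hypothetical): weights = softmax(-entropy / temperature),
    normalised across modalities separately for every sample.
    """
    stacked = np.stack(modal_logits, axis=0)            # (M, B, C)
    ent = logit_entropy(stacked)                        # (M, B)
    weights = softmax(-ent / temperature, axis=0)       # (M, B), sums to 1 over M
    fused = (weights[..., None] * stacked).sum(axis=0)  # (B, C)
    return fused, weights


def predict_open_set(fused_logits: np.ndarray, threshold: float = 0.5,
                     unknown_label: int = -1) -> np.ndarray:
    """Stand-in rejection rule (maximum softmax probability), NOT the
    relative-reachability scaling described in the abstract."""
    probs = softmax(fused_logits, axis=-1)
    labels = probs.argmax(axis=-1)
    return np.where(probs.max(axis=-1) >= threshold, labels, unknown_label)


if __name__ == "__main__":
    # Hypothetical logits for one sample from two modalities.
    modality_a = np.array([[4.0, 0.5, 0.2]])  # peaked -> low entropy
    modality_b = np.array([[1.0, 0.9, 1.1]])  # near-uniform -> high entropy
    fused, w = entropy_weighted_fusion([modality_a, modality_b])
    print("weights:", w[:, 0])           # modality_a receives the larger weight
    print("prediction:", predict_open_set(fused))
```

A softmax over negative entropies is only one way to make the weight decrease monotonically with entropy while keeping the weights normalised; inverse-entropy normalisation would satisfy the same property. When every modality is near-uniform, the fused distribution is also flat and the stand-in threshold rejects the sample as unknown.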

Author information

Correspondence to Yimin Fu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Fu, Y., Liu, Z., Yang, Y., Xu, L., Lan, H. (2022). Adaptive Open Set Recognition with Multi-modal Joint Metric Learning. In: Yu, S., et al. Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13534. Springer, Cham. https://doi.org/10.1007/978-3-031-18907-4_49

  • DOI: https://doi.org/10.1007/978-3-031-18907-4_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-18906-7

  • Online ISBN: 978-3-031-18907-4

  • eBook Packages: Computer Science, Computer Science (R0)
