SimCLR-Inception: An Image Representation Learning and Recognition Model for Robot Vision

Jin, Mengyuan; Zhang, Yin; Cheng, Xiufeng; Ma, Li; Hu, Fang

doi:10.1007/978-3-031-47634-1_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14406)

Included in the following conference series:

Asian Conference on Pattern Recognition

Abstract

Effective feature extraction is a key component of image recognition for robot vision. This paper presents an improved contrastive-learning-based model for image feature extraction and classification, termed SimCLR-Inception, that aims at effective and accurate image recognition. Using SimCLR, the model generates positive and negative image samples from unlabeled data through image augmentation, then minimizes a contrastive loss function to learn image representations that capture more of the underlying structure of the data. The model then uses Inception V3 to classify these representations, which improves recognition accuracy. SimCLR-Inception is compared with four representative image recognition models (LeNet, VGG16, Inception V3, and EfficientNet V2) on a real-world Multi-class Weather (MW) data set. Four metrics are used to evaluate the models: accuracy, precision, recall, and F1-score. The SimCLR-Inception model completes all runs successfully and gives the best or nearly the best results, with accuracy improved by at least 4% relative to the Inception V3 model. These results suggest that the model is well suited to robot vision.
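
The contrastive step described in the abstract can be illustrated with a small sketch. The abstract does not name the exact loss or its hyperparameters, so this assumes the NT-Xent (normalized temperature-scaled cross-entropy) loss used in the original SimCLR work, with a hypothetical temperature of 0.5. The function names `l2_normalize` and `nt_xent_loss` are illustrative and are not taken from the paper. Each image contributes two augmented views; the two views of the same image form the positive pair, and all other views in the batch act as negatives.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """Mean NT-Xent loss over a batch of paired embeddings.

    view_a[i] and view_b[i] are embeddings of two augmentations of image i
    (a positive pair). Every other embedding in the batch is a negative.
    The temperature value is an assumption, not a setting from the paper.
    """
    if len(view_a) != len(view_b) or not view_a:
        raise ValueError("views must be non-empty and of equal length")

    n = len(view_a)
    z = [l2_normalize(v) for v in list(view_a) + list(view_b)]

    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        # Scaled similarities to every embedding except the anchor itself.
        sims = [dot(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(sims)
        log_denominator = peak + math.log(sum(math.exp(s - peak) for s in sims))
        total += log_denominator - dot(z[i], z[positive]) / temperature
    return total / (2 * n)


# Two toy "images" with 2-D embeddings: matching views agree, shuffled views do not.
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is lower when each pair of views maps to the same embedding than when the pairs are mismatched, which is the behavior that minimizing the loss encourages. In a full pipeline the embeddings would come from an encoder network applied to augmented images rather than from hand-written vectors.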

Notes

1. https://data.mendeley.com/datasets/4drtyfjtfy/1

Acknowledgement

We acknowledge the funding support from the National Natural Science Foundation of China (71974069).

Author information

Authors and Affiliations

College of Information Engineering, Hubei University of Chinese Medicine, Wuhan, 430065, People’s Republic of China
Mengyuan Jin, Li Ma & Fang Hu
School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, People’s Republic of China
Yin Zhang
School of Information Management, Central China Normal University, Wuhan, 430079, People’s Republic of China
Xiufeng Cheng
Department of Mathematics and Statistics, University of West Florida, Pensacola, 32514, USA
Fang Hu

Authors

Mengyuan Jin
Yin Zhang
Xiufeng Cheng
Li Ma
Fang Hu

Corresponding author

Correspondence to Fang Hu.

Editor information

Editors and Affiliations

Kyushu Institute of Technology, Kitakyushu, Fukuoka, Japan
Huimin Lu
The University of Sydney, Sydney, NSW, Australia
Michael Blumenstein
Yonsei University, Seoul, Korea (Republic of)
Sung-Bae Cho
Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
Osaka University, Osaka, Ibaraki, Japan
Yasushi Yagi
Kyushu Institute of Technology, Kitakyushu, Japan
Tohru Kamiya

About this paper

Cite this paper

Jin, M., Zhang, Y., Cheng, X., Ma, L., Hu, F. (2023). SimCLR-Inception: An Image Representation Learning and Recognition Model for Robot Vision. In: Lu, H., Blumenstein, M., Cho, SB., Liu, CL., Yagi, Y., Kamiya, T. (eds) Pattern Recognition. ACPR 2023. Lecture Notes in Computer Science, vol 14406. Springer, Cham. https://doi.org/10.1007/978-3-031-47634-1_11

DOI: https://doi.org/10.1007/978-3-031-47634-1_11
Published: 05 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47633-4
Online ISBN: 978-3-031-47634-1
eBook Packages: Computer Science; Computer Science (R0)
