DOI: 10.1145/3556223.3556236
Research Article

Revisiting the Self-supervised Learning Method of Solving Jigsaw Puzzles

Published: 16 October 2022

Abstract

Spatial information is important for unsupervised feature learning. Previous work has shown that solving jigsaw puzzles as a pretext task can train a convolutional neural network whose features transfer to other visual tasks, such as image classification and object detection. A jigsaw puzzle solver must learn that an object is made of parts and what those parts are, so the learned features capture semantically relevant content. As a pre-training method, that approach outperformed the previous state of the art on PASCAL VOC 2007 object detection and classification. However, the original work still has several deficiencies, most notably its evaluation scheme and the lack of an empirical ablation analysis. In this paper, we adopt a more direct evaluation scheme and more appropriate datasets to conduct ablation experiments on the self-supervised learning mechanism of solving jigsaw puzzles, and we extend the mechanism to train auto-encoders for a more general evaluation. We explore how to build convolutional neural networks and auto-encoders by solving jigsaw puzzles with two types of network architecture (a siamese-ennead network and a straight-line network), resulting in four different jigsaw puzzle solvers. In our experiments, we evaluate the features learned by these solvers and discuss how several training tricks influence them. The results show that solving jigsaw puzzles is highly effective for unsupervised representation learning: the best configuration of every solver outperforms the state of the art on STL-10, and one solver reaches 80.07% ± 0.08% classification accuracy, about 5.87% higher than the previous state-of-the-art method on STL-10. We also report extensive empirical results on the influence of different training tricks and network configurations, which should be useful for the application and further study of jigsaw puzzle solvers.
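The pretext task summarized above amounts to tiling an image, shuffling the tiles with a permutation drawn from a small fixed set, and training a network to predict which permutation was applied. The snippet below is a minimal, hypothetical sketch of the data-generation step only; the 3x3 grid, the 64-permutation subset, and the names GRID, PERMUTATIONS, and make_puzzle are illustrative assumptions, not taken from the paper.

    # Minimal sketch (not the authors' code) of jigsaw-puzzle training pairs.
    import itertools
    import random
    import numpy as np

    GRID = 3  # 3x3 jigsaw grid
    # Illustrative permutation set: the first 64 permutations in lexicographic
    # order. (The original jigsaw-puzzle work selects its subset by maximal
    # Hamming distance; this shortcut is only for demonstration.)
    PERMUTATIONS = list(itertools.islice(itertools.permutations(range(GRID * GRID)), 64))

    def make_puzzle(image):
        # Split an (H, W, C) array into GRID*GRID tiles; H and W are assumed
        # divisible by GRID for simplicity.
        h, w = image.shape[0] // GRID, image.shape[1] // GRID
        tiles = [image[r * h:(r + 1) * h, c * w:(c + 1) * w]
                 for r in range(GRID) for c in range(GRID)]
        # Pick a permutation; its index is the classification target the
        # jigsaw solver is trained to predict from the shuffled tiles.
        label = random.randrange(len(PERMUTATIONS))
        shuffled = [tiles[i] for i in PERMUTATIONS[label]]
        return np.stack(shuffled), label

Either a siamese-ennead or a straight-line network could then be trained to predict the permutation index (the label) from the shuffled tiles.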



    Published In

    ICCCM '22: Proceedings of the 10th International Conference on Computer and Communications Management
    July 2022
    289 pages
    ISBN:9781450396349
    DOI:10.1145/3556223
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Auto-encoders
    2. Convolutional Neural Networks
    3. Jigsaw Puzzles
    4. Self-supervised Learning
    5. Unsupervised Feature Learning
    6. Visuospatial Processes

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICCCM 2022

