Skip to main content
Log in

Task-specific image summaries using semantic information and self-supervision

  • Focus
  • Published:
Soft Computing Aims and scope Submit manuscript

Abstract

Large annotated datasets are needed for successful Deep Learning methodologies to achieve human-level performance. These needs restrict the impact of Deep Learning and build the necessity to create smaller and richer representative datasets that can offer a potential solution to this problem. In this paper, we propose task-specific image corpus summarization using semantic information and self-supervision. Our methodology makes use of GAN for the generation of features and leverages rotational invariance for employing self-supervision. All these objectives are facilitated on features from Resnet34. A summary can be obtained efficiently by using k-means clustering on the semantic embedding space and then selecting examples nearest to centroids. In comparison to end-to-end trained models, the proposed model does not require retraining to obtain summaries of different lengths. We also test our model by extensive qualitative and quantitative experiments.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Similar content being viewed by others

Data availability

Data available on request from the authors.

Code availibility

Code available on request from the authors.

References

  • Argyriou A, Evgeniou T, Pontil M (2007) Multi-task feature learning. In: Advances in neural information processing systems, pp. 41–48

  • Benrhouma O, Hermassi H, Abd El-Latif AA, Belghith S (2016) Chaotic watermark for blind forgery detection in images. Multimed Tools Appl 75(14):8695–8718

    Article  Google Scholar 

  • Berthelot D, Carlini N, Goodfellow I, Papernot N, Oliver A, Raffel CA (2019) Mixmatch: A holistic approach to semi-supervised learning. In: Advances in neural information processing systems, pp. 5049–5059

  • Cai D, He X, Li Z, Ma WY, Wen JR (2004) Hierarchical clustering of www image search results using visual, textual and link information. In: Proceedings of the 12th annual ACM international conference on Multimedia, ACM, pp. 952–959

  • Camargo JE, Gonzalez FA (2009) A multi-class kernel alignment method for image collection summarization. In: Iberoamerican congress on pattern recognition, Springer, pp. 545–552

  • Chen JY, Bouman CA, Dalton JC (2000) Hierarchical browsing and search of large image databases. IEEE Trans Image Process 9(3):442–455

    Article  Google Scholar 

  • Chen W, Chen X, Zhang J, Huang K (2016) A multi-task deep network for person re-identification. CoRR arXiv:1607.05369

  • Deng D (2007) Content-based image collection summarization and comparison using self-organizing maps. Pattern Recognit 40(2):718–727

    Article  Google Scholar 

  • Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, Ieee, pp. 248–255

  • Dutta T, Singh A, Biswas S (2020) Adaptive margin diversity regularizer for handling data imbalance in zero-shot sbir. In: European conference on computervVision, Springer, pp. 349–364

  • Dutta T, Singh A, Biswas S (2020) Styleguide: zero-shot sketch-based image retrieval using style-guided image generation. IEEE Trans Multimed

  • Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vision 88(2):303–338

    Article  Google Scholar 

  • Fisher Y (2012) Fractal image compression: theory and application. Springer Science & Business Media

  • Gad R, Talha M, Abd El-Latif AA, Zorkany M, Ayman ES, Nawal EF, Muhammad G (2018) Iris recognition using multi-algorithmic approaches for cognitive internet of things (ciot) framework. Future Gener Comput Syst 89:178–191

    Article  Google Scholar 

  • Gao B, Liu TY, Qin T, Zheng X, Cheng QS, Ma WY (2005) Web image clustering by consistent utilization of visual features and surrounding texts. In: Proceedings of the 13th annual ACM international conference on Multimedia, ACM, pp. 112–121

  • Gidaris S, Singh P, Komodakis N (2018) Unsupervised representation learning by predicting image rotations. In: International conference on learning representations. https://openreview.net/forum?id=S1v4N2l0-

  • Gini C (1912) Variabilita e mutabilita. In: Pizetti E, Salvemini T (eds) Reprinted in memorie di metodologica statistica. Libreria Eredi Virgilio Veschi, Rome

    Google Scholar 

  • Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, pp. 2672–2680

  • He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778

  • Ionescu B, Gînscă AL, Boteanu B, Lupu M, Popescu A, Muller H (2016) Div150multi: a social image retrieval result diversification dataset with multi-topic queries. In: Proceedings of the 7th international conference on multimedia systems, pp. 46:1–46:6. https://doi.org/10.1145/2910017.2910620

  • Ketchen DJ, Shook CL (1996) The application of cluster analysis in strategic management research: an analysis and critique. Strateg Manage J 17(6):441–458

    Article  Google Scholar 

  • Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images

  • Mahasseni B, Lam M, Todorovic S (2017) Unsupervised video summarization with adversarial lstm networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2982–2991

  • Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532–1543

  • Połap D, Włodarczyk-Sielicka M, Wawrzyniak N (2021) Automatic ship classification for a riverside monitoring system using a cascade of artificial intelligence techniques including penalties and rewards. ISA Trans

  • Ruder S (2017) An overview of multi-task learning in deep neural networks. CoRR abs/1706.05098http://arxiv.org/abs/1706.05098

  • Sharma DK, Singh A, Khanna A, Jain A (2017) Evaluation of parameters and techniques for genetic algorithm based channel allocation in cognitive radio networks. In: 2017 tenth international conference on contemporary computing (IC3), IEEE, pp. 1–6

  • Sharma DK, Singh A, Saroha A (2018) Language identification for hindi language transliterated text in roman script using generative adversarial networks. In: Towards extensible and adaptable methods in computing, Springer, pp. 267–279

  • Simon I, Snavely N, Seitz SM (2007) Scene summarization for online image collections. In: Proceedings of the IEEE international conference on computer vision, pp. 1–8

  • Singh A, Sharma DK (2020) Image collection summarization: past, present and future. In: Data visualization and knowledge engineering, Springer, pp. 49–78

  • Singh A, Virmani L, Subramanyam A (2019) Image corpus representative summarization. In: 2019 IEEE fifth international conference on multimedia bigdData (BigMM), IEEE, pp. 21–29

  • Sinha P, Mehrotra S, Jain R (2011) Effective summarization of large collections of personal photos. In: Proceedings of the 20th international conference companion on World wide web, ACM, pp. 127–128

  • Song Y, Vallmitjana J, Stent A, Jaimes A (2015) Tvsum: Summarizing web videos using titles. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5179–5187

  • Soni R, Kumar B, Chand S (2019) Optimal feature and classifier selection for text region classification in natural scene images using weka tool. Multimed Tools Appl 78(22):31757–31791

    Article  Google Scholar 

  • Soni R, Kumar B, Chand S (2019) Text detection and localization in natural scene images based on text awareness score. Appl Intell 49(4):1376–1405

    Article  Google Scholar 

  • Stan D, Sethi IK (2003) eid: a system for exploration of image databases. Inf Process Manage 39(3):335–361

    Article  Google Scholar 

  • Thorndike RL (1953) Who belongs in the family? Psychometrika 18(4):267–276

    Article  Google Scholar 

  • Tschiatschek S, Iyer RK, Wei H, Bilmes JA (2014) Learning mixtures of submodular functions for image collection summarization. In: Advances in neural information processing systems, pp. 1413–1421

  • Wang H, Kawahara Y, Weng C, Yuan J (2017) Representative selection with structured sparsity. Pattern Recognit 63:268–278

    Article  Google Scholar 

  • Wang N, Li Q, Abd El-Latif AA, Zhang T, Niu X (2014) Toward accurate localization and high recognition performance for noisy iris images. Multimed Tools Appl 71(3):1411–1430

    Article  Google Scholar 

  • Xian Y, Lampert CH, Schiele B, Akata Z (2017) Zero-shot learning-a comprehensive evaluation of the good, the bad and the ugly. Preprint arXiv:1707.00600

  • Xian Y, Lampert CH, Schiele B, Akata Z (2018) Zero-shot learning-a comprehensive evaluation of the good, the bad and the ugly. IEEE Ttrans Pattern Analys Mach Intell 41(9):2251–2265

    Article  Google Scholar 

  • Yang C, Shen J, Peng J, Fan J (2013) Image collection summarization via dictionary learning for sparse representation. Pattern Recognit 46(3):948–961

    Article  Google Scholar 

  • Zhang K, Chao WL, Sha F, Grauman K (2016) Video summarization with long short-term memory. In: Proceedings of the European conference on computer vision, pp. 766–782

  • Zhou K, Qiao Y, Xiang T (2018) Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward. In: Thirty-second AAAI conference on artificial intelligence

Download references

Funding

This article received no Funding from External Sources.

Author information

Authors and Affiliations

Authors

Contributions

DKS: Supervision, Conceptualization, Methodology, Software, Data curation, Validation, Investigation, Visualization, Writing—original draft. AS: Conceptualization, Methodology, Data curation, Validation, Investigation, Writing—review & editing. SK S: Supervision, Methodology, Validation, Writing—review & editing. GS: Methodology, Validation, Writing—review & editing. JC-WL: Methodology, Writing—review & editing.

Corresponding author

Correspondence to Gautam Srivastava.

Ethics declarations

Conflict of interest

The authors have no Conflicts of Interest to declare for this manuscript.

Additional information

Communicated by Irfan Uddin.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Sharma, D.K., Singh, A., Sharma, S.K. et al. Task-specific image summaries using semantic information and self-supervision. Soft Comput 26, 7581–7594 (2022). https://doi.org/10.1007/s00500-021-06603-6

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00500-021-06603-6

Keywords

Navigation