A robust CBIR framework in between bags of visual words and phrases models for specific image datasets

Ouni, Achref; Urruty, Thierry; Visani, Muriel

doi:10.1007/s11042-018-5841-8

Published: 12 March 2018

Volume 77, pages 26173–26189, (2018)

Multimedia Tools and Applications

Abstract

One objective of the Content-Based Image Retrieval (CBIR) research field is to propose new methodologies and tools to manage the increasing number of images available. In the specific context of small expert datasets without prior knowledge, our research work focuses on improving the discriminative power of the image representation while keeping the same retrieval efficiency. Based on the well-known bag of visual words (BoVW) model, we propose three different methodologies inspired by the effectiveness of the visual phrase model, together with a compression technique which ensures the same retrieval effectiveness as the BoVW model. Our experimental results study the performance of our proposals on different well-known benchmark datasets and show their good performance compared to other recent approaches.
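
For readers unfamiliar with the baseline the abstract builds on, the following is a minimal sketch of a generic bag of visual words pipeline: local descriptors are assigned to their nearest visual word, counted into a normalized histogram, and images are ranked by histogram distance. It is not the authors' method and does not cover their three proposals or their compression technique. The function names `bovw_histogram` and `rank_images`, the tiny hand-written vocabulary, the 2-D descriptors, and the choice of L1 distance are all assumptions made for illustration; in practice the vocabulary would typically be learned by clustering descriptors such as SIFT or SURF.

```python
import numpy as np

def bovw_histogram(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word and
    return the L1-normalized histogram of word occurrences."""
    # Squared Euclidean distance between every descriptor and every word.
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Index of the closest visual word for each descriptor.
    words = np.argmin(dists, axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

def rank_images(query_hist, database_hists):
    """Return database indices sorted by increasing L1 distance to the query."""
    dists = np.abs(database_hists - query_hist).sum(axis=1)
    return np.argsort(dists)

# Hypothetical toy data: 3 visual words and 4 descriptors in 2-D.
vocabulary = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]])
hist = bovw_histogram(descriptors, vocabulary)

# Hypothetical database of three precomputed image histograms.
database = np.array([[1.0, 0.0, 0.0], [0.25, 0.5, 0.25], [0.0, 0.0, 1.0]])
ranking = rank_images(hist, database)
```

In this toy setting the histogram discards where each descriptor occurred in the image, which is the loss of spatial context that visual phrase models try to recover by grouping words.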

Acknowledgments

This research is supported by the Poitou-Charentes Regional Funds for Research Activities and the European Regional Development Fund (ERDF), as part of the e-Patrimoine project under axis 1 of the NUMERIC Program.

Author information

Authors and Affiliations

XLIM, UMR CNRS 7252, University of Poitiers, Poitiers, France
Achref Ouni & Thierry Urruty
Laboratory L3i, University of La Rochelle, La Rochelle, France
Muriel Visani
Vietnam-France ICT Laboratory, University of Science and Technology of Hanoï, Hanoi, Vietnam
Muriel Visani
Laboratory LaBRI, University of Bordeaux, Bordeaux, France
Muriel Visani

Corresponding author

Correspondence to Thierry Urruty.

About this article

Cite this article

Ouni, A., Urruty, T. & Visani, M. A robust CBIR framework in between bags of visual words and phrases models for specific image datasets. Multimed Tools Appl 77, 26173–26189 (2018). https://doi.org/10.1007/s11042-018-5841-8

Received: 09 June 2017
Revised: 10 December 2017
Accepted: 26 February 2018
Published: 12 March 2018
Issue Date: October 2018
DOI: https://doi.org/10.1007/s11042-018-5841-8
