Fractal dimension of bag-of-visual words

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

Scene recognition is an important and challenging problem in computer vision. One of the most widely used scene recognition methods is the bag-of-visual-words model. Despite its interesting results, this approach does not capture the rich spatial information of the visual words in the image. In this paper, we propose a new method to describe the visual words using the fractal dimension. Our method estimates the fractal dimension of each visual word in the image using the box-counting method. The fractal dimension provides complex spatial information about the visual words in a simple and efficient way. We validate our method on three well-known scene and object datasets, and the experimental results show that it yields highly discriminative features of the visual words. In addition, the proposed method achieves competitive results compared with popular methods in scene classification.
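To make the idea concrete, the following is a minimal sketch of a generic box-counting estimator applied to the image positions of each visual word. It is not the authors' implementation, whose details are not given in the abstract. The function names, the choice of box sizes, and the use of a least-squares line fit on the log-log counts are assumptions made for illustration.

```python
import numpy as np


def box_counting_dimension(points, box_sizes=(1, 2, 4, 8, 16, 32)):
    """Estimate the box-counting dimension of a set of 2D points.

    For each box size r, count the boxes N(r) of an r-by-r grid that
    contain at least one point, then fit a line to log N(r) versus
    log(1/r). The slope of that line is the dimension estimate.
    """
    points = np.asarray(points, dtype=float)
    if points.ndim != 2 or points.shape[1] != 2 or len(points) == 0:
        raise ValueError("points must be a non-empty array of shape (n, 2)")

    counts = []
    for r in box_sizes:
        # Map every point to the integer index of the box that contains it.
        cells = np.floor(points / r).astype(int)
        # The number of distinct boxes is the number of occupied boxes.
        counts.append(len(np.unique(cells, axis=0)))

    log_inv_r = np.log(1.0 / np.asarray(box_sizes, dtype=float))
    slope, _intercept = np.polyfit(log_inv_r, np.log(counts), 1)
    return float(slope)


def fractal_descriptor(positions, word_ids, vocabulary_size,
                       box_sizes=(1, 2, 4, 8, 16, 32)):
    """Build one dimension estimate per visual word (hypothetical helper).

    positions: (n, 2) array of keypoint coordinates in the image.
    word_ids:  (n,) array with the visual word assigned to each keypoint.
    Words that do not occur in the image are given a value of 0.
    """
    positions = np.asarray(positions, dtype=float)
    word_ids = np.asarray(word_ids)
    descriptor = np.zeros(vocabulary_size)
    for word in range(vocabulary_size):
        word_points = positions[word_ids == word]
        if len(word_points) > 0:
            descriptor[word] = box_counting_dimension(word_points, box_sizes)
    return descriptor


# Synthetic check: a filled 32x32 grid of points and a 32-point line.
grid = np.array([(x, y) for x in range(32) for y in range(32)])
line = np.array([(x, 0) for x in range(32)])
positions = np.vstack([grid, line])
word_ids = np.array([0] * len(grid) + [1] * len(line))
descriptor = fractal_descriptor(positions, word_ids, vocabulary_size=3)
```

On this synthetic input, the word covering the filled grid gets an estimate close to 2, the word lying on a line gets an estimate close to 1, and the unused word stays at 0. In a real pipeline the positions would come from keypoints quantized against a learned vocabulary, and the box sizes would need to suit the image resolution.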

Acknowledgements

This work was supported by FUNDECT (State of Mato Grosso do Sul Foundation to Support Education, Science and Technology), CAPES (Brazilian Federal Agency for Support and Evaluation of Graduate Education), and CNPq (National Council for Scientific and Technological Development). The Titan X Pascal used for this research was donated by the NVIDIA Corporation. Lucas Correia Ribas gratefully acknowledges financial support from the São Paulo Research Foundation (FAPESP), grant #2016/23763-8. Odemir M. Bruno thanks CNPq (Grant #307797/2014-7) and FAPESP (Grants #14/08026-1 and #16/18809-9) for financial support.

Author information

Corresponding author

Correspondence to Wesley Nunes Gonçalves.

About this article

Cite this article

Ribas, L.C., Gonçalves, D.N., de Andrade Silva, J. et al. Fractal dimension of bag-of-visual words. Pattern Anal Applic 22, 89–98 (2019). https://doi.org/10.1007/s10044-018-0736-x

  • DOI: https://doi.org/10.1007/s10044-018-0736-x
