
Deep learned compact binary descriptor with a lightweight network-in-network architecture for visual description

  • Original Article
The Visual Computer

Abstract

Binary descriptors have been widely used for real-time image retrieval and correspondence matching. However, most learned descriptors are produced by large deep neural networks (DNNs) with several million parameters, and the resulting binary codes are generally not invariant to many geometric transformations, an invariance that is crucial for accurate correspondence matching. To address this problem, we propose a new learning approach that uses a lightweight DNN architecture, built as a stack of multilayer perceptrons based on the network-in-network (NIN) architecture, together with a restricted Boltzmann machine (RBM). The RBM maps the learned features to binary codes and carries out the geometrically invariant correspondence matching task. Our experimental results on several benchmark datasets (Brown, Oxford, Paris, INRIA Holidays, RomePatches, HPatches, and CIFAR-10) show that the proposed approach produces a learned binary descriptor that outperforms other baseline self-supervised binary descriptors in correspondence matching, despite the smaller size of its DNN. Most importantly, the proposed approach does not freeze the features obtained while pre-training the NIN model; instead, it fine-tunes them while learning the features needed for binary mapping through the RBM. Additionally, its lightweight architecture makes it suitable for resource-constrained devices.
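The abstract's core idea of using an RBM to map real-valued deep features to compact binary codes can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class, hyperparameters, and the 0.5 threshold on hidden activations are all illustrative assumptions, shown only to make the feature-to-binary-code mapping concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible bias
        self.b_h = np.zeros(n_hidden)   # hidden bias
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # positive phase: infer hidden units from the input features
        h0 = self.hidden_probs(v0)
        # negative phase: sample hidden units, reconstruct, re-infer
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

    def binary_code(self, v):
        # threshold hidden activations to obtain the compact binary descriptor
        return (self.hidden_probs(v) >= 0.5).astype(np.uint8)

# toy usage: map 128-D real-valued features to 32-bit binary codes
feats = rng.random((64, 128))
rbm = RBM(n_visible=128, n_hidden=32)
for _ in range(50):
    rbm.cd1_step(feats)
codes = rbm.binary_code(feats)
print(codes.shape)  # (64, 32)
```

In the paper's pipeline the visible units would be fed by the NIN feature extractor, which is fine-tuned jointly rather than frozen; the sketch above stands in for that stage with random features.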



Acknowledgements

This work was carried out with the support of the Senate Research Council, University of Moratuwa, Sri Lanka (Grant No. SRC-16-1), and National Research Council, Sri Lanka (Grant No. 12-017).

Funding

This study was funded by the Senate Research Council, University of Moratuwa, Sri Lanka (Grant No. SRC-16-1), and National Research Council, Sri Lanka (Grant No. 12-017).

Author information


Corresponding author

Correspondence to Ravimal Bandara.

Ethics declarations

Conflict of interest

The authors, Ravimal Bandara, Lochandaka Ranathunga, and Nor Aniza Abdullah, declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Bandara, R., Ranathunga, L. & Abdullah, N.A. Deep learned compact binary descriptor with a lightweight network-in-network architecture for visual description. Vis Comput 37, 275–290 (2021). https://doi.org/10.1007/s00371-020-01798-5

