
Deep learned compact binary descriptor with a lightweight network-in-network architecture for visual description

  • Original Article
The Visual Computer

Abstract

Binary descriptors have been widely used for real-time image retrieval and correspondence matching. However, most learned descriptors are produced by large deep neural networks (DNNs) with several million parameters, and the resulting binary codes are generally not invariant to many geometric transformations, an invariance that is crucial for accurate correspondence matching. To address this problem, we propose a new learning approach that uses a lightweight DNN architecture, built as a stack of multilayer perceptrons based on the network-in-network (NIN) architecture, together with a restricted Boltzmann machine (RBM). The RBM maps the learned features to binary codes and carries out the geometrically invariant correspondence matching task. Our experimental results on several benchmark datasets (Brown, Oxford, Paris, INRIA Holidays, RomePatches, HPatches, and CIFAR-10) show that the proposed approach produces a learned binary descriptor that outperforms other baseline self-supervised binary descriptors in correspondence matching, despite the smaller size of its DNN. Most importantly, the proposed approach does not freeze the features obtained while pre-training the NIN model; instead, it fine-tunes them while learning the features needed for binary mapping through the RBM. Additionally, its lightweight architecture makes it suitable for resource-constrained devices.
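The abstract's core idea of using an RBM to map real-valued deep features to compact binary codes can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class, hyperparameters, and the 0.5 threshold on hidden activations are all illustrative assumptions, shown only to make the feature-to-binary-code mapping concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible bias
        self.b_h = np.zeros(n_hidden)   # hidden bias
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # positive phase: infer hidden units from the input features
        h0 = self.hidden_probs(v0)
        # negative phase: sample hidden units, reconstruct, re-infer
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

    def binary_code(self, v):
        # threshold hidden activations to obtain the compact binary descriptor
        return (self.hidden_probs(v) >= 0.5).astype(np.uint8)

# toy usage: map 128-D real-valued features to 32-bit binary codes
feats = rng.random((64, 128))
rbm = RBM(n_visible=128, n_hidden=32)
for _ in range(50):
    rbm.cd1_step(feats)
codes = rbm.binary_code(feats)
print(codes.shape)  # (64, 32)
```

In the paper's pipeline the visible units would be fed by the NIN feature extractor, which is fine-tuned jointly rather than frozen; the sketch above stands in for that stage with random features.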



Acknowledgements

This work was carried out with the support of the Senate Research Council, University of Moratuwa, Sri Lanka (Grant No. SRC-16-1), and National Research Council, Sri Lanka (Grant No. 12-017).

Funding

This study was funded by the Senate Research Council, University of Moratuwa, Sri Lanka (Grant No. SRC-16-1), and National Research Council, Sri Lanka (Grant No. 12-017).

Author information


Corresponding author

Correspondence to Ravimal Bandara.

Ethics declarations

Conflict of interest

The authors, Ravimal Bandara, Lochandaka Ranathunga, and Nor Aniza Abdullah, declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Bandara, R., Ranathunga, L. & Abdullah, N.A. Deep learned compact binary descriptor with a lightweight network-in-network architecture for visual description. Vis Comput 37, 275–290 (2021). https://doi.org/10.1007/s00371-020-01798-5

