ABSTRACT
In recent years, content-based image retrieval has largely benefited from representations extracted from deeper and more complex convolutional neural networks, which have become more effective but also more computationally demanding. Despite hardware acceleration, query processing times can easily be saturated by deep feature extraction in high-throughput or real-time embedded scenarios, and a trade-off between efficiency and effectiveness usually has to be accepted. In this work, we experiment with the recently proposed continuous neural networks defined by parametric ordinary differential equations, dubbed ODE-Nets, for the adaptive extraction of image representations. Since the hidden state of the network evolves continuously, we propose to approximate the exact feature extraction by taking an earlier, "near-in-time" hidden state as features at a reduced computational cost. To understand the potential and the limits of this approach, we also evaluate an ODE-only architecture in which we minimize the number of classical layers, delegating most of the representation learning process, and thus the feature extraction process, to the continuous part of the model. Preliminary experiments on standard benchmarks show that we can dynamically control the trade-off between efficiency and effectiveness of feature extraction at inference time by controlling the evolution of the continuous hidden state. Although ODE-only networks provide the finest-grained control over the effectiveness-efficiency trade-off, we observed that mixed architectures perform better than or comparably to standard residual networks in both the image classification and retrieval setups, while using fewer parameters and retaining the controllability of the trade-off.
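The "near-in-time" idea above can be sketched in a few lines: if the hidden state h(t) is obtained by numerically integrating h'(t) = f(h, t) from t = 0 to t = 1, stopping the solver early at some t' < 1 yields an approximate feature vector at a proportionally lower cost. The toy dynamics and fixed-step Euler solver below are illustrative assumptions, not the paper's learned network or adaptive solver.

```python
import numpy as np

def f(h, t):
    # Placeholder dynamics standing in for a learned convolutional block.
    return np.tanh(h) * (1.0 - t)

def ode_features(h0, t_end=1.0, n_steps=100):
    """Euler-integrate h'(t) = f(h, t) from t=0 to t_end.

    Compute cost grows linearly with n_steps, so truncating the
    integration interval directly reduces extraction time.
    """
    h = h0.astype(float)
    dt = t_end / n_steps
    t = 0.0
    for _ in range(n_steps):
        h = h + dt * f(h, t)
        t += dt
    return h

h0 = np.ones(4)
exact = ode_features(h0, t_end=1.0, n_steps=100)   # full integration
approx = ode_features(h0, t_end=0.5, n_steps=50)   # early "near-in-time" state

# The truncated state is cheaper (half the function evaluations) and
# stays close to the exact features, which is what makes it usable
# as an approximate representation for retrieval.
err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
```

In an actual ODE-Net, f would be a trained network block and the integration would typically use an adaptive solver, but the efficiency-effectiveness control knob is the same: how far along t the hidden state is evolved before being read out as features.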
Continuous ODE-defined Image Features for Adaptive Retrieval