One net to rule them all: efficient recognition and retrieval of POI from geo-tagged photos


Abstract

In this work, we present DeepCamera, a novel framework that combines visual recognition and spatial recognition for identifying places-of-interest (POIs) from smartphone photos. Our framework exploits both deep visual features and geographic features of images. For visual recognition, we first design the HashNet model, which extends an ordinary convolutional neural network (ConvNet) with a "hash layer" following the last fully connected layer. We then compress multiple pre-trained deep HashNets into a single shallow hashing network, named "SHNet", that outputs semantic labels and compact hash codes simultaneously, significantly reducing the time and memory consumption of POI recognition. For spatial recognition, a new layer called the spatial layer is appended to a ConvNet to capture spatial information. Finally, by plugging the spatial layer into SHNet, visual and spatial knowledge jointly generate a hybrid probability distribution over all possible POI candidates. Notably, the proposed SHNet model can also be used for general visual recognition and retrieval. Experiments conducted on real-world datasets and on classic benchmarks (MNIST and CIFAR-10) demonstrate the competitive accuracy and run-time performance of the proposed framework.
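
To make the HashNet design concrete, here is a minimal sketch in Keras (the framework noted in footnote 8). All layer sizes, the hash-code length, and the class count are illustrative placeholders rather than the authors' configuration; the layers named "hash" and "label" play the roles of the weight matrices WH and WL described in footnote 4.

```python
# Minimal HashNet-style sketch in Keras (see footnote 8).
# All sizes below are illustrative placeholders, not the paper's configuration.
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

num_classes = 100  # hypothetical number of POI labels
hash_bits = 48     # hypothetical hash-code length

inputs = Input(shape=(64, 64, 3))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
x = MaxPooling2D((2, 2))(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2))(x)
x = Flatten()(x)
fc = Dense(512, activation='relu')(x)  # last fully connected layer

# "Hash layer" after the last fully connected layer (weights WH):
# sigmoid activations in (0, 1) are thresholded after training
# to obtain a compact binary code.
hash_layer = Dense(hash_bits, activation='sigmoid', name='hash')(fc)

# Output layer on top of the hash layer (weights WL).
label = Dense(num_classes, activation='softmax', name='label')(hash_layer)

model = Model(inputs=inputs, outputs=label)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# After training, semantic labels come from model.predict, and binary
# hash codes are read off the hash layer via a second model that
# shares the trained layers:
hash_model = Model(inputs=inputs, outputs=hash_layer)
# codes = (hash_model.predict(images) > 0.5).astype('uint8')
```

Producing labels and hash codes from one forward pass is what allows SHNet-style models to serve recognition and retrieval simultaneously: the softmax gives the class, while the binarized hash activations index the image for fast nearest-neighbor search.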


Notes

  1. https://techcrunch.com/2015/06/02/6-1b-smartphone-users-globally-by-2020-overtaking-basic-fixed-phone-subscriptions/

  2. https://www.brandwatch.com/blog/amazing-social-media-statistics-and-facts/

  3. FOV is a tuple of 4 camera parameters: the GPS location, the angle of view, the maximum visible distance, and the direction of the camera (a visibility-test sketch follows these notes).

  4. The weight matrix WH connects the fully connected layer and the hash layer, while the weight matrix WL connects the hash layer and the output layer.

  5. http://www.geonames.org/

  6. http://yann.lecun.com/exdb/mnist/

  7. https://www.cs.toronto.edu/~kriz/cifar.html

  8. https://github.com/fchollet/keras

  9. http://www.cs.toronto.edu/~tijmen/csc321/slides/
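
The FOV tuple in note 3 supports a simple geometric visibility test: a POI is a candidate only if it lies within the camera's maximum visible distance and within half the angle of view on either side of the camera direction. The sketch below uses a flat-earth (equirectangular) approximation and degree-based bearings; it illustrates the idea and is not the paper's exact spatial model.

```python
import math

def poi_in_fov(cam_lat, cam_lon, direction_deg, angle_of_view_deg,
               max_dist_m, poi_lat, poi_lon):
    """Rough FOV visibility test for the 4-parameter tuple in note 3.

    Uses an equirectangular approximation, which is adequate for the
    short distances typical of POI photography.
    """
    # Approximate planar offsets from camera to POI, in metres.
    dx = (poi_lon - cam_lon) * 111320.0 * math.cos(math.radians(cam_lat))
    dy = (poi_lat - cam_lat) * 110540.0

    # Outside the maximum visible distance: not a candidate.
    if math.hypot(dx, dy) > max_dist_m:
        return False

    # Bearing from camera to POI, measured clockwise from north.
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0

    # Smallest angular difference to the camera's direction.
    diff = abs((bearing - direction_deg + 180.0) % 360.0 - 180.0)
    return diff <= angle_of_view_deg / 2.0
```

A test like this prunes the POI candidate set before any visual recognition runs, which is what makes the spatial layer's contribution to the hybrid probability distribution cheap to compute.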


Acknowledgments

The project was supported by the National Basic Research Program (973 Program, Grant No. 2015CB352400) and the National Science Foundation of China (Grant Nos. 61802100, 61672455, 61528207, and 61472348). The project was also supported by the Natural Science Foundation of Zhejiang Province of China (Grant No. LY18F020005).

Author information


Corresponding author

Correspondence to Xiaoling Gu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was done while the author was with Zhejiang University.


About this article


Cite this article

Peng, P., Gu, X., Zhu, S. et al. One net to rule them all: efficient recognition and retrieval of POI from geo-tagged photos. Multimed Tools Appl 78, 24347–24371 (2019). https://doi.org/10.1007/s11042-018-6847-y


