One net to rule them all: efficient recognition and retrieval of POI from geo-tagged photos


Abstract

In this work, we present DeepCamera, a novel framework that combines visual recognition and spatial recognition for identifying places-of-interest (POIs) from smartphone photos. Our framework exploits both deep visual features and geographic features of images. For visual recognition, we first design the HashNet model, which extends an ordinary convolutional neural network (ConvNet) with a "hash layer" following the last fully connected layer. We then compress multiple pre-trained deep HashNets into a single shallow hashing network, named "SHNet", that outputs semantic labels and compact hash codes simultaneously, significantly reducing the time and memory consumption of POI recognition. For spatial recognition, a new layer called the spatial layer is appended to a ConvNet to capture spatial information. Finally, by plugging the spatial layer into SHNet, visual and spatial knowledge jointly generate a hybrid probability distribution over all possible POI candidates. Notably, the proposed SHNet model can also be used for general visual recognition and retrieval. Experiments conducted on real-world datasets and on classic benchmarks (MNIST and CIFAR-10) demonstrate the competitive accuracy and run-time performance of the proposed framework.
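
To make the HashNet design concrete, here is a minimal sketch in Keras (the framework noted in footnote 8). All layer sizes, the hash-code length, and the class count are illustrative placeholders rather than the authors' configuration; the layers named "hash" and "label" play the roles of the weight matrices WH and WL described in footnote 4.

```python
# Minimal HashNet-style sketch in Keras (see footnote 8).
# All sizes below are illustrative placeholders, not the paper's configuration.
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

num_classes = 100  # hypothetical number of POI labels
hash_bits = 48     # hypothetical hash-code length

inputs = Input(shape=(64, 64, 3))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
x = MaxPooling2D((2, 2))(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2))(x)
x = Flatten()(x)
fc = Dense(512, activation='relu')(x)  # last fully connected layer

# "Hash layer" after the last fully connected layer (weights WH):
# sigmoid activations in (0, 1) are thresholded after training
# to obtain a compact binary code.
hash_layer = Dense(hash_bits, activation='sigmoid', name='hash')(fc)

# Output layer on top of the hash layer (weights WL).
label = Dense(num_classes, activation='softmax', name='label')(hash_layer)

model = Model(inputs=inputs, outputs=label)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# After training, semantic labels come from model.predict, and binary
# hash codes are read off the hash layer via a second model that
# shares the trained layers:
hash_model = Model(inputs=inputs, outputs=hash_layer)
# codes = (hash_model.predict(images) > 0.5).astype('uint8')
```

Producing labels and hash codes from one forward pass is what allows SHNet-style models to serve recognition and retrieval simultaneously: the softmax gives the class, while the binarized hash activations index the image for fast nearest-neighbor search.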


Notes

  1. https://techcrunch.com/2015/06/02/6-1b-smartphone-users-globally-by-2020-overtaking-basic-fixed-phone-subscriptions/

  2. https://www.brandwatch.com/blog/amazing-social-media-statistics-and-facts/

  3. FOV is a tuple of 4 camera parameters: the GPS location, the angle of view, the maximum visible distance, and the direction of the camera (a visibility-test sketch follows these notes).

  4. The weight matrix WH connects the fully connected layer and the hash layer, while the weight matrix WL connects the hash layer and the output layer.

  5. http://www.geonames.org/

  6. http://yann.lecun.com/exdb/mnist/

  7. https://www.cs.toronto.edu/~kriz/cifar.html

  8. https://github.com/fchollet/keras

  9. http://www.cs.toronto.edu/~tijmen/csc321/slides/
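
The FOV tuple in note 3 supports a simple geometric visibility test: a POI is a candidate only if it lies within the camera's maximum visible distance and within half the angle of view on either side of the camera direction. The sketch below uses a flat-earth (equirectangular) approximation and degree-based bearings; it illustrates the idea and is not the paper's exact spatial model.

```python
import math

def poi_in_fov(cam_lat, cam_lon, direction_deg, angle_of_view_deg,
               max_dist_m, poi_lat, poi_lon):
    """Rough FOV visibility test for the 4-parameter tuple in note 3.

    Uses an equirectangular approximation, which is adequate for the
    short distances typical of POI photography.
    """
    # Approximate planar offsets from camera to POI, in metres.
    dx = (poi_lon - cam_lon) * 111320.0 * math.cos(math.radians(cam_lat))
    dy = (poi_lat - cam_lat) * 110540.0

    # Outside the maximum visible distance: not a candidate.
    if math.hypot(dx, dy) > max_dist_m:
        return False

    # Bearing from camera to POI, measured clockwise from north.
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0

    # Smallest angular difference to the camera's direction.
    diff = abs((bearing - direction_deg + 180.0) % 360.0 - 180.0)
    return diff <= angle_of_view_deg / 2.0
```

A test like this prunes the POI candidate set before any visual recognition runs, which is what makes the spatial layer's contribution to the hybrid probability distribution cheap to compute.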


Acknowledgments

The project was supported by the National Basic Research Program (973 Program, Grant No. 2015CB352400) and the National Science Foundation of China (Grant Nos. 61802100, 61672455, 61528207, and 61472348). The project was also supported by the Natural Science Foundation of Zhejiang Province of China (Grant No. LY18F020005).

Author information


Corresponding author

Correspondence to Xiaoling Gu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was done while the author was with Zhejiang University.


About this article


Cite this article

Peng, P., Gu, X., Zhu, S. et al. One net to rule them all: efficient recognition and retrieval of POI from geo-tagged photos. Multimed Tools Appl 78, 24347–24371 (2019). https://doi.org/10.1007/s11042-018-6847-y


