
A Content-Based Image Retrieval Method Based on the Google Cloud Vision API and WordNet

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10191)


Abstract

Content-Based Image Retrieval (CBIR) methods analyze the content of an image and extract features that describe it. These descriptions are also called image annotations (or image labels). A machine learning (ML) algorithm is commonly used to obtain the annotations, but this is a time-consuming process. In addition, the semantic gap is another problem in image labeling. To overcome the first difficulty, this work uses the Google Cloud Vision API, which can save considerable computational time. To resolve the second problem, a transformation method is defined that maps undefined terms by using WordNet. In the experiments, the well-known Pascal VOC 2007 dataset, with 4,952 test images, is used, and image labeling with the Cloud Vision API is implemented in the R language. At most ten labels are kept for each image, provided their scores are over 50. Moreover, we compare the Cloud Vision API with well-known ML algorithms. This work found that the API yields a mean average precision (mAP) of 42.4% over the 4,952 images, and the proposed approach outperforms three well-known ML algorithms. Hence, this work could be extended to other image datasets and serve as a benchmark method when evaluating performance.
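The abstract only outlines the pipeline, so the Python sketch below illustrates one way its three steps (label detection, WordNet-style term mapping, mAP evaluation) could fit together. It is not the authors' R implementation. Assumptions: the request body follows the public Cloud Vision v1 `images:annotate` REST format with the `LABEL_DETECTION` feature; the paper's threshold of "50" is read as 0.50 on the API's 0 to 1 `score` scale; the `HYPERNYMS` table is a tiny, hypothetical stand-in for WordNet's hypernym hierarchy (its entries are illustrative); and AP uses the 11-point interpolation of the VOC 2007 devkit, ranking every test image by its class score (0 when the class is never predicted). The helper names are hypothetical.

```python
import base64
import json

MAX_LABELS = 10          # at most ten labels per image, as in the abstract
SCORE_THRESHOLD = 0.5    # assumption: "over 50" read as 0.50 on the API's 0-1 scale

VOC_CLASSES = frozenset({
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
})

# Hypothetical child -> parent links standing in for WordNet hypernyms.
HYPERNYMS = {
    "tabby": "domestic cat",
    "domestic cat": "cat",
    "sedan": "car",
    "airliner": "aeroplane",
}


def build_label_request(image_bytes: bytes) -> str:
    """JSON body for a Cloud Vision v1 images:annotate LABEL_DETECTION call."""
    body = {"requests": [{
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": "LABEL_DETECTION", "maxResults": MAX_LABELS}],
    }]}
    return json.dumps(body)


def filter_labels(response: dict) -> dict:
    """Keep the top MAX_LABELS labels whose score exceeds SCORE_THRESHOLD."""
    annotations = response["responses"][0].get("labelAnnotations", [])
    kept = sorted((a for a in annotations if a["score"] > SCORE_THRESHOLD),
                  key=lambda a: a["score"], reverse=True)[:MAX_LABELS]
    return {a["description"].lower(): a["score"] for a in kept}


def map_to_class(label: str, classes=VOC_CLASSES, max_depth: int = 5):
    """Walk up the (hypothetical) hypernym chain until a dataset class is reached."""
    term = label
    for _ in range(max_depth + 1):
        if term in classes:
            return term
        term = HYPERNYMS.get(term)
        if term is None:
            return None
    return None


def image_class_scores(labels: dict, classes=VOC_CLASSES) -> dict:
    """Per-class confidence for one image: best score among labels mapped to it."""
    scores = {}
    for label, score in labels.items():
        cls = map_to_class(label, classes)
        if cls is not None:
            scores[cls] = max(score, scores.get(cls, 0.0))
    return scores


def voc07_ap(scores: list, truths: list) -> float:
    """11-point interpolated AP for one class over a ranked list of images."""
    n_pos = sum(truths)
    if n_pos == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp, points = 0, []
    for rank, i in enumerate(order, start=1):
        tp += truths[i]
        points.append((tp / n_pos, tp / rank))  # (recall, precision)
    total = 0.0
    for t in (k / 10 for k in range(11)):
        # Interpolated precision: best precision at any recall >= t.
        total += max((p for r, p in points if r >= t), default=0.0)
    return total / 11


def mean_ap(per_class_aps: list) -> float:
    """mAP is the mean of the per-class AP values."""
    return sum(per_class_aps) / len(per_class_aps)
```

For example, a response containing `Tabby` (0.9), `Sofa` (0.4) and `Sedan` (0.7) keeps only the first and last labels, which map to the class scores `{"cat": 0.9, "car": 0.7}`. Walking up hypernyms is one plausible reading of "mapping undefined terms"; the paper may use a different WordNet relation or similarity measure.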


Notes

  1. https://cloud.google.com/vision/.

  2. https://github.com/worldstar/GoogleCloudVisionAPI_RLanguage.


Author information

Correspondence to Yi-Hui Chen.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Chen, SH., Chen, YH. (2017). A Content-Based Image Retrieval Method Based on the Google Cloud Vision API and WordNet. In: Nguyen, N., Tojo, S., Nguyen, L., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2017. Lecture Notes in Computer Science, vol 10191. Springer, Cham. https://doi.org/10.1007/978-3-319-54472-4_61


  • DOI: https://doi.org/10.1007/978-3-319-54472-4_61

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54471-7

  • Online ISBN: 978-3-319-54472-4

  • eBook Packages: Computer Science, Computer Science (R0)
