A Content-Based Image Retrieval Method Based on the Google Cloud Vision API and WordNet

Chen, Shih-Hsin; Chen, Yi-Hui

doi:10.1007/978-3-319-54472-4_61


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10191)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems


Abstract

A Content-Based Image Retrieval (CBIR) method analyzes the content of an image and extracts features that describe it, also called image annotations (or image labels). A machine learning (ML) algorithm is commonly used to obtain the annotations, but this is a time-consuming process. In addition, the semantic gap is another problem in image labeling. To overcome the first difficulty, the Google Cloud Vision API is a solution because it can save much computational time. To resolve the second problem, a transformation method is defined that maps undefined terms by using WordNet. In the experiments, the well-known Pascal VOC 2007 dataset, with 4,952 test images, is used, and image labeling through the Cloud Vision API is implemented in the R language. At most ten labels are kept for each image, provided their scores are over 50. Moreover, we compare the Cloud Vision API with well-known ML algorithms. This work found that the API yields 42.4% mean average precision (mAP) on the 4,952 images. Our proposed approach is better than three well-known ML algorithms. Hence, this work could be extended to test other image datasets and could serve as a benchmark method when evaluating performance.
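The pipeline outlined in the abstract (filter labels by score, map undefined terms to known classes, then score the ranking) can be sketched as follows. This is a minimal illustration in Python, not the authors' R implementation. The hypernym table is a hypothetical hand-written stand-in for WordNet lookups, the 0 to 100 score scale is an assumption, the function names are invented for this sketch, and the non-interpolated average precision shown here may differ from the evaluation protocol used in the paper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical child -> parent table standing in for WordNet hypernym lookups.
HYPERNYMS: Dict[str, str] = {
    "kitten": "cat",
    "cat": "feline",
    "puppy": "dog",
    "sedan": "car",
}

# Small subset of the 20 Pascal VOC object classes, for illustration only.
TARGET_CLASSES = {"cat", "dog", "car"}


def select_labels(labels: List[Tuple[str, float]],
                  min_score: float = 50.0,
                  max_labels: int = 10) -> List[Tuple[str, float]]:
    """Keep labels scoring above min_score, best first, at most max_labels."""
    kept = [(name, score) for name, score in labels if score > min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_labels]


def map_to_class(label: str, max_depth: int = 5) -> Optional[str]:
    """Walk up the hypernym chain until a target class is reached, else None."""
    term = label.lower()
    for _ in range(max_depth + 1):
        if term in TARGET_CLASSES:
            return term
        if term not in HYPERNYMS:
            return None
        term = HYPERNYMS[term]
    return None


def average_precision(ranked_relevance: List[bool]) -> float:
    """Non-interpolated AP, normalized by the number of relevant items retrieved."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0
```

For example, `select_labels([("kitten", 91.0), ("sky", 40.0)])` keeps only the `kitten` label, and `map_to_class("kitten")` resolves it to the class `cat`. Averaging `average_precision` over all classes would give a mean average precision figure under these assumptions.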



Author information

Authors and Affiliations

Department of Information Management, Cheng Shiu University, No. 840, Chengcing Road, Niaosong District, Kaohsiung City, 83347, Taiwan R.O.C.
Shih-Hsin Chen & Yi-Hui Chen
Department of M-Commerce and Multimedia Applications, Asia University, No. 500, Lioufeng Road, Wufeng, Taichung, 41354, Taiwan R.O.C.
Shih-Hsin Chen & Yi-Hui Chen


Corresponding author

Correspondence to Yi-Hui Chen.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
Japan Advanced Institute of Science and Technology, Nomi, Japan
Satoshi Tojo
Japan Advanced Institute of Science and Technology, Nomi, Japan
Le Minh Nguyen
Wrocław University of Science and Technology, Wrocław, Poland
Bogdan Trawiński


About this paper

Cite this paper

Chen, SH., Chen, YH. (2017). A Content-Based Image Retrieval Method Based on the Google Cloud Vision API and WordNet. In: Nguyen, N., Tojo, S., Nguyen, L., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2017. Lecture Notes in Computer Science, vol 10191. Springer, Cham. https://doi.org/10.1007/978-3-319-54472-4_61


DOI: https://doi.org/10.1007/978-3-319-54472-4_61
Published: 26 February 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54471-7
Online ISBN: 978-3-319-54472-4
eBook Packages: Computer Science, Computer Science (R0)
