Learning a Semantic Space for Modeling Images, Tags and Feelings in Cross-Media Search

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11607)

Abstract

This paper contributes FB5K, a new real-world web image dataset for cross-media retrieval. FB5K has the following attributes: (1) it contains 5130 images crawled from Facebook; (2) the images are categorized according to users’ feelings; (3) the images can be searched by feeling, independently of text and language. Furthermore, we propose a novel approach that uses Optical Character Recognition (OCR) and explicitly incorporates high-level semantic information. We comprehensively evaluate four different subspace-learning methods and three modified versions of the Correspondence Auto Encoder (Corr-AE), together with numerous text features and similarity measurements, on the Wikipedia, Flickr30k and FB5K datasets. To examine the characteristics of FB5K, we propose a semantic-based cross-media retrieval method. To accomplish cross-media retrieval, we introduce a new similarity measurement in the embedded space, which significantly improves system performance compared with the conventional Euclidean distance. Our experimental results demonstrate the efficiency of the proposed retrieval method on three different public datasets.
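The abstract does not specify the new similarity measurement, so the following is only an illustrative sketch of why the choice of measure in an embedded space can change retrieval rankings. It contrasts Euclidean distance with cosine similarity, which is used here purely as a stand-in and is not claimed to be the paper's measure. The embeddings and the names `text_query` and `image_gallery` are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (stand-in measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_euclidean(query, gallery):
    """Return gallery keys ordered from smallest to largest distance."""
    return sorted(gallery, key=lambda name: euclidean_distance(query, gallery[name]))


def rank_by_cosine(query, gallery):
    """Return gallery keys ordered from highest to lowest cosine similarity."""
    return sorted(
        gallery,
        key=lambda name: cosine_similarity(query, gallery[name]),
        reverse=True,
    )


# Hypothetical 2-D embeddings of one text query and two images,
# assumed to already live in a shared (embedded) space.
text_query = [1.0, 0.0]
image_gallery = {
    "img_a": [3.0, 0.0],  # same direction as the query, larger magnitude
    "img_b": [1.0, 1.0],  # nearer in distance, but pointing elsewhere
}

print(rank_by_euclidean(text_query, image_gallery))
print(rank_by_cosine(text_query, image_gallery))
```

In this toy case the two measures disagree on the top result, because Euclidean distance is sensitive to vector magnitude while cosine similarity only compares direction.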

Corresponding author

Correspondence to Sadaqat ur Rehman.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

ur Rehman, S., Huang, Y., Tu, S., Ahmad, B. (2019). Learning a Semantic Space for Modeling Images, Tags and Feelings in Cross-Media Search. In: U., L., Lauw, H. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11607. Springer, Cham. https://doi.org/10.1007/978-3-030-26142-9_7

  • DOI: https://doi.org/10.1007/978-3-030-26142-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26141-2

  • Online ISBN: 978-3-030-26142-9

  • eBook Packages: Computer Science, Computer Science (R0)
