
Deep-MATEM: TEM query image based cross-modal retrieval for material science literature

Published in: Multimedia Tools and Applications

Abstract

With the rapid growth of published material science literature, an effective retrieval system is important for researchers to obtain relevant information. In this paper we propose a cross-modal retrieval method for material science literature that uses a transmission electron microscopy (TEM) image as the query, providing a way to retrieve literature directly from the TEM image data generated in material experiments. In this method, terminologies are extracted and topic distributions are inferred from the text of the literature using LDA, and we design a multi-task convolutional neural network (CNN) that maps a query TEM image to predictions of the relevant terminologies and topic distribution. A ranking score is then computed from the network outputs for the query image and the text data of each document. Experimental results show that our method achieves better performance than multi-label CCA, Deep Semantic Matching (Deep SM) and Modality-Specific Deep Structure (MSDS).
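The final ranking step of the pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual scoring function: scoring the terminology head by summed predicted probabilities, measuring topic agreement by cosine similarity, and the combination weight `alpha` are all hypothetical choices, and `ranking_score` with its parameter names is an illustrative helper, not code from the paper.

```python
import numpy as np

def ranking_score(term_probs, topic_pred, doc_term_ids, doc_topics, alpha=0.5):
    """Score one document against a query TEM image (illustrative sketch).

    term_probs   -- per-terminology probabilities from the CNN's terminology head
    topic_pred   -- topic distribution predicted by the CNN's topic head
    doc_term_ids -- indices of terminologies extracted from the document's text
    doc_topics   -- LDA topic distribution inferred from the document's text
    alpha        -- hypothetical weight balancing the two task outputs
    """
    # Terminology match: total predicted probability mass on the document's terms.
    term_score = float(term_probs[doc_term_ids].sum()) if len(doc_term_ids) else 0.0
    # Topic match: cosine similarity between predicted and inferred distributions.
    topic_score = float(
        topic_pred @ doc_topics
        / (np.linalg.norm(topic_pred) * np.linalg.norm(doc_topics))
    )
    return alpha * term_score + (1.0 - alpha) * topic_score
```

Candidate documents would then simply be sorted by this score in descending order to produce the retrieval ranking.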


Figures 1–11 (figure images are available in the full article).


References

  1. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th international conference on neural information processing systems, pp 1097–1105

  2. Bay H, Ess A, Tuytelaars T, Gool LV (2008) Speeded-up robust features (SURF). Comput Vis Image Underst 110(3):346–359


  3. Blei D, Jordan M (2003) Modeling annotated data. In: Proceedings of the international ACM SIGIR conference on research and development in information retrieval, pp 127–134

  4. Blei D, Ng A, Jordan M (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022


  5. Callister W, Rethwisch D (2013) Materials science and engineering: an introduction, 9th edn. Wiley, USA


  6. Cao G, Iosifidis A, Chen K, Gabbouj M (2018) Generalized multi-view embedding for visual recognition and cross-modal retrieval. IEEE Trans Cybern 99:1–14


  7. Cheng MM, Zhang Z, Lin WY, Torr P (2014) BING: binarized normed gradients for objectness estimation at 300fps. In: Proceedings of the IEEE international conference on computer vision and pattern recognition, pp 3286–3293

  8. Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448

  9. He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. In: European conference on computer vision (ECCV), pp 346–361

  10. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778

  11. Jiang X, Wu F, Li X, Zhao Z, Lu W, Tang S, Zhuang Y (2015) Deep compositional cross-modal learning to rank via local-global alignment. In: Proceedings of the 23rd ACM international conference on multimedia, pp 69–78

  12. Johnson J, Karpathy A, Fei-Fei L (2016) Densecap: fully convolutional localization networks for dense captioning. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 4565–4574

  13. Karpathy A, Fei-Fei L (2017) Deep visual-semantic alignments for generating image descriptions. IEEE Trans Pattern Anal Mach Intell 39:664–676


  14. Li K, Qi GJ, Ye J, Hua KA (2017) Linear subspace ranking hashing for cross-modal retrieval. IEEE Trans Pattern Anal Mach Intell 39(9):1825–1838


  15. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110


  16. Rasiwasia N, Costa Pereira J, Coviello E, Doyle G, Lanckriet GRG, Levy R, Vasconcelos N (2010) A new approach to cross-modal multimedia retrieval. In: Proceedings of the 18th ACM international conference on multimedia, pp 251–260

  17. Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987


  18. Qian YL, Dong J, Wang W, Tan T (2015) Learning representations for steganalysis from regularized cnn model with auxiliary tasks. In: International conference on communications, signal processing, and systems (CSPS2015)

  19. Qian YL, Dong J, Wang W, Tan T (2016) Learning and transferring representations for image steganalysis using convolutional neural network. In: 2016 IEEE international conference on image processing (ICIP), pp 2752–2756

  20. Ranjan V, Rasiwasia N, Jawahar CV (2015) Multi-label cross-modal retrieval. In: 2015 IEEE international conference on computer vision (ICCV), pp 4094–4102

  21. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. CoRR arXiv:1409.1556

  22. Teh Y, Jordan M, Beal M, Blei D (2006) Hierarchical dirichlet processes. J Am Stat Assoc 101(476):1566–1581


  23. Wang Y, Wu F, Song J, Li X, Zhuang Y (2014) Multi-modal mutual topic reinforce modeling for cross-media retrieval. In: Proceedings of the international ACM SIGIR conference on research and development in information retrieval, pp 307–316

  24. Wang J, He Y, Kang C, Xiang S, Pan C (2015) Image-text cross-modal retrieval via modality-specific feature learning. In: Proceedings of the 5th ACM on international conference on multimedia retrieval, pp 347–354

  25. Wang D, Gao X, Wang X, He L, Yuan B (2016) Multimodal discriminative binary embedding for large-scale cross-modal retrieval. IEEE Trans Image Process 25:4540–4554


  26. Wang J, Yang Y, Mao J, Huang Z, Huang C, Xu W (2016) Cnn-rnn: a unified framework for multi-label image classification. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 2285–2294

  27. Wei Y, Xia W, Huang J, Ni B, Dong J, Zhao Y, Yan S (2014) CNN: single-label to multi-label. CoRR arXiv:1406.5726

  28. Wei Y, Zhao Y, Lu C, Wei S, Liu L, Zhu Z, Yan S (2017) Cross-modal retrieval with CNN visual features: a new baseline. IEEE Trans Cybern 47(2):449–460


  29. Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10(8):207–224


  30. Xu X, Shen F, Yang Y, Shen H, Li X (2017) Learning discriminative binary codes for large-scale cross-modal retrieval. IEEE Trans Image Process 26:2494–2507


  31. You X, Li Q, Tao D, Ou W, Gong M (2014) Local metric learning for exemplar-based object detection. IEEE Trans Circ Syst Video Technol 24(8):1265–1276



Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants U1536105, 51474237, U1536120 and U1636201, and by the National Key Research and Development Program of China (No. 2016YFB1001003).

Author information


Corresponding author

Correspondence to Qingxiao Guan.


About this article


Cite this article

Li, H., Guan, Q., Wang, H. et al. Deep-MATEM: TEM query image based cross-modal retrieval for material science literature. Multimed Tools Appl 77, 30269–30290 (2018). https://doi.org/10.1007/s11042-018-6043-0



  • DOI: https://doi.org/10.1007/s11042-018-6043-0

