
Cross-modal Retrieval Based on Stacked Bimodal Auto-Encoder



Abstract:

Deep learning (DL) has achieved excellent results on a wide range of single-modal problems, and many researchers have applied DL to cross-modal retrieval, where the popular approaches are based on two-stage learning. The first stage obtains a separate representation for each modality, and the second stage learns the inter-modal correlation, which is the key to retrieval. Traditional solutions to the second stage obtain the inter-modal shared representation through a shallow network structure, which cannot effectively learn the multi-level correlation between modalities. Motivated by this, this paper presents a novel hybrid deep structure that combines the two stages for the cross-modal retrieval task. To learn the inter-modal correlation, we incorporate a Stacked Bimodal Auto-Encoder (Stacked-BAE) into the third layer. On the one hand, introducing the Stacked-BAE allows the model to learn richer cross-modal correlation and enhances its learning ability. On the other hand, we use layer-wise learning to obtain the inter-modal multi-level correlation, which improves the accuracy of cross-modal retrieval. Extensive experiments on several cross-modal datasets show that our model is superior to the baseline correlation analysis and three common multi-modal deep models on cross-modal retrieval tasks.
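To make the idea concrete, below is a minimal sketch of one bimodal auto-encoder layer of the kind that could be stacked layer-wise as the abstract describes: two modality-specific encoders feed a shared code, and both modalities are reconstructed from that code. All names, dimensions, and the linear-encoder/tanh design here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

class BimodalAutoEncoder:
    """One bimodal auto-encoder layer: a shared code fuses two modalities
    and reconstructs both. Hypothetical sketch, not the paper's model."""

    def __init__(self, dim_x, dim_y, dim_h, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.Wx = rng.normal(0, s, (dim_x, dim_h))  # modality-1 (e.g. image) encoder
        self.Wy = rng.normal(0, s, (dim_y, dim_h))  # modality-2 (e.g. text) encoder
        self.Ux = rng.normal(0, s, (dim_h, dim_x))  # modality-1 decoder
        self.Uy = rng.normal(0, s, (dim_h, dim_y))  # modality-2 decoder

    def encode(self, x, y):
        # The shared code is the inter-modal representation.
        return np.tanh(x @ self.Wx + y @ self.Wy)

    def train_step(self, x, y, lr=0.5):
        n, dx = x.shape
        _, dy = y.shape
        h = self.encode(x, y)
        ex = h @ self.Ux - x                      # reconstruction errors
        ey = h @ self.Uy - y
        loss = np.mean(ex ** 2) + np.mean(ey ** 2)
        # Plain batch gradient descent on the joint reconstruction loss.
        gx, gy = 2 * ex / (n * dx), 2 * ey / (n * dy)
        dh = (gx @ self.Ux.T + gy @ self.Uy.T) * (1 - h ** 2)
        self.Ux -= lr * (h.T @ gx)
        self.Uy -= lr * (h.T @ gy)
        self.Wx -= lr * (x.T @ dh)
        self.Wy -= lr * (y.T @ dh)
        return loss

# Toy usage: random paired "image" and "text" features for the same items.
rng = np.random.default_rng(1)
x = rng.normal(0, 0.5, (64, 20))   # 64 items, 20-d modality-1 features
y = rng.normal(0, 0.5, (64, 12))   # paired 12-d modality-2 features
bae = BimodalAutoEncoder(20, 12, 8)
losses = [bae.train_step(x, y) for _ in range(300)]
# Layer-wise stacking would then train the next auto-encoder on
# bae.encode(x, y) to capture higher-level correlation.
```

After the first layer converges, its shared codes become the training input for the next layer, which is the layer-wise strategy the abstract credits with capturing multi-level inter-modal correlation.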
Date of Conference: 07-09 June 2019
Date Added to IEEE Xplore: 29 July 2019
Electronic ISSN: 2573-3311
Conference Location: Guilin, China
