Abstract
With the popularity of social networks, people can easily generate rich content in multiple modalities. Estimating the similarity of multi-modal content simply and effectively is becoming increasingly important for providing better search services over rich media. This work aims to improve similarity estimation and thereby the accuracy of multi-modal data search. To this end, a novel multi-modal feature extraction approach is proposed that verifies the neighborhood reversibility of information objects in different modalities, in order to build reliable similarity estimates among multimedia documents. Verifying neighborhood reversibility on both single- and multi-modal instances remarkably improves the reliability of the multi-modal subspace. In addition, a new adaptive strategy that exploits the distance distribution of the returned search results is proposed to handle the neighbor selection problem. To address the out-of-sample problem, a new prediction scheme is proposed to predict the multi-modal features of newly arriving instances, which essentially amounts to constructing an over-complete set of bases. Extensive experiments demonstrate that introducing neighborhood reversibility verification significantly improves the search accuracy for multi-modal documents.
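The abstract does not spell out how neighborhood reversibility is checked. As a minimal sketch, assuming it means that two instances count as reliable neighbors only when each appears in the other's k-nearest-neighbor list, the check could look as follows. The function names, the Euclidean distance, and the fixed `k` are illustrative assumptions, not the authors' formulation (the abstract describes an adaptive neighbor selection strategy instead of a fixed `k`).

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (Euclidean, excluding i itself)."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]

def reversible_neighbors(points, i, k):
    """Assumed reversibility check: keep neighbor j of i only if i is also
    among the k nearest neighbors of j."""
    return [j for j in knn(points, i, k) if i in knn(points, j, k)]

# Hypothetical toy data: three nearby points and one outlier (index 3).
points = [(0, 0), (1, 0), (0, 2), (5, 5)]
print(reversible_neighbors(points, 0, 2))  # neighbors of a point inside the cluster
print(reversible_neighbors(points, 3, 2))  # neighbors of the outlier
```

In this toy example the outlier still has two nearest neighbors, but neither of them lists the outlier among its own neighbors, so the check discards both. That is the intuition behind using reversibility to filter unreliable similarity links.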
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (Nos. 61572065 and 61532005), the Joint Fund of the Ministry of Education of China and China Mobile (No. MCM20160102), and the Fundamental Research Funds for the Central Universities (Nos. 2015JBM028 and 2015JBZ002).
Cite this article
Wei, S., Zhao, Y., Yang, T. et al. Enhancing heterogeneous similarity estimation via neighborhood reversibility. Multimed Tools Appl 77, 1437–1452 (2018). https://doi.org/10.1007/s11042-017-4347-0
DOI: https://doi.org/10.1007/s11042-017-4347-0