An automatic image-text alignment method for large-scale web image retrieval

Zhang, Baopeng; Qu, Yanyun; Peng, Jinye; Fan, Jianping

doi:10.1007/s11042-016-4059-x

Published: 27 October 2016

Multimedia Tools and Applications, Volume 76, pages 21401–21421 (2017)

Baopeng Zhang ORCID: orcid.org/0000-0003-2592-2354¹,
Yanyun Qu²,
Jinye Peng³ &
Jianping Fan⁴

Abstract

To reduce the large uncertainty about the relatedness between web images and their auxiliary text terms, an automatic image-text alignment algorithm is developed. It supports more accurate indexing and retrieval of large-scale web images by assigning each web image to its most relevant visual text terms. First, large-scale web pages are crawled, and the informative images and their most relevant auxiliary text blocks are extracted. Second, parallel image clustering is performed to partition the large-scale informative web images into a large number of clusters. By grouping visually similar web images into the same cluster, the parallel image clustering algorithm significantly reduces the uncertainty about the relatedness between the web images and their auxiliary text terms, which provides a good starting point for automatic image-text alignment. Finally, a relevance re-ranking algorithm is developed to identify the most relevant text terms for characterizing the semantics of the visually similar web images in the same cluster, i.e., to assign the web images to their most relevant visual text terms. Our experiments on large-scale web images have obtained very positive results.
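
The cluster-then-re-rank idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clustering or scoring details, so a greedy leader clustering over cosine similarity stands in for the parallel image clustering, and a TF-IDF-like score stands in for the relevance re-ranking. All names (`cluster_images`, `rank_terms`, `threshold`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_images(features, threshold=0.9):
    """Stand-in clustering (assumption): each image joins the first cluster
    whose leader (first member) is at least `threshold` similar to it,
    otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of image indices, leader first
    for i, feat in enumerate(features):
        for members in clusters:
            if cosine(features[members[0]], feat) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def rank_terms(clusters, terms_per_image):
    """Stand-in re-ranking (assumption): score a term by the fraction of
    images in the cluster that carry it, discounted by the number of
    clusters in which the term appears at all."""
    cluster_counts = []
    for members in clusters:
        counts = Counter()
        for i in members:
            counts.update(set(terms_per_image[i]))  # count each term once per image
        cluster_counts.append(counts)

    spread = Counter()  # number of clusters containing each term
    for counts in cluster_counts:
        spread.update(counts.keys())

    n = len(clusters)
    ranked = []
    for members, counts in zip(clusters, cluster_counts):
        scores = {
            term: (cnt / len(members)) * math.log((1 + n) / spread[term])
            for term, cnt in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked.append(sorted(scores, key=lambda t: (-scores[t], t)))
    return ranked


if __name__ == "__main__":
    # Toy visual features and auxiliary text terms (made up for illustration).
    features = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
    terms = [
        ["tiger", "photo"],
        ["tiger", "zoo", "photo"],
        ["beach", "photo"],
        ["beach", "sea", "photo"],
    ]
    clusters = cluster_images(features)
    for members, ranking in zip(clusters, rank_terms(clusters, terms)):
        print(members, ranking)
```

In this toy setting, a generic term such as "photo" that occurs in every cluster is ranked below terms concentrated in a single cluster, which is the intuition behind using visually coherent clusters to reduce uncertainty about which text terms describe an image.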

Acknowledgments

This research is partly supported by the National Science Foundation of China (Grant Nos. 61272285 and 61373077), the National High-Technology Program of China (No. 2014AA012301), the National Key Technology Support Program of China (No. 2014BAH24F02), the Program for Changjiang Scholars and Innovative Research Team in University (No. IRT13090), and the Program of Shaanxi Province Innovative Research Team (No. 2014KCT-17).

Author information

Authors and Affiliations

School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
Baopeng Zhang
Department of Computer Science, Xiamen University, Fujian, China
Yanyun Qu
School of Information Science & Technology, Northwest University, Xi’an, China
Jinye Peng
Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC, USA
Jianping Fan

Corresponding author

Correspondence to Baopeng Zhang.

About this article

Cite this article

Zhang, B., Qu, Y., Peng, J. et al. An automatic image-text alignment method for large-scale web image retrieval. Multimed Tools Appl 76, 21401–21421 (2017). https://doi.org/10.1007/s11042-016-4059-x

Received: 08 January 2016
Revised: 05 September 2016
Accepted: 10 October 2016
Published: 27 October 2016
Issue Date: October 2017
DOI: https://doi.org/10.1007/s11042-016-4059-x
