Abstract
Keyword-based image search engines are now widely used to access the large number of images on the Web. Most existing keyword-based image search engines, however, may return many junk images (images irrelevant to the given query word), because text terms that are only loosely associated with the Web images are also used for image indexing. The objective of the proposed work is to filter out the junk images from image search results effectively. To this end, bilingual image search results for the same keyword-based query are integrated to identify the clusters of junk images and the clusters of relevant images. Within the relevant image clusters, the results are further refined by removing duplicates under a coarse-to-fine structure. Experiments on a large number of bilingual keyword-based queries (5,000 query words) were performed simultaneously on two keyword-based image search engines (Google Images in English and Baidu Images in Chinese), and the results show that integrating bilingual image search results can filter out junk images effectively.
Acknowledgements
This work is partly supported by NSFC-61075014 and NSFC-60875016, by the Program for New Century Excellent Talents in University under Grants NCET-07-0693, NCET-08-0458 and NCET-10-0071, and by the Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20096102110025).
Cite this article
Yang, C., Peng, J., Feng, X. et al. Integrating bilingual search results for automatic junk image filtering. Multimed Tools Appl 70, 661–688 (2014). https://doi.org/10.1007/s11042-012-1051-y
DOI: https://doi.org/10.1007/s11042-012-1051-y