Social-oriented visual image search

https://doi.org/10.1016/j.cviu.2013.06.011

Highlights

  • We exploit social information to help understand user intentions in image search.

  • We propose a novel scenario that combines social and visual factors.

  • Social relevance is important for predicting user intention.

  • For best performance, the weights of the social and visual factors must be balanced.

  • Experiments show effectiveness in both social relevance and visual quality.

Abstract

Much research has focused on matching textual queries with visual images and their surrounding texts or tags for Web image search. The returned results are often unsatisfactory because they deviate from user intentions, particularly for queries with heterogeneous concepts (such as “apple” or “jaguar”) or general (non-specific) concepts (such as “landscape” or “hotel”). In this paper, we exploit social data from social media platforms to assist image search engines, aiming to improve the relevance between returned images and user intentions (i.e., social relevance). Facing the challenges of social data sparseness, the tradeoff between social relevance and visual relevance, and complex social and visual factors, we propose a community-specific Social-Visual Ranking (SVR) algorithm to rerank the Web images returned by current image search engines. The SVR algorithm is implemented as PageRank over a hybrid image link graph, which combines an image social-link graph and an image visual-link graph. Through extensive experiments, we demonstrate the importance of both visual and social factors, and the advantages of the social-visual ranking algorithm for Web image search.

Introduction

Image search engines act as a bridge between user intentions and visual images. By simply representing user intentions as textual queries, many existing research works have focused on matching the textual query with visual images and their surrounding texts or tags. However, the returned results are often unsatisfactory because they deviate from user intentions. Take the image search query “jaguar” as an example, as shown in Fig. 1. Different users have different intentions when issuing the query “jaguar”: some expect leopard images, while others expect automobile images. This scenario is quite common, particularly for queries with heterogeneous concepts (such as “apple” or “jaguar”) or general (non-specific) concepts (such as “landscape” or “hotel”). This raises a fundamental but rarely researched problem in Web image search: how can we understand user intentions when users conduct image search?

In the past, this problem was very difficult to resolve due to the lack of social (i.e., inter-personal and personal) data to reveal user intentions. On one hand, user search logs, which contain rich user information, are maintained by search engine companies and kept confidential; on the other hand, the lack of user identifiers (IDs) in search logs makes them hard to exploit for intention representation and discovery. However, with the development of social media platforms such as Flickr and Facebook, the way people can obtain social (including personal) data has changed: users’ profiles, interests and favorite images are exposed online and open to the public, and these are crucial information sources for implicitly understanding user intentions.

Thus, let us imagine a novel and interesting image search scenario: what if we knew users’ Flickr IDs when they conduct image search with textual queries? Could we exploit users’ social information to understand their intentions, and further improve image search performance? In this paper, we exploit social data from social media platforms to assist image search engines, aiming to improve the relevance between returned images and user intentions (i.e., user interests), which we term Social Relevance.

However, combining social media platforms with image search engines is not straightforward, for the following reasons:

  • (1)

Social data sparseness. With respect to image search, the most important social data is users’ favored images. However, the large volume of users and images intrinsically determines the sparseness of user-image interactions: most users possess only a small number of favored images, from which it is difficult to discover user intentions. This problem can be alleviated by grouping users into communities, under the hypothesis that users in the same community share similar interests. Thus, a community-specific method is more practical and effective than a user-specific one.

  • (2)

The tradeoff between social relevance and visual relevance. Although this paper aims to improve the social relevance of returned image search results, another important aspect remains: the Visual Relevance between the query and the returned images. Visual relevance guarantees the quality and representativeness of the returned images for the query, while social relevance guarantees their interest to the user; both are necessary for good search results. Thus, social relevance and visual relevance both need to be addressed and subtly balanced.

  • (3)

    Complex factors. To generate the final image ranking, we need to consider the user query, the images returned by current search engines, and many complex social factors (e.g., interest groups, group-user relations and group-image relations) derived from social media platforms. Integrating these heterogeneous factors in an effective and efficient way is quite challenging.

To deal with the above issues, in this paper we propose a community-specific Social-Visual Ranking (SVR) algorithm to rerank the Web images returned by current image search engines. More specifically, given the preliminary image search results (returned by current image search engines, such as Flickr search and Google Images) and the user’s Flickr ID, we use group information from the social platform and the visual content of the images to rerank the Web images for a group the user belongs to, termed the user’s membership group. The SVR algorithm is implemented as PageRank over a hybrid image link graph, which is the combination of an image social-link graph and an image visual-link graph. In the image social-link graph, edge weights are derived from the social strength of the groups; in the image visual-link graph, edge weights are based on visual similarities. Through SVR, the Web images are reranked according to their interest to the users while maintaining high visual quality and representativeness for the query.
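As a minimal sketch of this reranking step, the following assumes the social-link and visual-link graphs are combined linearly with a balance weight alpha before a standard PageRank iteration. The names alpha and damping, the column normalization, and the uniform teleportation term are our assumptions for illustration, not the paper’s exact formulation.

    import numpy as np

    def social_visual_rank(S, V, alpha=0.5, damping=0.85, iters=100, tol=1e-8):
        # Hybrid image link graph: linear combination of the image
        # social-link graph S and the image visual-link graph V
        # (both n x n, nonnegative); alpha balances the two factors.
        n = S.shape[0]
        W = alpha * S + (1.0 - alpha) * V
        # Column-normalize W into a stochastic transition matrix.
        col_sums = W.sum(axis=0)
        col_sums[col_sums == 0] = 1.0  # guard for images with no links
        P = W / col_sums
        # Standard PageRank iteration with uniform teleportation.
        r = np.full(n, 1.0 / n)
        for _ in range(iters):
            r_next = damping * (P @ r) + (1.0 - damping) / n
            if np.abs(r_next - r).sum() < tol:
                return r_next
            r = r_next
        return r  # images are reranked by descending score

Sorting the images by the returned scores yields the reranked list; moving alpha toward 1 favors social relevance and toward 0 favors visual relevance, reflecting the balance discussed above.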

It is worthwhile to highlight our contributions as follows:

  • (1)

We propose a novel image search scenario that combines information from social media platforms and image search engines to address the user intention understanding problem in Web image search, which is of great significance for improving image search performance.

  • (2)

We propose a community-specific social-visual ranking algorithm to rerank Web images according to their social and visual relevance. In this algorithm, complex social and visual factors are effectively and efficiently incorporated into a hybrid image link graph, and more factors can be naturally added.

  • (3)

We have conducted extensive experiments that demonstrate the importance of both visual and social factors, as well as the advantages of the social-visual ranking algorithm for Web image search. Beyond image search, our algorithm can also be applied straightforwardly in related areas, such as product recommendation and personalized advertisement.

The rest of the paper is organized as follows. We introduce related work in Section 2. Image link graph generation and image ranking are presented in Section 3. Section 4 presents the details and analysis of our experiments. Finally, Section 5 concludes the paper.


Related work

Aiming at improving visual relevance, a series of methods have been proposed that incorporate visual factors into image ranking. These approaches can be classified into three categories: classification [1], [2], [3], clustering [4] and link graph analysis [5], [6], [7]. An essential problem in these methods is measuring visual similarity [8], under the assumption that similar images should have similar ranks. Many kinds of features can be selected to estimate the similarity, including

Social-visual reranking

Fig. 2 illustrates the framework of our social-visual ranking algorithm. The framework has four major intermediate results: a global group link graph for group ranking, a local group link graph, an image social-link graph and an image visual-link graph. Images are reranked by PageRank on the linear combination of the image social-link graph and the image visual-link graph. We first analyze the factors in the random walk, and then give the details of the definition of each graph.
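The exact edge-weight definitions appear in the full text, which is truncated here. As an illustrative sketch under stated assumptions, the two link graphs consumed by the PageRank step above might be assembled as follows, using cosine similarity of (assumed) visual feature vectors for the visual-link graph, and group co-membership weighted by group ranking scores for the social-link graph; all function and parameter names here are hypothetical.

    import numpy as np

    def visual_link_graph(features):
        # Image visual-link graph: edge weights based on pairwise visual
        # similarity (here, cosine similarity of feature vectors; the
        # paper's choice of visual features is not specified in this snippet).
        norms = np.linalg.norm(features, axis=1, keepdims=True)
        norms[norms == 0] = 1.0
        F = features / norms
        V = F @ F.T
        np.fill_diagonal(V, 0.0)      # no self-links
        return np.clip(V, 0.0, None)  # keep edge weights nonnegative

    def social_link_graph(group_image, group_rank):
        # Image social-link graph: images sharing an interest group are
        # linked, with weights accumulated over shared groups and scaled
        # by each group's social strength (e.g., its group-ranking score).
        # group_image: (g, n) binary group-image membership matrix.
        # group_rank:  (g,) social strength of each of the g groups.
        S = group_image.T @ (group_rank[:, None] * group_image)
        np.fill_diagonal(S, 0.0)
        return S

These two matrices would play the roles of S and V in the social_visual_rank sketch given earlier.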

Dataset and settings

To implement our algorithm, we conduct experiments with data from Flickr.com, including images, groups, users, group-user relations and group-image relations. Thirty queries are collected and 1000 images are downloaded for each query. The selected queries cover a series of categories closely related to daily life, including:

  • 1.

Everyday objects with at least two distinct meanings, such as “apple”, “jaguar” and “golf”.

  • 2.

    Natural scenery photos with multiple visual categories, such as

Conclusions and future work

In this paper, we propose a novel framework of community-specific social-visual image ranking for Web image search. We explore combining the social factor and the visual factor in an image link graph to improve social relevance while preserving visual relevance. Comprehensive experiments show the effectiveness of our approach: our proposed method is significantly better than VisualRank and the Flickr search engine in social relevance as well as visual relevance. Besides,

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grants No. 61370022, No. 61003097, No. 60933013 and No. 61210008; the International Science and Technology Cooperation Program of China, No. 2013DFG12870; and the National Program on Key Basic Research Project, No. 2011CB302206. This work is also supported in part by ARO Grant W911NF-12-1-0057, NSF IIS 1052851, Faculty Research Awards from Google, FXPAL and NEC Laboratories of America, and a 2012 UTSA START-R Research Award to Dr. Qi Tian.

References

  • A. Broder et al., Graph structure in the Web, Comput. Netw. (2000).

  • W. Zhou et al., Latent visual context learning for web image applications, Pattern Recognit. (2011).

  • R. Yan et al., Multimedia search with pseudo-relevance feedback.

  • Y. Yang et al., A multimedia retrieval framework based on semi-supervised ranking and relevance feedback, IEEE Trans. Pattern Anal. Mach. Intell. (2012).

  • Y. Yang et al., Harmonizing hierarchical manifolds for multimedia document semantics understanding and cross-media retrieval, IEEE Trans. Multimedia (2008).

  • W.H. Hsu et al., Video search reranking via information bottleneck principle.

  • Y. Jing et al., VisualRank: applying PageRank to large-scale image search, IEEE Trans. Pattern Anal. Mach. Intell. (2008).

  • J. Liu et al., Video search re-ranking via multi-graph propagation.

  • W.H. Hsu et al., Video search reranking through random walk over document-level context graph.

  • R.I. Kondor, J. Lafferty, Diffusion kernels on graphs and other discrete structures, in: Proceedings of the ICML, 2002, ...

  • X. Tian et al., Bayesian video search reranking.

  • H. Zitouni, S. Sevil, D. Ozkan, P. Duygulu, Re-ranking of web image search results using a graph algorithm, in: ICPR...

  • B. Geng et al., The role of attractiveness in web image search.

  • K. Järvelin et al., IR evaluation methods for retrieving highly relevant documents.

  • S. Zhang et al., Descriptive visual words and visual phrases for image applications.

  • X. Zhou et al., SIFT-Bag kernel for video event analysis.


1 Tsinghua National Laboratory for Information Science and Technology, China.

2 Beijing Key Laboratory of Networked Multimedia, Tsinghua University, China.
