Elsevier

Information Sciences

Volume 281, 10 October 2014, Pages 399-417
Exploiting social bookmarking services to build clustered user interest profile for personalized search

https://doi.org/10.1016/j.ins.2014.05.008

Abstract

Search engine users tend to write short queries, generally comprising two or three query words. As these queries are often ambiguous or incomplete, search engines tend to return results whose rankings reflect a community of intent. Moreover, search engines are designed to satisfy the needs of the general populace, not those of a specific searcher. To address these issues, we propose two methods that use Singular Value Decomposition (SVD) to build a Clustered User Interest Profile (CUIP) for each user from the tags that a community of users has annotated on web resources of interest. A CUIP consists of clusters of semantically or syntactically related tags, each cluster identifying a topic of the user’s interest. The cluster that matches a given user query helps disambiguate the user’s search needs and assists the search engine in generating a set of personalized search results. A series of experiments was executed against two data sets to judge the clustering tendency of the CUIP cluster structure and to evaluate the quality of personalized search. The experimental results indicate that CUIP-based personalized search outperforms the baseline search and is better than other approaches that use social bookmarking services to build a user profile for personalized search.

Introduction

The abundance of information available on the Web has made search engines (SEs) an indispensable tool. Higher availability of information means that there is a greater chance of finding sought-after information on the Web, but also that discovering relevant information becomes more complex. While SEs do a good job of ranking results to maximize global happiness, they fail to do a very good job for specific individuals [39]; it appears that the rankings reflect a community of intent rather than the goals of individuals. There are many reasons for the ineffectiveness of SEs. First, user queries are of poor quality: the average length of user queries ranges between two and three words [36]; such short queries cannot effectively describe the user's search intent or information needs. Second, some queries are polysemous [33]: they have different meanings in different contexts; hence it is impossible for the SE to judge the user intent from short polysemous queries.

The major shortcoming of SEs is their inability to incorporate user modeling into search and to adapt to individual users. Personalization has emerged as an appealing approach for dealing with the issues caused by the variation of on-line behaviors and individual differences observed in user interests, information needs, search goals, query contexts, and others [5]. Many methods [1], [4], [8], [15], [16], [26], [28], [41], [44], [45] have been proposed to study user search behavior and use it to build a profile of user interests based on the user interactions on the Web. These methods focus on analyzing the content of queries and web pages, but in some cases there are no suitable descriptors available, such as topics and genres, that can be used to accurately describe user interests. Moreover, these methods rely on mining data sources, such as the user’s email [38], click-through history [4], desktop files [10], and bookmark history [17], all of which tend to be noise-infested. Information sources that are low on noise and provide precise information, such as social bookmarking services, are therefore very desirable. Social bookmarking services, such as Flickr, Delicious, and Pinterest, allow users to annotate resources; this facilitates management, organization, and sharing of resources [41], and also provides an indication of sources of user interests. Noll and Meinel [28] and Xu et al. [45] proposed solutions that use a social bookmarking service to extract tags from resources of user interest, and use the tags for building a User Interest Profile (UIP). User interests can be viewed as contextual variants that may help to disambiguate the user query intent when the original query is vague or when there are too many search results for the user to wade through to find the most relevant ones.

This paper makes the following contributions:

  1. We propose two methods to build a CUIP for personalized search: one that uses Singular Value Decomposition (SVD) to generate an svdCUIP, and the other, a variation of SVD called modSVD, to generate a modSvdCUIP. A set of pairs of the form (t,tw), where t is a tag and tw is the accumulated weight of the tag t, constitutes a User Interest Profile (UIP). A CUIP is defined as a set of term clusters, where each term cluster consists of semantically related tags of user interests and their tag weights.

  2. An automatic evaluation method is proposed to compare the proposed methods with the baseline search and with folksonomy-based personalized search approaches.

  3. We performed experiments to evaluate the proposed methods on two different data sets. The first data set, called the custom data set, was created from the search histories of 12 volunteers. This data set was organized to establish the ground truth for the evaluation of clustering tendency and clustering accuracy of CUIPs generated by the proposed methods. The second data set is a much bigger data set harvested from the AOL search query log. This data set was used to test the improvement in personalized search for the two proposed methods, and to compare them with other methods.

  4. Our results show that personalized search using the modSvdCUIP is better than using the tfUIP (term frequency UIP) [28] and tfIdfUIP (term frequency Inverse Document Frequency UIP) [45], and exhibits modestly better performance than the tfIdfCUIP [34] and svdCUIP. Each cluster in the CUIP cluster structure identifies a topic, and applying the CUIP helps disambiguate the context of a user query, which is particularly needed for vague queries.
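The definitions in the contributions above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_uip` and `match_cluster`, the sample bookmarks, the use of a plain occurrence count as the accumulated weight tw, and the hand-assigned clusters are all assumptions made for illustration.

```python
from collections import defaultdict


def build_uip(bookmarks):
    """Build a UIP as {tag: accumulated weight} from per-resource tag lists.

    Assumption: the weight tw is a plain occurrence count of tag t.
    """
    uip = defaultdict(float)
    for tags in bookmarks:
        for tag in tags:
            uip[tag.lower()] += 1.0
    return dict(uip)


def match_cluster(query, cuip):
    """Return the index of the cluster whose tags best cover the query.

    The score is the summed weight of cluster tags that occur in the query;
    None is returned when no cluster shares a term with the query.
    """
    terms = set(query.lower().split())
    best_index, best_score = None, 0.0
    for index, cluster in enumerate(cuip):
        score = sum(weight for tag, weight in cluster.items() if tag in terms)
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Hypothetical tags attached to three resources the user visited.
bookmarks = [
    ["jaguar", "car", "dealer"],
    ["car", "engine"],
    ["zoo", "wildlife"],
]
uip = build_uip(bookmarks)

# A CUIP is a set of term clusters; here the grouping is assigned by hand.
cuip = [
    {t: uip[t] for t in ("jaguar", "car", "dealer", "engine")},
    {t: uip[t] for t in ("zoo", "wildlife")},
]

print(uip)
print(match_cluster("jaguar engine", cuip))
```

In this toy profile the tag "jaguar" sits in the car cluster, so a query containing it is resolved toward that topic rather than toward wildlife, which is the kind of disambiguation the fourth contribution describes.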

The rest of the paper is organized as follows. Section 2 discusses the related work starting with the traditional approaches to user profiling for personalized search, followed by the current approaches to user profiling that involve social bookmarking services. Section 3 presents our proposed methods. The experiments are detailed in Section 4, and the paper is concluded in Section 5.

Section snippets

Background and related work

In this section, we first present the state-of-the-art in building a UIP, and discuss the most recent approaches to building a UIP from social bookmarking services for personalized search. Differences in approaches are tabulated in Table 1. Finally, we discuss two well-known approaches to obtaining personalized search results.

Personalized search based on CUIP

This section explains (1) how a CUIP is built from a user search history by applying matrix factorization and a clustering algorithm, and (2) how the CUIP is used for personalized search.
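As a rough illustration of the first step, the sketch below factorizes a small tag-by-resource matrix with SVD, derives a tag-tag similarity matrix, and groups tags by agglomerative clustering, in line with the SVD and Hierarchical Agglomerative Clustering named in the abstract and conclusions. Several choices are assumptions rather than details from the paper: cosine similarity in the truncated latent space, average linkage, the stopping threshold, the rank, the toy matrix, and the function names `tag_similarity` and `average_link_hac`. The modSVD variant is not sketched because this excerpt does not describe it.

```python
import numpy as np


def tag_similarity(tag_resource, rank):
    """Cosine similarity between tags in a rank-`rank` SVD latent space."""
    u, s, _ = np.linalg.svd(tag_resource, full_matrices=False)
    latent = u[:, :rank] * s[:rank]          # one latent row vector per tag
    norms = np.linalg.norm(latent, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                  # avoid division by zero
    unit = latent / norms
    return unit @ unit.T


def average_link_hac(similarity, threshold):
    """Greedy average-linkage agglomerative clustering on a similarity matrix.

    Repeatedly merges the most similar pair of clusters and stops when no
    pair has an average similarity above `threshold`.
    """
    clusters = [[i] for i in range(len(similarity))]
    while len(clusters) > 1:
        best, pair = threshold, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = similarity[np.ix_(clusters[a], clusters[b])].mean()
                if link > best:
                    best, pair = link, (a, b)
        if pair is None:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy data (hypothetical): rows are tags, columns are bookmarked resources.
tags = ["car", "engine", "dealer", "zoo", "wildlife"]
matrix = np.array([
    [1, 1, 0, 0],   # car
    [1, 1, 0, 0],   # engine
    [0, 1, 0, 0],   # dealer
    [0, 0, 1, 1],   # zoo
    [0, 0, 1, 0],   # wildlife
], dtype=float)

sim = tag_similarity(matrix, rank=2)
clusters = average_link_hac(sim, threshold=0.5)
named = [sorted(tags[i] for i in cluster) for cluster in clusters]
print(named)
```

Truncating to a low rank is what lets tags that never co-occur on the same resource still end up close together when they share neighbours; the rank and threshold would need tuning on real data.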

Data set and experiment methodology

To examine the effectiveness of the proposed methods, we conducted a series of experiments on two different data sets. First, to evaluate the clustering tendency and clustering accuracy of the CUIP, we recruited 12 users whose search histories were harvested to construct the first data set, referred to as the Custom Data Set. Second, to evaluate the quality of personalized search using the proposed methods, we constructed another data set from the AOL search query log.

Conclusions and future work

In this paper, we proposed two novel methods that exploited user search history and social bookmarking services for building a Clustered User Interest Profile (CUIP) that consists of term clusters of user interests. The first method is based on the Singular Value Decomposition (SVD) to compute a tag-tag similarity matrix and use the Hierarchical Agglomerative Clustering (HAC) on the matrix to generate a cluster structure, svdCUIP. The second method is an extension of the first method, called

Acknowledgement

This research was supported by MSIP (the Ministry of Science, ICT and Future Planning), Korea, under the IT-CRSP (IT Convergence Research Support Program) (NIPA-2013-H0401-13-1001) supervised by the NIPA (National IT Industry Promotion Agency).

References (45)

  • Sergey Brin et al., The anatomy of a large-scale hypertextual web search engine, Comput. Netw. ISDN Syst. (1998)
  • Peter J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. (1987)
  • Fabian Abel et al., Interweaving public user profiles on the web
  • Fabian Abel et al., Analyzing user modeling on twitter for personalized news recommendations
  • Eugene Agichtein et al., Improving web search ranking by incorporating user behavior information
  • Eugene Agichtein et al., Learning user interaction models for predicting web search result preferences
  • Ioannis Arapakis, Konstantinos Athanasakos, Joemon M. Jose, A comparison of general vs personalised affective models...
  • Alexander Budanitsky, Graeme Hirst, Semantic distance in wordnet: an experimental, application-oriented evaluation of...
  • Yi Cai et al., Personalized search by tag-based user profile and resource profile in collaborative tagging systems
  • John M. Carroll et al., Interfacing Thought: Cognitive Aspects of Human–Computer Interaction (1987)
  • Paul-Alexandru Chirita et al.
  • Rudi L. Cilibrasi et al., The Google similarity distance, IEEE Trans. Knowl. Data Eng. (2007)
  • Abhinandan S. Das et al., Google news personalization: scalable online collaborative filtering
  • Scott Deerwester et al., Indexing by latent semantic analysis, J. Am. Soc. Inform. Sci. (1990)
  • Byron E. Dom, An information-theoretic external cluster-validity measure
  • Zhicheng Dou et al., A large-scale evaluation and analysis of personalized search strategies
  • Paolo Ferragina et al., A personalized search engine based on web-snippet hierarchical clustering
  • Susan Gauch et al., Ontology based personalized search and browsing, Web Intelli. Agent Syst. (2003)
  • J.C. Gower et al., Minimum spanning trees and single linkage cluster analysis, J. Roy. Stat. Soc. Ser. C (Appl. Stat.) (1969)
  • L. Kaufman et al., Finding Groups in Data: An Introduction to Cluster Analysis (1990)
  • Diane Kelly et al., Implicit feedback for inferring user preference: a bibliography, SIGIR Forum (2003)
  • Sherry Koshman et al., Web searching on the vivisimo search engine, J. Am. Soc. Inf. Sci. Technol. (2006)