Exploiting social bookmarking services to build clustered user interest profile for personalized search

doi:10.1016/j.ins.2014.05.008

Information Sciences

Volume 281, 10 October 2014, Pages 399-417

https://doi.org/10.1016/j.ins.2014.05.008

Abstract

Search engine users tend to write short queries, generally comprising two or three query words. As these queries are often ambiguous or incomplete, search engines tend to return results whose rankings reflect a community of intent. Moreover, search engines are designed to satisfy the needs of the general populace, not those of a specific searcher. To address these issues, we propose two methods that use Singular Value Decomposition (SVD) to build a Clustered User Interest Profile (CUIP) for each user from the tags that a community of users has assigned to web resources of interest. A CUIP consists of clusters of semantically or syntactically related tags, each cluster identifying a topic of the user’s interest. The cluster that matches a given user query helps disambiguate the user’s search needs and helps the search engine generate a set of personalized search results. A series of experiments was run on two data sets to judge the clustering tendency of the CUIP cluster structure and to evaluate the quality of personalized search. The experimental results indicate that CUIP-based personalized search outperforms the baseline search and is better than other approaches that build a user profile from social bookmarking services and use it for personalized search.

Introduction

The abundance of information available on the Web has made search engines (SEs) an indispensable tool. Greater availability of information means a greater chance that sought-after information exists on the Web, but it also makes relevant information harder to discover. While SEs do a good job of ranking results to maximize global happiness, they fail to do a very good job for specific individuals [39]; it appears that the rankings reflect a community of intent rather than the goals of individuals. There are many reasons for the ineffectiveness of SEs. First, user queries are of poor quality: the average length of user queries ranges from two to three words [36]; such short queries cannot effectively describe the user’s search intent or information needs. Second, some queries are polysemous [33]: they have different meanings in different contexts; hence the SE cannot reliably judge user intent from short polysemous queries.

The major shortcoming of SEs is their inability to incorporate user modeling into search and to adapt to individual users. Personalization has emerged as an appealing approach to the issues caused by the variation in on-line behaviors and the individual differences observed in user interests, information needs, search goals, query contexts, and others [5]. Many methods [1], [4], [8], [15], [16], [26], [28], [41], [44], [45] have been proposed to study user search behavior and use it to build a profile of user interests based on user interactions on the Web. These methods focus on analyzing the content of queries and web pages, but in some cases no suitable descriptors, such as topics and genres, are available to describe user interests accurately. Moreover, these methods rely on mining data sources such as users’ email [38], click-through history [4], desktop files [10], and bookmark history [17], all of which tend to be noisy. Information sources that are low in noise and provide precise information, such as social bookmarking services, are therefore desirable. Social bookmarking services, such as Flickr, Delicious, and Pinterest, allow users to annotate resources; this facilitates management, organization, and sharing of resources [41], and also indicates the sources of user interests. Noll and Meinel [28] and Xu et al. [45] proposed solutions that use a social bookmarking service to extract tags from resources of user interest and use the tags to build a User Interest Profile (UIP). User interests can be viewed as contextual variants that may help to disambiguate the user’s query intent when the original query is vague or when there are too many search results for the user to wade through to find the most relevant ones.

This paper makes the following contributions:

1. We propose two methods to build a CUIP for personalized search: one that uses Singular Value Decomposition (SVD) to generate an svdCUIP, and another that uses a variation of SVD, modSVD, to generate a modSvdCUIP. A User Interest Profile (UIP) is a set of pairs of the form $(t, tw)$, where t is a tag and tw is the accumulated weight of the tag t. A CUIP is defined as a set of term clusters, where each term cluster consists of semantically related tags of user interests and their tag weights.
2. We propose an automatic evaluation method to compare the proposed methods against the baseline search and folksonomy-based personalized search approaches.
3. We performed experiments to evaluate the proposed methods on two different data sets. The first data set, called the custom data set, was created from the search histories of 12 volunteers. This data set was organized to establish the ground truth for evaluating the clustering tendency and clustering accuracy of the CUIPs generated by the proposed methods. The second data set is a much larger one harvested from the AOL search query log. This data set was used to test the improvement in personalized search for the two proposed methods and to compare them with other methods.
4. Our results show that personalized search using the modSvdCUIP is better than using the tfUIP (term frequency UIP) [28] and tfIdfUIP (term frequency Inverse Document Frequency UIP) [45], and exhibits modestly better performance than the tfIdfCUIP [34] and svdCUIP. Each cluster in the CUIP cluster structure identifies a topic, and applying the CUIP helps disambiguate the context of a user query, which is particularly needed for vague queries.
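The definitions in contribution 1 can be made concrete with a small data-model sketch. The UIP and CUIP shapes follow the text (a set of (t, tw) pairs, and a set of term clusters). Everything else is an assumption for illustration: the names `TermCluster`, `build_uip`, and `match_cluster`, the one-unit-per-occurrence weighting, and the summed-weight rule for matching a query to a cluster are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# A UIP is a set of (t, tw) pairs; a dict maps each tag t to its weight tw.
UIP = Dict[str, float]


@dataclass
class TermCluster:
    """One topic of interest: related tags with their weights."""
    tags: Dict[str, float] = field(default_factory=dict)


# A CUIP is a set of term clusters (a list here, for simplicity).
CUIP = List[TermCluster]


def build_uip(annotations: List[List[str]]) -> UIP:
    """Accumulate tag weights; one unit per occurrence is an assumed scheme."""
    uip: UIP = {}
    for tags in annotations:
        for t in tags:
            uip[t] = uip.get(t, 0.0) + 1.0
    return uip


def match_cluster(query: str, cuip: CUIP) -> Optional[TermCluster]:
    """Pick the cluster with the largest summed weight of query terms.

    Hypothetical scoring rule; returns None when no cluster shares a term.
    """
    terms = set(query.lower().split())
    best, best_score = None, 0.0
    for cluster in cuip:
        score = sum(w for t, w in cluster.tags.items() if t in terms)
        if score > best_score:
            best, best_score = cluster, score
    return best


# Toy profile: the ambiguous tag "jaguar" appears in two topics.
cuip = [
    TermCluster({"jaguar": 3.0, "car": 2.0, "engine": 1.0}),
    TermCluster({"jaguar": 1.0, "wildlife": 4.0, "cat": 2.0}),
]
topic = match_cluster("jaguar engine", cuip)
```

In this toy profile the query "jaguar engine" selects the automotive cluster, because that cluster carries more weight on the query terms. This is the kind of disambiguation the CUIP is meant to support, although the paper's actual matching procedure may differ.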

The rest of the paper is organized as follows. Section 2 discusses the related work starting with the traditional approaches to user profiling for personalized search, followed by the current approaches to user profiling that involve social bookmarking services. Section 3 presents our proposed methods. The experiments are detailed in Section 4, and the paper is concluded in Section 5.

Section snippets

Background and related work

In this section, we first present the state-of-the-art in building a UIP, and discuss the most recent approaches to building a UIP from social bookmarking services for personalized search. Differences in approaches are tabulated in Table 1. Finally, we discuss two well-known approaches to obtaining personalized search results.

Personalized search based on CUIP

This section explains (1) how a CUIP is built from a user search history by applying matrix factorization and a clustering algorithm, and (2) how the CUIP is used for personalized search.
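A minimal sketch of the first step may help. The paper states that SVD is used to compute a tag-tag similarity matrix and that Hierarchical Agglomerative Clustering (HAC) is applied to that matrix. The details below are assumptions, not the authors' implementation: the input is taken to be a tag-by-resource count matrix, similarity is cosine similarity in a truncated SVD space, and HAC uses average linkage with a fixed number of clusters. The function names, `rank`, and `n_clusters` are hypothetical, and the modSVD variant is not covered.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def tag_similarity(A: np.ndarray, rank: int) -> np.ndarray:
    """Cosine similarity between tags in a rank-`rank` SVD latent space."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    T = U[:, :rank] * s[:rank]                 # one latent vector per tag (row)
    norms = np.linalg.norm(T, axis=1, keepdims=True)
    T = T / np.where(norms == 0, 1.0, norms)   # avoid division by zero
    return T @ T.T


def cluster_tags(tags, A, rank=2, n_clusters=2):
    """Group tags with average-linkage HAC on (1 - similarity) distances."""
    sim = tag_similarity(np.asarray(A, dtype=float), rank)
    dist = np.clip(1.0 - sim, 0.0, None)       # round-off can dip below zero
    dist = (dist + dist.T) / 2.0               # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    clusters = {}
    for tag, label in zip(tags, labels):
        clusters.setdefault(int(label), []).append(tag)
    return list(clusters.values())


# Toy tag-by-resource matrix: rows are tags, columns are bookmarked resources.
tags = ["python", "numpy", "recipe", "baking"]
A = [
    [2, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
groups = cluster_tags(tags, A, rank=2, n_clusters=2)
```

On this toy matrix the programming tags and the cooking tags never share a resource, so they fall into separate clusters. In practice the rank and the cut of the dendrogram would need to be chosen from the data.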

Data set and experiment methodology

To examine the effectiveness of the proposed methods, we conducted a series of experiments on two different data sets. First, to evaluate the clustering tendency and clustering accuracy of the CUIP, we recruited 12 users whose search histories were harvested to construct the first data set, referred to as the Custom Data Set. Second, to evaluate the quality of personalized search using the proposed methods, we constructed another data set from the AOL search query log.

Conclusions and future work

In this paper, we proposed two novel methods that exploited user search history and social bookmarking services for building a Clustered User Interest Profile (CUIP) that consists of term clusters of user interests. The first method is based on the Singular Value Decomposition (SVD) to compute a tag-tag similarity matrix and use the Hierarchical Agglomerative Clustering (HAC) on the matrix to generate a cluster structure, svdCUIP. The second method is an extension of the first method, called

Acknowledgement

This research was supported by MSIP (the Ministry of Science, ICT and Future Planning), Korea, under the IT-CRSP (IT Convergence Research Support Program) (NIPA-2013-H0401-13-1001) supervised by the NIPA (National IT Industry Promotion Agency).

References (45)

Sergey Brin et al., The anatomy of a large-scale hypertextual web search engine, Comput. Netw. ISDN Syst. (1998)
Peter J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. (1987)
Fabian Abel et al., Interweaving public user profiles on the web
Fabian Abel et al., Analyzing user modeling on twitter for personalized news recommendations
Eugene Agichtein et al., Improving web search ranking by incorporating user behavior information
Eugene Agichtein et al., Learning user interaction models for predicting web search result preferences
Ioannis Arapakis, Konstantinos Athanasakos, Joemon M. Jose, A comparison of general vs personalised affective models...
Alexander Budanitsky, Graeme Hirst, Semantic distance in wordnet: an experimental, application-oriented evaluation of...
Yi Cai et al., Personalized search by tag-based user profile and resource profile in collaborative tagging systems
John M. Carroll et al., Interfacing Thought: Cognitive Aspects of Human–Computer Interaction (1987)
Paul-Alexandru Chirita et al.
Rudi L. Cilibrasi et al., The Google similarity distance, IEEE Trans. Knowl. Data Eng. (2007)
Abhinandan S. Das et al., Google news personalization: scalable online collaborative filtering
Scott Deerwester et al., Indexing by latent semantic analysis, J. Am. Soc. Inform. Sci. (1990)
Byron E. Dom, An information-theoretic external cluster-validity measure
Zhicheng Dou et al., A large-scale evaluation and analysis of personalized search strategies
Paolo Ferragina et al., A personalized search engine based on web-snippet hierarchical clustering
Susan Gauch et al., Ontology based personalized search and browsing, Web Intelli. Agent Syst. (2003)
J.C. Gower et al., Minimum spanning trees and single linkage cluster analysis, J. Roy. Stat. Soc. Ser. C (Appl. Stat.) (1969)
L. Kaufman et al., Finding Groups in Data: An Introduction to Cluster Analysis (1990)
Diane Kelly et al., Implicit feedback for inferring user preference: a bibliography, SIGIR Forum (2003)
Sherry Koshman et al., Web searching on the vivisimo search engine, J. Am. Soc. Inf. Sci. Technol. (2006)

Cited by (22)

Collaboratively augmented UIP – Filtered RIP with relevancy mapping for personalization of web search
2021, Information Sciences
Citation Excerpt:
The second methodology, i.e., CaiNTF, was a personalization methodology for web search [13], where the weights of tags both in a user profile and resource profile were determined through NTF values. The third methodology, i.e., KumSvd, was presented by Kumar et al. [8] for personalizing the user’s web search as per user preferences where both user’s own tags and its augmentation were used to model a user profile. Further, the profile was used for query disambiguation and matching documents were searched and ranked using the Vector Space Model.
Personalized information retrieval has become more crucial and challenging in today’s time, as every user is demanding information according to their own perspectives. To achieve personalization, this paper proposes a methodology with a focus on the selection of an effective composition corresponding to various supporting modules of a personalization methodology. Collaborative tagging can be quite helpful in constructing User Interest Profile (UIP) and Resource Illustration Profile (RIP). The proposed methodology also focuses on UIP augmentation using multiple strategies; and a novel approach has also been designed to handle outlier tags which caused ambiguity in collective RIP. Even a good UIP and RIP alone cannot create an efficient personalization methodology; they also require a suitable mapping with user’s query requirement. Therefore, in the proposed methodology, the fuzzy satisfaction requirement-based novel mapping functions have been designed to measure query relevance score and user interest relevance score for a web resource. These scores have been further used to calculate the post-relevance score of a web resource after a suitable trade-off. Experiments using the del.icio.us dataset show that the proposed methodology has outperformed each and every baseline by a considerable margin.
SoTaRePo: Society-Tag Relationship Protocol based architecture for UIP construction
2020, Expert Systems with Applications
Citation Excerpt:
The second methodology, denoted by BM2, is a personalization methodology for web search designed by Cai et al. (2014), where the weights of tags are normalized term frequency values. The third methodology, denoted by BM3, is presented by Kumar et al. (2014) for the construction of a UIP using own tags of user and its augmentation. So, it is an aggregation of user information predicted by term-frequency and inverse-document-frequency values and clusters of semantically similar tags.
As with the advancement of web services, there has been a rapid proliferation in web size and number of web users, where, each user holds a different viewpoint towards the same information. This, in turn, has become a big challenge for the web search platforms to interpret the preferences of the users and provide the desired information to them. The most suitable solution to the problem of search platforms is personalization of web search. A personalization system is a kind of expert and intelligent system which can automatically learn about the preferences of a user so that the system can provide the search results as per their relevance to a user. The process of acquiring knowledge about user’s preferences by a personalization system is known as User Interest Profile (UIP). In the field of search personalization, it can also not be denied that only an efficient and complete UIP can lead to an effective and high performing web search personalization methodology design. But most of the studies conducted for web search personalization have only focused on UIP modeling without any thought about the quality of UIP. Rather limited attention has been paid to sparsity issue of UIP modeling. In this paper, we propose a novel protocol based architecture model to create an efficient UIP by exploiting direct and indirect interest of a user. Direct interest aims at mining user’s preferences from his own activities on a social information platform. The explicitly defined society and real-world activity relationships of a user on a social platform are used to predict his indirect interest as UIP constructed solely on the basis of direct interest is sparse and ineffective. In order to unearth user’s activity relationships the concept of semantic relatedness, computed using Word2vec model, has been used. Moreover, different trust levels in society relationships have also been incorporated into the proposed model to facilitate the prediction of user’s indirect interest. 
A series of experiments have been conducted on a del.icio.us dataset to evaluate the effectiveness of the proposed model. The results show that the model has outperformed each and every baseline in relation to complete and efficient UIP construction.
Folksonomy-based user profile enrichment using clustering and community recommended tags in multiple levels
2018, Neurocomputing
Citation Excerpt:
It is also being used in many recommender system to recommend an item to a user which was not visited or seen by a user before. Kumar et al. [16] devised two matrix factorization based methods for the construction of UIP: svdCUIP and modSvdCUIP. As the name suggests, the concept of Singular Value Decomposition is used to make clusters of tags used to annotate different web pages.
Folksonomy (aka Collaborative tagging) systems provide a platform to the users where they can annotate a web resource by using any tag of interest. It is a first-hand information directly given by user without any middleman modification, therefore, it is more reliable than any other means. This paper proposes a novel methodology to construct a strong User Interest Profile (UIP) by exploiting user’s own activities and other activities occurring in user’s social network. UIP will provide a complete list of user preferences along with his level of interest in that preference. The proposed methodology is different from other strategies used for UIP enrichment as user’s own tags are not enough to construct a strong UIP. In the current research work, two strategies have been employed for the enrichment of UIP. First one is clustering of tags based on the concept of semantic relatedness between two tags in the real world. This has been measured using Word2vec model. The second one is the utilization of user’s real friendship network. It is believed that the present work is the first one to integrate the concept of semantic relatedness for tag clustering. The performance of proposed methodology has been evaluated on the basis of evaluation metrics i.e. MRR, imp, completeness and P@k using a dataset of del.icio.us. To analyse the impact of parameters, similarity measure and number of clusters in cluster set, on the performance of UIP constructed by proposed methodology extensive experiments are performed. The results reveal that the proposed methodology outperforms all the state of the art methodologies in terms of accurate and efficient UIP construction for every value of the parameters under consideration.
A new web personalization decision-support artifact for utility-sensitive customer review analysis
2017, Decision Support Systems
Citation Excerpt:
When δ > 0.86, f deteriorated noticeably. We set δ = 0.86 because p should be prioritized [13,30]. Also, high accuracy is easier to achieve with a low δ than with a high δ [61].
In recent years there has been increased consumer use of the vast array of online reviews. Given the increasingly high volume of such reviews, automatic analyses of their quality have become imperative. Not surprisingly, this situation has attracted the interest of researchers. However, prior approaches are insufficient to address the consumers' need for non-burdensome sense making of online reviews. This research attempts to close this gap by proposing novel design science artifacts (i.e. construct, architecture, algorithms and prototype) to address the consumers' need. We evaluate these artifacts using a set of experiments and hypothesis tests. The results validate the effectiveness and efficiency of the proposed artifacts. We demonstrate their practical utility and relevance using real world pilot experiments. This paper contributes theoretical knowledge to the review quality literature and, what we believe is the first exemplifier for adequately validating the solutions of review quality research.
PerSaDoR: Personalized social document representation for improving web search
2016, Information Sciences
In this paper, we discuss a contribution towards the integration of social information in the index structure of an IR system. Since each user has his/her own understanding and point of view of a given document, we propose an approach in which the index model provides a Personalized Social Document Representation (PerSaDoR) of each document per user based on his/her activities in a social tagging system. The proposed approach relies on matrix factorization to compute the PerSaDoR of documents that match a query, at query time. The complexity analysis shows that our approach scales linearly with the number of documents that match the query, and thus, it can scale to very large datasets. PerSaDoR has been also intensively evaluated by an offline study and by a user survey operated on a large public dataset from delicious showing significant benefits for personalized search compared to state of the art methods.
Integrating social profile to improve the source selection and the result merging process in distributed information retrieval
2016, Information Sciences
Citation Excerpt:
This framework integrated content information, user feedback, and social information, in order to resolve data sparsity and cold-start problems, to improve the recommendation efficiency. Kumar et al. [17] proposed two methods: SVD and modSVD, to build for each user, a Clustered User Interest Profile (CUIP), using the set of tags. Each (CUIP) contains many clusters, and each cluster identifying a topic of the user’s interest.
In this paper we present a new personalized approach, which integrates a social profile into a distributed search system. Most previous studies on distributed information retrieval are based on textual information, and rarely consider any social information. Based on this observation, we propose an approach which exploits the social profile and the different relations between social entities. We believe that this method can: (i) enhance a query expansion, (ii) personalize and improve both the source selection and the result merging process in distributed information retrieval systems.


¹: Principal corresponding author.
