Exploiting social bookmarking services to build clustered user interest profile for personalized search
Introduction
The abundance of information available on the Web has made search engines (SEs) an indispensable tool. Greater availability of information increases the chance that sought-after information exists on the Web, but it also makes relevant information harder to discover. While SEs do a good job of ranking results to maximize global happiness, they do not do a very good job for specific individuals [39]; the rankings appear to reflect a community of intent rather than the goals of individuals. There are several reasons for this ineffectiveness. First, user queries are of poor quality: the average query length is between two and three words [36], and such short queries cannot effectively describe the user's search intent or information needs. Second, some queries are polysemous [33]: they have different meanings in different contexts, so the SE cannot judge the user intent from a short polysemous query.
The major shortcoming of SEs is their inability to incorporate user modeling into search and to adapt to individual users. Personalization has emerged as an appealing approach for dealing with the issues caused by variation in on-line behavior and by individual differences in user interests, information needs, search goals, query contexts, and other factors [5]. Many methods [1], [4], [8], [15], [16], [26], [28], [41], [44], [45] have been proposed to study user search behavior and use it to build a profile of user interests based on the user's interactions on the Web. These methods focus on analyzing the content of queries and web pages, but in some cases no suitable descriptors, such as topics and genres, are available to describe user interests accurately. Moreover, these methods rely on mining data sources such as the user's email [38], click-through history [4], desktop files [10], and bookmark history [17], all of which tend to be noisy. Information sources that are low in noise and provide precise information, such as social bookmarking services, are therefore highly desirable. Social bookmarking services, such as Flickr, Delicious, and Pinterest, allow users to annotate resources; this facilitates the management, organization, and sharing of resources [41], and also indicates the sources of user interests. Noll and Meinel [28] and Xu et al. [45] proposed solutions that use a social bookmarking service to extract tags from resources of user interest and use those tags to build a User Interest Profile (UIP). User interests can be viewed as contextual variants that may help to disambiguate the user's query intent when the original query is vague or when there are too many search results for the user to wade through to find the most relevant ones.
This paper makes the following contributions:
1. We propose two methods to build a Clustered User Interest Profile (CUIP) for personalized search: one that uses Singular Value Decomposition (SVD) to generate an svdCUIP, and another that uses a variation of SVD, modSVD, to generate a modSvdCUIP. A User Interest Profile (UIP) is a set of pairs of the form (t, tw), where t is a tag and tw is the accumulated weight of the tag t. A CUIP is defined as a set of term clusters, where each term cluster consists of semantically related tags of user interests and their tag weights.
2. We propose an automatic evaluation method to compare the proposed methods against baseline search and folksonomy-based personalized search approaches.
3. We performed experiments to evaluate the proposed methods on two different data sets. The first data set, called the custom data set, was created from the search histories of 12 volunteers. This data set was organized to establish the ground truth for evaluating the clustering tendency and clustering accuracy of the CUIPs generated by the proposed methods. The second data set is a much larger one harvested from the AOL search query log. It was used to test the improvement in personalized search obtained with the two proposed methods, and to compare them with other methods.
4. Our results show that personalized search using the modSvdCUIP is better than using the tfUIP (term frequency UIP) [28] and tfIdfUIP (term frequency Inverse Document Frequency UIP) [45], and exhibits modestly better performance than the tfIdfCUIP [34] and svdCUIP. Each cluster in the CUIP cluster structure identifies a topic, and applying the CUIP helps disambiguate the context of the user query, which is particularly needed for vague queries.
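The UIP and CUIP definitions in the first contribution can be illustrated with a small data-structure sketch. This is only an illustration under stated assumptions: the function names `build_uip` and `build_cuip`, the per-page tag weights, and the hand-picked clusters are hypothetical and are not taken from the paper, which derives the clusters with SVD or modSVD.

```python
from collections import defaultdict


def build_uip(pages):
    """Accumulate tag weights over the pages in a user's search history.

    `pages` is a hypothetical input: one dict per visited page, mapping
    each tag t to a weight for that page. The result maps t -> tw, the
    accumulated weight, i.e. the set of (t, tw) pairs that forms a UIP.
    """
    uip = defaultdict(float)
    for page_tags in pages:
        for tag, weight in page_tags.items():
            uip[tag] += weight
    return dict(uip)


def build_cuip(uip, clusters):
    """Group UIP entries into term clusters of related tags.

    `clusters` is assumed to be given here (a list of tag lists); in the
    paper the grouping comes from the proposed clustering methods.
    """
    return [{t: uip[t] for t in cluster if t in uip} for cluster in clusters]


# Toy search history: "python" is polysemous across pages.
pages = [
    {"python": 3.0, "programming": 2.0},
    {"python": 1.0, "snake": 4.0},
    {"reptile": 2.0, "snake": 1.0},
]
uip = build_uip(pages)
cuip = build_cuip(uip, [["python", "programming"], ["snake", "reptile"]])
```

A flat UIP only says that the user cares about "python" and "snake"; the clustered form keeps the topic grouping that a personalized search step could use to decide which sense of a vague query is meant.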
The rest of the paper is organized as follows. Section 2 discusses the related work starting with the traditional approaches to user profiling for personalized search, followed by the current approaches to user profiling that involve social bookmarking services. Section 3 presents our proposed methods. The experiments are detailed in Section 4, and the paper is concluded in Section 5.
Section snippets
Background and related work
In this section, we first present the state-of-the-art in building a UIP, and discuss the most recent approaches to building a UIP from social bookmarking services for personalized search. Differences in approaches are tabulated in Table 1. Finally, we discuss two well-known approaches to obtaining personalized search results.
Personalized search based on CUIP
This section explains (1) how a CUIP is built from a user search history by applying matrix factorization and a clustering algorithm, and (2) how the CUIP is used for personalized search.
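The paper's conclusions describe the svdCUIP pipeline as computing a tag-tag similarity matrix via SVD and then applying Hierarchical Agglomerative Clustering (HAC) to it. The following is a minimal sketch of that general idea, not the authors' implementation: the tag-by-page matrix `A`, the rank `k`, the use of cosine similarity in the latent space, average linkage, and the stopping threshold are all assumptions made for illustration.

```python
import numpy as np


def tag_similarity_svd(A, k):
    """Cosine similarity between tags in a rank-k latent space.

    A is an assumed tag-by-page weight matrix (one row per tag).
    """
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    T = U[:, :k] * s[:k]                      # tag coordinates in latent space
    norms = np.linalg.norm(T, axis=1, keepdims=True)
    T = T / np.where(norms == 0, 1.0, norms)  # unit-length rows (guard zeros)
    return T @ T.T


def hac_average(sim, threshold):
    """Average-linkage agglomerative clustering on a similarity matrix.

    Repeatedly merges the most similar pair of clusters and stops when
    the best average similarity falls below `threshold` (an assumed
    stopping rule).
    """
    clusters = [[i] for i in range(sim.shape[0])]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                avg = sim[np.ix_(clusters[a], clusters[b])].mean()
                if avg > best:
                    best, pair = avg, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


tags = ["python", "programming", "snake", "reptile"]
A = np.array([
    [2.0, 1.0, 0.0, 0.0],   # python
    [1.0, 2.0, 0.0, 0.0],   # programming
    [0.0, 0.0, 3.0, 1.0],   # snake
    [0.0, 0.0, 1.0, 3.0],   # reptile
])
sim = tag_similarity_svd(A, k=2)
clusters = [[tags[i] for i in c] for c in hac_average(sim, threshold=0.5)]
```

In this toy matrix the tags fall into two groups that never co-occur, so the rank-2 projection places each group on its own latent axis and the clustering separates programming-related tags from animal-related ones. Real tag data would be sparser and noisier, and the choice of `k` and of the threshold would need tuning.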
Data set and experiment methodology
To examine the effectiveness of the proposed methods, we conducted a series of experiments on two different data sets. First, to evaluate the clustering tendency and clustering accuracy of the CUIP, we recruited 12 users whose search histories were harvested to construct the first data set, referred to as the Custom Data Set. Second, to evaluate the quality of personalized search using the proposed methods, we constructed another data set from the AOL search query log.
Conclusions and future work
In this paper, we proposed two novel methods that exploit user search history and social bookmarking services to build a Clustered User Interest Profile (CUIP) that consists of term clusters of user interests. The first method uses Singular Value Decomposition (SVD) to compute a tag-tag similarity matrix and applies Hierarchical Agglomerative Clustering (HAC) to the matrix to generate a cluster structure, svdCUIP. The second method is an extension of the first method, called
Acknowledgement
This research was supported by MSIP (the Ministry of Science, ICT and Future Planning), Korea, under the IT-CRSP (IT Convergence Research Support Program) (NIPA-2013-H0401-13-1001) supervised by the NIPA (National IT Industry Promotion Agency).
References (45)
- et al., The anatomy of a large-scale hypertextual web search engine, Comput. Netw. ISDN Syst. (1998)
- Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. (1987)
- et al., Interweaving public user profiles on the web
- et al., Analyzing user modeling on twitter for personalized news recommendations
- et al., Improving web search ranking by incorporating user behavior information
- et al., Learning user interaction models for predicting web search result preferences
- Ioannis Arapakis, Konstantinos Athanasakos, Joemon M. Jose, A comparison of general vs personalised affective models...
- Alexander Budanitsky, Graeme Hirst, Semantic distance in wordnet: an experimental, application-oriented evaluation of...
- et al., Personalized search by tag-based user profile and resource profile in collaborative tagging systems
- et al., Interfacing Thought: Cognitive Aspects of Human–Computer Interaction (1987)
- The Google similarity distance, IEEE Trans. Knowl. Data Eng.
- Google news personalization: scalable online collaborative filtering
- Indexing by latent semantic analysis, J. Am. Soc. Inform. Sci.
- An information-theoretic external cluster-validity measure
- A large-scale evaluation and analysis of personalized search strategies
- A personalized search engine based on web-snippet hierarchical clustering
- Ontology based personalized search and browsing, Web Intelli. Agent Syst.
- Minimum spanning trees and single linkage cluster analysis, J. Roy. Stat. Soc. Ser. C (Appl. Stat.)
- Finding Groups in Data: An Introduction to Cluster Analysis
- Implicit feedback for inferring user preference: a bibliography, SIGIR Forum
- Web searching on the vivisimo search engine, J. Am. Soc. Inf. Sci. Technol.
Cited by (22)
Collaboratively augmented UIP – Filtered RIP with relevancy mapping for personalization of web search
2021, Information Sciences
Citation excerpt: The second methodology, i.e., CaiNTF, was a personalization methodology for web search [13], where the weights of tags both in a user profile and resource profile were determined through NTF values. The third methodology, i.e., KumSvd, was presented by Kumar et al. [8] for personalizing the user’s web search as per user preferences where both user’s own tags and its augmentation were used to model a user profile. Further, the profile was used for query disambiguation and matching documents were searched and ranked using the Vector Space Model.
SoTaRePo: Society-Tag Relationship Protocol based architecture for UIP construction
2020, Expert Systems with Applications
Citation excerpt: The second methodology, denoted by BM2, is a personalization methodology for web search designed by Cai et al. (2014), where the weights of tags are normalized term frequency values. The third methodology, denoted by BM3, is presented by Kumar et al. (2014) for the construction of a UIP using own tags of user and its augmentation. So, it is an aggregation of user information predicted by term-frequency and inverse-document-frequency values and clusters of semantically similar tags.
Folksonomy-based user profile enrichment using clustering and community recommended tags in multiple levels
2018, Neurocomputing
Citation excerpt: It is also being used in many recommender system to recommend an item to a user which was not visited or seen by a user before. Kumar et al. [16] devised two matrix factorization based methods for the construction of UIP: svdCUIP and modSvdCUIP. As the name suggests, the concept of Singular Value Decomposition is used to make clusters of tags used to annotate different web pages.
A new web personalization decision-support artifact for utility-sensitive customer review analysis
2017, Decision Support Systems
Citation excerpt: When δ > 0.86, f deteriorated noticeably. We set δ = 0.86 because p should be prioritized [13,30]. Also, high accuracy is easier to achieve with a low δ than with a high δ [61].
PerSaDoR: Personalized social document representation for improving web search
2016, Information Sciences
Integrating social profile to improve the source selection and the result merging process in distributed information retrieval
2016, Information Sciences
Citation excerpt: This framework integrated content information, user feedback, and social information, in order to resolve data sparsity and cold-start problems, to improve the recommendation efficiency. Kumar et al. [17] proposed two methods: SVD and modSVD, to build for each user, a Clustered User Interest Profile (CUIP), using the set of tags. Each (CUIP) contains many clusters, and each cluster identifying a topic of the user’s interest.
1. Principal corresponding author.