Knowledge-Based Systems

Volume 108, 15 September 2016, Pages 5-14

Using neural word embeddings to model user behavior and detect user segments

https://doi.org/10.1016/j.knosys.2016.05.002

Abstract

Modeling user behavior to detect segments of users to target with ads (behavioral targeting) is a problem widely studied in the literature. Various sources of data are mined and modeled in order to detect these segments, such as the queries issued by the users. In this paper we first show that a user segmentation system needs to employ reliable user preferences, since nearly half of the time users reformulate their queries in order to satisfy their information need. Then we propose a method that analyzes the descriptions of the items positively evaluated by the users and extracts a vector representation of the words in these descriptions (word embeddings). Since it is widely known that users tend to choose items of the same categories, our approach is designed to avoid the so-called preference stability, which would associate the users with trivial segments. Moreover, we ensure that the generated segments are interpretable by the advertisers who will use them. We performed different sets of experiments on a large real-world dataset, which validated our approach and showed its capability to produce effective segments.

Introduction

Behavioral targeting is the process of detecting segments of users with similar behaviors, in order to address effective ads to them. Given the high interest in extracting effective segments, both industry and academia are studying ways to model user behavior. In industry, systems try to monitor the behavior of the users in implicit ways, in order to extract their preferences and form the segments; usually, neither the algorithms nor the data are made publicly available, to avoid disclosing both industrial secrets and private information about the users. In the academic literature it has been highlighted that classic approaches to segmentation (like k-means) cannot take into account the semantics of the user behavior [1]. Tu and Lu [2] proposed a user segmentation approach based on a semantic analysis of the queries issued by the users, while Gong et al. [1] proposed an LDA-based semantic segmentation that groups users with similar query and click behaviors.

However, several problems remain open in the literature when performing a user segmentation based on user behavior.

Data source reliability. In order to satisfy the users’ information need, query reformulation characterizes nearly 50% of the queries issued by the users [3], [4], [5]. Therefore, the semantic analysis of a query is not a reliable source of information, since it does not indicate whether or not a query led to what the user was really looking for. Moreover, performing a semantic analysis on the items evaluated by the users, in order to filter them, can increase the accuracy of a system [6], [7], [8]. Considering these aspects, a possible solution to this issue would be a semantic analysis of the descriptions of the items a user positively evaluated through an explicitly given rating. However, another issue arises in cascade.

Preference stability. The analysis of reliable information about the users, such as the descriptions of the items they evaluated, would probably lead to trivial segments, since users tend to evaluate items of the same categories (e.g., they usually watch movies of the same genres or by the same director/actor). This problem is known as preference stability [9]: on the one hand, it leads to high-quality knowledge sources; on the other hand, it leaves no way to target the users with serendipitous and effective ads (overspecialization [10]).

Segmentation interpretability. Another open issue widely studied in this research area is the capability of a segmentation to be easily interpreted. A recent survey on user segmentation (mostly focused on the library domain) [11] highlighted that, in order to create a proper segmentation of the users, it is important to understand them. On the one hand, easily interpretable approaches generate trivial segments, and even a partitioning with the k-means clustering algorithm has proven to be more effective than these approaches [12]; on the other hand, when a larger set of features is combined, the problem of properly understanding and interpreting the results arises [13], [14]. This is mostly due to the lack of guidance on how to interpret the results of a segmentation [15]. The fact that easily understandable approaches generate ineffective segments, while more complex ones are accurate but not easy to use in practice, creates an important gap in this research area.

Our contributions. In this paper, we present an approach to user segmentation in which the sources of information used to build it are reliable, and the generated segmentation is both non-trivial and easily interpretable.

As previously mentioned, the problem of using reliable sources of information is addressed by considering the items positively evaluated by the users with an explicitly-assigned rating. In particular, we employ the vector representation of the words in a description (word embedding) [16]. Word embeddings are built by taking a text corpus as input, building a vocabulary, and learning the vector representation of the words. They are widely employed nowadays in several NLP tasks, such as the representation of sentences and paragraphs [17], [18], relational entities [19], [20], general text-based attributes [21], descriptive text of images [22], and nodes in graph structures [23]. To the best of the authors’ knowledge, no approach uses word embeddings for user segmentation purposes.
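To make this step concrete, the following is a minimal sketch of learning word embeddings from item descriptions, assuming the gensim implementation of word2vec; the toy corpus, vector size, and other parameters are illustrative choices, not taken from the paper.

```python
# Minimal sketch: learning neural word embeddings from item descriptions.
# Assumes gensim is available; corpus and hyperparameters are illustrative.
from gensim.models import Word2Vec

# Each item description is a list of (already cleaned) lowercase tokens.
descriptions = [
    ["a", "romantic", "comedy", "about", "two", "strangers"],
    ["a", "dark", "thriller", "set", "in", "a", "dystopian", "city"],
]

# Train a skip-gram model; the vocabulary V and the word vectors NWE_vw
# of the problem statement correspond to model.wv here.
model = Word2Vec(sentences=descriptions, vector_size=100, window=5,
                 min_count=1, sg=1)

vector = model.wv["comedy"]  # one neural word embedding (100-dimensional array)
```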

For each class of items that the users can be targeted with (e.g., movie genres), our approach builds a vector representation based on the word embeddings, which characterizes the words that represent the class. In a similar way, we also build a user model that captures the user interests. By matching the vector representation of a class of items against the user model through a similarity metric, we can associate a user with the segment that represents that class of items. Note that each user can also have a strong similarity with classes of items she never evaluated (thus avoiding the preference stability problem) and that the segments can be easily interpreted (even if a class and a user are represented by tens of features in the vector, the advertiser is only required to specify which interests she wants to target). In order to allow advertisers to specify more complex targets, we also present a Boolean algebra that combines multiple classes of items with simple operations (e.g., to extract a vector representation of what characterizes comedy AND romantic movies).
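Purely as an illustration of this matching step (not the authors’ exact algorithm, which is detailed later in the paper), the sketch below aggregates word vectors by summation, compares a class embedding with a user model through cosine similarity, and assigns to the segment the users whose similarity exceeds a threshold; the helper names and the threshold value are hypothetical.

```python
# Illustrative sketch: assign users to the segment of a class by comparing
# a class embedding with user models through cosine similarity.
import numpy as np

def aggregate(words, wv):
    """Sum the embeddings of the given words (summation is an assumed aggregation)."""
    return np.sum([wv[w] for w in words if w in wv], axis=0)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def segment(class_words, user_words_by_id, wv, threshold=0.5):
    """Return the ids of the users whose model is similar enough to the class embedding."""
    nce = aggregate(class_words, wv)  # neural class embedding
    return {uid for uid, words in user_words_by_id.items()
            if cosine(nce, aggregate(words, wv)) > threshold}
```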

More formally, the problem statement is the following:

Problem 1

We are given a set of users $U = \{u_1, \ldots, u_N\}$, a set of items $I = \{i_1, \ldots, i_M\}$, and a set $R$ of ratings used to express the user preferences (e.g., $R = [1, 5]$ or $R = \{like, dislike\}$). The set of all possible preferences expressed by the users is a ternary relation $P \subseteq U \times I \times R$. We denote as $P^+ \subseteq P$ the subset of preferences with a positive value, as $I^+$ the items for which there is a positive preference, and as $I_u$ the items positively evaluated by a user $u$. The set of item descriptions is denoted as $D = \{d_1, \ldots, d_M\}$ (note that we have a description for each item, so $|D| = |I|$), and the vocabulary of the words in $D$ is denoted as $V = \{v_1, \ldots, v_W\}$. Let $NWE_{v_w} = \{l_1, \ldots, l_Z\}$ be the vector representation (neural word embedding) of each word $v_w \in V$. We denote as $C = \{c_1, \ldots, c_K\}$ the set of primitive classes used to classify the items. Our first aim is to extract a vector representation of each class $c_k \in C$ based on the neural word embeddings of the descriptions of the items classified with $c_k$ (neural class embedding, $NCE$), and a vector representation of each user $u \in U$ (user model, $m_u$). The objective of this paper is to build a function $f: C \rightarrow U$ that, given a class, returns a set of users to target $T \subseteq U$, such that the similarity between the neural class embedding and the models of the users in $T$ is higher than a threshold value.
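The target set returned by f can be restated compactly as follows, where sim denotes the similarity measure and τ the threshold value mentioned above (both symbols are notational choices, not fixed in the excerpt):

```latex
f(c_k) = T = \{\, u \in U \;:\; \mathrm{sim}\big(NCE_{c_k},\, m_u\big) > \tau \,\}, \qquad T \subseteq U
```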

The scientific contributions of our proposal are now presented:

  • we propose a novel use of neural word embeddings for user segmentation purposes;

  • we introduce a novel data structure, called NCE (Neural Class Embedding), able to model the words that characterize a class of items;

  • we consider, for the first time in the user segmentation literature, the reliability of the data sources. Indeed, unlike the literature that usually analyzes the queries issued by the users, we rely on the descriptions of the items a user positively rated;

  • we avoid preference stability by considering the similarity between each user model and the vector representation of a class of items, in order to allow the approach to include a user in a segment that represents a class of items she has never evaluated, but that is highly similar to her preferences;

  • we present a Boolean algebra that allows us to specify, in a simple but precise way, the interests that the segment should cover; the algebra, along with the built models, avoids the interpretability issues that usually characterize segmentations based on many features (a purely hypothetical illustration follows this list).
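As a purely hypothetical illustration of how such an algebra could operate on the class embeddings (the paper defines its own operators later, which may differ), an AND between two classes could be rendered as a combination of their vectors:

```python
# Hypothetical illustration only: one way an AND between two item classes
# could be realised on top of their neural class embeddings. The element-wise
# mean is an assumption for illustration, not the authors' definition.
import numpy as np

def nce_and(nce_a, nce_b):
    """Combine two neural class embeddings into a single target vector."""
    return (np.asarray(nce_a) + np.asarray(nce_b)) / 2.0

# The combined vector can then be matched against the user models exactly
# like a primitive class, e.g., to target users interested in movies that
# are both comedy AND romantic.
```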

The rest of the paper is organized as follows: we first present the works in the literature related to our approach (Section 2), then we continue with the implementation details (Section 3) and the description of the performed experiments (Section 4), ending with some concluding remarks (Section 5).


Related work

Here, we report the main approaches developed in industry and in the literature for each of the topics related to our work.

Behavioral targeting. Most of the approaches to behavioral targeting have been developed by the industry as real-world systems. Among the different types of targeting that Google’s AdWords developed to present ads to the users, the closest to our proposal is “Topic targeting”, in which the system groups and reaches

Using word embeddings to model user behavior and detect user segments

Here we present the algorithms employed to build our user segmentation. The approach works in five steps (a minimal sketch of the first two steps follows the list):

  1. Neural word embeddings extraction: processing of the textual information of all the items, in order to remove the useless elements from the text (e.g., punctuation marks, stop words, etc.), and extract the word embeddings;

  2. Neural item embeddings extraction: for each item, we sum the corresponding vector elements of the words that compose its description, obtaining as a result a vector
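A minimal sketch of steps 1 and 2, assuming the learned word vectors are available in a dictionary-like mapping wv (the helper names and the stop-word list are illustrative, not the authors’ exact code):

```python
# Sketch of steps 1-2: clean an item description and sum the embeddings of
# its words to obtain the neural item embedding.
import re
import numpy as np

STOP_WORDS = {"a", "an", "the", "of", "and", "in"}  # illustrative stop-word list

def tokenize(description):
    """Step 1: lowercase, strip punctuation, and drop stop words."""
    tokens = re.findall(r"[a-z]+", description.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def item_embedding(description, wv):
    """Step 2: sum the word embeddings of the words in the description."""
    vectors = [wv[t] for t in tokenize(description) if t in wv]
    return np.sum(vectors, axis=0) if vectors else None
```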

Experiments

This section describes the experiments performed to validate our proposal. Section 4.1 presents the experimental setup and strategy, Section 4.2 describes the dataset employed for the evaluation, Section 4.3 illustrates the metric involved, and Section 4.4 contains the results.

Conclusions and future work

This paper presented an approach to segment the users by analyzing the items positively evaluated by them, in order to consider reliable user preferences. These items were analyzed by extracting the word embeddings and by building a novel type of class model, named Neural Class Embedding. The models allowed us to understand what characterizes each class of items and to generate segments of users with certain characteristics. In this way, we designed an approach that does not generate trivial

Acknowledgments

This work is partially funded by Regione Sardegna under project NOMAD (Next generation Open Mobile Apps Development), through PIA - Pacchetti Integrati di Agevolazione “Industria Artigianato e Servizi” (year 2013), and by MIUR PRIN 2010-11 under project “Security Horizons”.

References (42)

  • S.Y. Rieh et al.

    Analysis of multiple query reformulations on the web: the interactive information retrieval context

    Inf. Process. Manage.

    (2006)
  • S.C. Bourassa et al.

    Defining housing submarkets

    J. Hous. Econ.

    (1999)
  • X. Gong et al.

    Search behavior based latent semantic user segmentation for advertising targeting

    2013 IEEE 13th International Conference on Data Mining (ICDM)

    (2013)
  • S. Tu et al.

    Topic-based user segmentation for online advertising with latent Dirichlet allocation

    Proceedings of the 6th International Conference on Advanced Data Mining and Applications - Volume Part II

    (2010)
  • A. Spink et al.

    From e-sex to e-commerce: web search changes

    Computer

    (2002)
  • P. Boldi et al.

    From “dango” to “japanese cakes”: query reformulation models and patterns

    Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01

    (2009)
  • G. Armano et al.

    Semantic enrichment of contextual advertising by using concepts

  • G. Armano et al.

    Studying the impact of text summarization on contextual advertising

  • R. Saia et al.

    Semantic coherence-based user profile modeling in the recommender systems context

    Proceedings of the 6th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2014, Rome, Italy, October 21-24, 2014

    (2014)
  • R.D. Burke et al.

    Matching recommendation technologies and domains

  • P. Lops et al.

    Content-based recommender systems: state of the art and trends

  • C. Gustav Johannsen

    Understanding users: from man-made typologies to computer-generated clusters

    New Libr. World

    (2014)
  • A. Nairn et al.

    Something approaching science? Cluster analysis procedures in the CRM era

  • S. Dolnicar et al.

    Methodological reasons for the theory/practice divide in market segmentation

    J. Market. Manage.

    (2009)
  • S. Dibb et al.

    A program for implementing market segmentation

    J. Bus. Ind. Market.

    (1997)
  • T. Mikolov et al.

    Exploiting similarities among languages for machine translation

    CoRR

    (2013)
  • N. Djuric et al.

    Hierarchical neural language models for joint representation of streaming documents and their content

    Proceedings of the 24th International Conference on World Wide Web

    (2015)
  • Q.V. Le et al.

    Distributed representations of sentences and documents

    Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014

    (2014)
  • A. Bordes et al.

    Translating embeddings for modeling multi-relational data

  • R. Socher et al.

    Reasoning with neural tensor networks for knowledge base completion

  • R. Kiros et al.

    A multiplicative model for learning distributed text-based attribute representations
