Knowledge-Based Systems

Volume 108, 15 September 2016, Pages 5-14

Using neural word embeddings to model user behavior and detect user segments

https://doi.org/10.1016/j.knosys.2016.05.002

Abstract

Modeling user behavior to detect segments of users to target with ads (behavioral targeting) is a problem widely studied in the literature. Various sources of data are mined and modeled in order to detect these segments, such as the queries issued by the users. In this paper we first show that a user segmentation system needs to employ reliable user preferences, since nearly half of the time users reformulate their queries in order to satisfy their information need. Then we propose a method that analyzes the descriptions of the items positively evaluated by the users and extracts a vector representation of the words in these descriptions (word embeddings). Since it is widely known that users tend to choose items of the same categories, our approach is designed to avoid the so-called preference stability, which would associate the users with trivial segments. Moreover, we ensure that the generated segments are interpretable by the advertisers who will use them. We performed different sets of experiments on a large real-world dataset, which validated our approach and showed its capability to produce effective segments.

Introduction

Behavioral targeting is the process of detecting segments of users with similar behaviors, in order to address effective ads to them. Given the high interest in extracting effective segments, both industry and academia are studying ways to model user behavior. In industry, systems try to monitor the behavior of the users in implicit ways, in order to extract their preferences and form the segments; usually, neither the algorithms nor the data are made publicly available, to avoid disclosing both industrial secrets and private information about the users. In the academic literature it has been highlighted that classic approaches to segmentation (like k-means) cannot take into account the semantics of the user behavior [1]. Tu and Lu [2] proposed a user segmentation approach based on a semantic analysis of the queries issued by the users, while Gong et al. [1] proposed an LDA-based semantic segmentation that groups users with similar query and click behaviors.

However, several problems remain open in the literature when performing a user segmentation based on user behavior.

Data source reliability. In order to satisfy the users’ information need, query reformulation characterizes nearly 50% of the queries issued by the users [3], [4], [5]. Therefore, the semantic analysis of a query is not a reliable source of information, since it does not indicate whether or not a query led to what the user was really looking for. Moreover, performing a semantic analysis on the items evaluated by the users, in order to filter them, can increase the accuracy of a system [6], [7], [8]. Considering these aspects, a possible solution to this issue would be a semantic analysis of the descriptions of the items a user positively evaluated through an explicitly given rating. However, another issue arises in cascade.

Preference stability. The analysis of reliable information about the users, such as the descriptions of the items they evaluated, would probably lead to trivial segments, since users tend to evaluate items of the same categories (e.g., they usually watch movies of the same genres or by the same director/actor). This problem is known as preference stability [9]: on the one hand, it leads to high-quality knowledge sources; on the other hand, it leaves no way to target the users with serendipitous and effective ads (overspecialization [10]).

Segmentation interpretability. Another open issue widely studied in this research area is the capability of a segmentation to be easily interpreted. A recent survey on user segmentation (mostly focused on the library domain) [11] highlighted that, in order to create a proper segmentation of the users, it is important to understand them. On the one hand, easily interpretable approaches generate trivial segments, and even a partitioning with the k-means clustering algorithm has proven to be more effective than these approaches [12]; on the other hand, when a larger set of features is combined, the problem of properly understanding and interpreting the results arises [13], [14]. This is mostly due to the lack of guidance on how to interpret the results of a segmentation [15]. The fact that easily understandable approaches generate ineffective segments, while more complex ones are accurate but not easy to use in practice, creates an important gap in this research area.

Our contributions. In this paper, we present an approach to user segmentation in which the sources of information used to build it are reliable, and the generated segmentation is both non-trivial and easily interpretable.

As previously mentioned, the problem of using reliable sources of information is addressed by considering the items positively evaluated by the users with an explicitly-assigned rating. In particular, we employ the vector representation of the words in a description (word embedding) [16]. Word embeddings are built by taking a text corpus as input, building a vocabulary, and learning the vector representation of the words. They are widely employed nowadays in several NLP tasks, such as the representation of sentences and paragraphs [17], [18], relational entities [19], [20], general text-based attributes [21], descriptive text of images [22], and nodes in graph structures [23]. To the best of the authors’ knowledge, no approach uses word embeddings for user segmentation purposes.
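To make this step concrete, the following is a minimal sketch of learning word embeddings from item descriptions, assuming the gensim implementation of word2vec; the toy corpus, vector size, and other parameters are illustrative choices, not taken from the paper.

```python
# Minimal sketch: learning neural word embeddings from item descriptions.
# Assumes gensim is available; corpus and hyperparameters are illustrative.
from gensim.models import Word2Vec

# Each item description is a list of (already cleaned) lowercase tokens.
descriptions = [
    ["a", "romantic", "comedy", "about", "two", "strangers"],
    ["a", "dark", "thriller", "set", "in", "a", "dystopian", "city"],
]

# Train a skip-gram model; the vocabulary V and the word vectors NWE_vw
# of the problem statement correspond to model.wv here.
model = Word2Vec(sentences=descriptions, vector_size=100, window=5,
                 min_count=1, sg=1)

vector = model.wv["comedy"]  # one neural word embedding (100-dimensional array)
```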

For each class of items that the users can be targeted with (e.g., movie genres), our approach builds a vector representation based on the word embeddings, which characterizes the words that represent the class. In a similar way, we also build a user model that captures the user interests. By matching the vector representation of a class of items against the user model through a similarity metric, we can associate a user with the segment that represents that class of items. Note that each user can also have a strong similarity with classes of items she never evaluated (thus avoiding the preference stability problem) and that the segments can be easily interpreted (even if a class and a user are represented by tens of features in the vector, the advertiser is only required to specify which interests she wants to target). In order to allow advertisers to specify more complex targets, we also present a Boolean algebra that combines multiple classes of items with simple operations (e.g., to extract a vector representation of what characterizes comedy AND romantic movies).
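Purely as an illustration of this matching step (not the authors’ exact algorithm, which is detailed later in the paper), the sketch below aggregates word vectors by summation, compares a class embedding with a user model through cosine similarity, and assigns to the segment the users whose similarity exceeds a threshold; the helper names and the threshold value are hypothetical.

```python
# Illustrative sketch: assign users to the segment of a class by comparing
# a class embedding with user models through cosine similarity.
import numpy as np

def aggregate(words, wv):
    """Sum the embeddings of the given words (summation is an assumed aggregation)."""
    return np.sum([wv[w] for w in words if w in wv], axis=0)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def segment(class_words, user_words_by_id, wv, threshold=0.5):
    """Return the ids of the users whose model is similar enough to the class embedding."""
    nce = aggregate(class_words, wv)  # neural class embedding
    return {uid for uid, words in user_words_by_id.items()
            if cosine(nce, aggregate(words, wv)) > threshold}
```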

More formally, the problem statement is the following:

Problem 1

We are given a set of users $U = \{u_1, \ldots, u_N\}$, a set of items $I = \{i_1, \ldots, i_M\}$, and a set $R$ of ratings used to express the user preferences (e.g., $R = [1, 5]$ or $R = \{like, dislike\}$). The set of all possible preferences expressed by the users is a ternary relation $P \subseteq U \times I \times R$. We denote as $P^+ \subseteq P$ the subset of preferences with a positive value, as $I^+$ the items for which there is a positive preference, and as $I_u$ the items positively evaluated by a user $u$. The set of item descriptions is denoted as $D = \{d_1, \ldots, d_M\}$ (note that we have a description for each item, so $|D| = |I|$), and the vocabulary of the words in $D$ is denoted as $V = \{v_1, \ldots, v_W\}$. Let $NWE_{v_w} = \{l_1, \ldots, l_Z\}$ be the vector representation (neural word embedding) of each word $v_w \in V$. We denote as $C = \{c_1, \ldots, c_K\}$ the set of primitive classes used to classify the items. Our first aim is to extract a vector representation of each class $c_k \in C$ based on the neural word embeddings of the descriptions of the items classified with $c_k$ (neural class embedding, $NCE$), and a vector representation of each user $u \in U$ (user model, $m_u$). The objective of this paper is to build a function $f: C \rightarrow U$ that, given a class, returns a set of users to target $T \subseteq U$, such that the similarity between the neural class embedding and the models of the users in $T$ is higher than a threshold value.
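The target set returned by f can be restated compactly as follows, where sim denotes the similarity measure and τ the threshold value mentioned above (both symbols are notational choices, not fixed in the excerpt):

```latex
f(c_k) = T = \{\, u \in U \;:\; \mathrm{sim}\big(NCE_{c_k},\, m_u\big) > \tau \,\}, \qquad T \subseteq U
```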

The scientific contributions of our proposal are now presented:

  • we propose a novel use of neural word embeddings for user segmentation purposes;

  • we introduce a novel data structure, called NCE (Neural Class Embedding), able to model the words that characterize a class of items;

  • we consider, for the first time in the user segmentation literature, the reliability of the data sources. Indeed, unlike the literature that usually analyzes the queries issued by the users, we rely on the descriptions of the items a user positively rated;

  • we avoid preference stability by considering the similarity between each user model and the vector representation of a class of items, in order to allow the approach to include a user in a segment that represents a class of items she has never evaluated, but that is highly similar to her preferences;

  • we present a Boolean algebra that allows us to specify, in a simple but precise way, the interests that the segment should cover; the algebra, along with the built models, avoids the interpretability issues that usually characterize segmentations based on many features (a purely hypothetical illustration follows this list).
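As a purely hypothetical illustration of how such an algebra could operate on the class embeddings (the paper defines its own operators later, which may differ), an AND between two classes could be rendered as a combination of their vectors:

```python
# Hypothetical illustration only: one way an AND between two item classes
# could be realised on top of their neural class embeddings. The element-wise
# mean is an assumption for illustration, not the authors' definition.
import numpy as np

def nce_and(nce_a, nce_b):
    """Combine two neural class embeddings into a single target vector."""
    return (np.asarray(nce_a) + np.asarray(nce_b)) / 2.0

# The combined vector can then be matched against the user models exactly
# like a primitive class, e.g., to target users interested in movies that
# are both comedy AND romantic.
```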

The rest of the paper is organized as follows: we first present the works in the literature related to our approach (Section 2), then we continue with the implementation details (Section 3) and the description of the performed experiments (Section 4), ending with some concluding remarks (Section 5).


Related work

Here, we report the main approaches developed in industry and in the literature for each of the topics related to our work.

Behavioral targeting. Most of the approaches to behavioral targeting have been developed by the industry as real-world systems. Among the different types of targeting that Google’s AdWords developed to present ads to the users, the closest to our proposal is “Topic targeting”, in which the system groups and reaches

Using word embeddings to model user behavior and detect user segments

Here we present the algorithms employed to build our user segmentation. The approach works in five steps (a minimal sketch of the first two steps follows the list):

  1. Neural word embeddings extraction: processing of the textual information of all the items, in order to remove the useless elements from the text (e.g., punctuation marks, stop words, etc.), and extract the word embeddings;

  2. Neural item embeddings extraction: for each item, we sum the corresponding vector elements of the words that compose its description, obtaining as a result a vector
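A minimal sketch of steps 1 and 2, assuming the learned word vectors are available in a dictionary-like mapping wv (the helper names and the stop-word list are illustrative, not the authors’ exact code):

```python
# Sketch of steps 1-2: clean an item description and sum the embeddings of
# its words to obtain the neural item embedding.
import re
import numpy as np

STOP_WORDS = {"a", "an", "the", "of", "and", "in"}  # illustrative stop-word list

def tokenize(description):
    """Step 1: lowercase, strip punctuation, and drop stop words."""
    tokens = re.findall(r"[a-z]+", description.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def item_embedding(description, wv):
    """Step 2: sum the word embeddings of the words in the description."""
    vectors = [wv[t] for t in tokenize(description) if t in wv]
    return np.sum(vectors, axis=0) if vectors else None
```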

Experiments

This section describes the experiments performed to validate our proposal. Section 4.1 presents the experimental setup and strategy, Section 4.2 describes the dataset employed for the evaluation, Section 4.3 illustrates the metric involved, and Section 4.4 contains the results.

Conclusions and future work

This paper presented an approach to segment the users by analyzing the items positively evaluated by them, in order to consider reliable user preferences. These items were analyzed by extracting the word embeddings and by building a novel type of class model, named Neural Class Embedding. The models allowed us to understand what characterizes each class of items and to generate segments of users with certain characteristics. In this way, we designed an approach that does not generate trivial

Acknowledgments

This work is partially funded by Regione Sardegna under project NOMAD (Next generation Open Mobile Apps Development), through PIA - Pacchetti Integrati di Agevolazione “Industria Artigianato e Servizi” (year 2013), and by MIUR PRIN 2010-11 under project “Security Horizons”.

References (42)

  • S.Y. Rieh et al.

    Analysis of multiple query reformulations on the web: the interactive information retrieval context

    Inf. Process. Manage.

    (2006)
  • S.C. Bourassa et al.

    Defining housing submarkets

    J. Hous. Econ.

    (1999)
  • X. Gong et al.

    Search behavior based latent semantic user segmentation for advertising targeting

    2013 IEEE 13th International Conference on Data Mining (ICDM)

    (2013)
  • S. Tu et al.

    Topic-based user segmentation for online advertising with latent Dirichlet allocation

    Proceedings of the 6th International Conference on Advanced Data Mining and Applications - Volume Part II

    (2010)
  • A. Spink et al.

    From e-sex to e-commerce: web search changes

    Computer

    (2002)
  • P. Boldi et al.

    From “dango” to “japanese cakes”: query reformulation models and patterns

    Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01

    (2009)
  • G. Armano et al.

    Semantic enrichment of contextual advertising by using concepts

  • G. Armano et al.

    Studying the impact of text summarization on contextual advertising

  • R. Saia et al.

    Semantic coherence-based user profile modeling in the recommender systems context

    Proceedings of the 6th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2014, Rome, Italy, October 21-24, 2014

    (2014)
  • R.D. Burke et al.

    Matching recommendation technologies and domains

  • P. Lops et al.

    Content-based recommender systems: state of the art and trends

  • C. Gustav Johannsen

    Understanding users: from man-made typologies to computer-generated clusters

    New Libr. World

    (2014)
  • A. Nairn et al.

    Something approaching science? Cluster analysis procedures in the CRM era

  • S. Dolnicar et al.

    Methodological reasons for the theory/practice divide in market segmentation

    J. Market. Manage.

    (2009)
  • S. Dibb et al.

    A program for implementing market segmentation

    J. Bus. Ind. Market.

    (1997)
  • T. Mikolov et al.

    Exploiting similarities among languages for machine translation

    CoRR

    (2013)
  • N. Djuric et al.

    Hierarchical neural language models for joint representation of streaming documents and their content

    Proceedings of the 24th International Conference on World Wide Web

    (2015)
  • Q.V. Le et al.

    Distributed representations of sentences and documents

    Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014

    (2014)
  • A. Bordes et al.

    Translating embeddings for modeling multi-relational data

  • R. Socher et al.

    Reasoning with neural tensor networks for knowledge base completion

  • R. Kiros et al.

    A multiplicative model for learning distributed text-based attribute representations
