Abstract
Online Video Streaming services such as Hulu hosts tens of millions of premium videos, which requires an effective recommendation system to help viewers discover what they enjoy. In this talk, we will introduce Hulu's recent technical progresses in recommender systems and deep-dive into the topic of generating recommendation reason from knowledge graph. We have two user scenarios: the store-shelf and autoplay. The first requires a list of videos to maximize the chance that a viewer would pick one of them to watch. The second requires a sequence of video recommendations such that the viewer would continuously watch within the current session.
In the model layer, we designed specific models to match with each user scenario, balancing both exploitation and exploration. For example, we leverage the contextual-bandit model in the store-shelf scenario to adapt the ranking strategy to various types of user feedbacks. To optimize exploitation, we tested several granularity levels for parameter sharing among the arms. For more effective exploration, we incorporate Thomason sampling. For the autoplay scenario, we use a contextual recurrent neural network to predict the next video that the viewer is going to watch.
In the feature and data layer, we train embeddings for content, user and contextual info. For example, to train content embeddings, we collect factual tags from metadata, sentiment tags from reviews, and keywords from the captions and object/action recognized using computer vision techniques.
Next we will deep-dive into one important topic: generating recommendation reason from knowledge graph.
A fact is defined by a tuple of related entities and their relation, which is normally a pair of entities tagged by a relationship. In our problem setting, recommendation results are the targets, viewed as inputs for the reasoning task, consisting of pairs of relevant entities, i.e. a source node and a destination node in a knowledge graph. The recommendation reasoning task is to learn a path or a small directed acyclic subgraph, connecting the source node to the destination node.
Since the facts in a knowledge graph have different confidence values for different reasoned targets, we need to conduct a probabilistic inference. The challenge is we do not know a predefined set of logic rules to guide the search through the knowledge graph, which prevents us from directly applying the probabilistic logic methods. Inspired by recent advances in deep learning and reinforcement learning, especially in graph neural networks, attention mechanism and deep generative models, we propose two ways to model the reasoning process: the differentiable reasoning approach and the stochastic reasoning approach.
Differentiable reasoning approaches are based on graph neural networks [1,2] with attention flow and information flow. The attention dynamics is an iterative process of redistributing and aggregating attention over the knowledge graph, starting at the source node. The final attention aggregated at the destination node serves for the prediction to compute the loss. Instead of the prediction accuracy, we care more about how the learned attention dynamics draws its reasoning track in a knowledge graph.
Stochastic reasoning approaches frame the reasoning process as learning a probabilistic graphical model consisting of stochastic discrete operations, such as selecting a node and selecting an edge, to build a reason subgraph extracted from the knowledge graph. The model is known as stochastic computation graphs (SCGs), and to learn it, we propose a generalized back-propagation framework Backprop-Q [3] to overcome the gradient-blocking issues in applying standard back-propagation. In summary, we give an overview of the recommendation research in Hulu and deep-dive into our differentiable reasoning approach and stochastic reasoning approach for generating recommendation reasons based on a knowledge graph.