Elsevier

Decision Support Systems

Volume 117, February 2019, Pages 14-27
A utility-based news recommendation system

https://doi.org/10.1016/j.dss.2018.12.001

Highlights

  • The news utility model goes beyond click-through rate analysis for news recommendations.

  • The system is designed on the MapReduce framework to analyze huge volumes of data in parallel.

  • The news cold start problem is addressed effectively by a novel probabilistic approach.

  • Experiments on 2 billion records demonstrate the effectiveness and efficiency of the system.

Abstract

News platforms present both challenges and opportunities for enhancing the functionality of recommendation systems in today's big data environment. Novel use of big data storage and programming models can improve news recommendation systems through efficient handling and analysis of clickstream data and a better understanding of users' interests. Most existing approaches to news recommendation treat users' clicks as implicit feedback for understanding user behavior. However, “clicks” may not be an effective indicator of real user interests. We address this problem by developing a novel news recommendation system based on a news utility model. Given the news utility model, we propose a two-stage news recommendation framework. The framework first generates article-level recommendation rules based on the utility model, then integrates the notion of utility with probabilistic topic models to generate topic-level recommendation rules. We argue that the proposed utility-based news recommendation system also addresses the news cold start problem, which is one of the most challenging obstacles for news agencies. We evaluate the framework on a massive real dataset (two billion records) obtained from a major newspaper in Canada (The Globe and Mail) and show that it outperforms existing methods.

Introduction

Recent advancements in Internet-based technologies and the production of digital content have shifted news consumption from reading physical newspapers to visiting online news websites. Understanding and modeling users' interests is a critical task for the value proposition and prosperity of any online service provider. Moreover, in the digital age, value creation has become value co-creation between companies and customers [1]. Thus, many companies focus on their big historical data to model users' behaviors and boost profits. Similarly, online newspaper agencies aim to provide personalized content to improve the visit experience. This not only increases the frequency of visits, which boosts revenue through advertisements, but also improves user engagement, leading to more subscriptions [2]. In fact, from an online news provider's business perspective, increasing revenue through advertisements and subscriptions is a major objective. Therefore, for most online newspapers, developing effective recommendation systems that help users find interesting articles and keep them engaged is of paramount importance.

Building an effective and efficient recommendation system for the news domain is much more challenging than for other domains:

  • 1.

    Business Objectives Trade-off. As the business model evolves, many online news publishers are struggling to balance the availability of high value advertising inventory with the need for content that maximizes opportunities for subscriber acquisition and retention. Thus, in many cases, a recommendation system is needed in which recommended articles satisfy more than one business objective, even if those objectives are in contention with one another.

  • 2.

    Beyond Click-through Rates. Conventional news recommendation systems consider only a click as implicit feedback. This is problematic because a user might click on an article without being interested in reading it (e.g., the title may be appealing to her while the content does not meet her expectations). Therefore, the number of clicks may not help to address an intended domain-specific objective. For example, user engagement is very likely to increase the number of subscriptions, so a news agency might be interested in recommendations that increase engagement (e.g., the dwell time of visited articles) and thus the subscription rate. However, click analysis does not necessarily maximize this objective.

  • 3.

    Big Data of Unknown Visitors. Newspapers are rich repositories of visitors' historical data. In 2016, the average number of monthly unique visitors for the top 50 U.S. newspapers was more than 11 billion. The user interaction data of The Globe and Mail, Canada's foremost news media company, amounts to about 30 million sessions in one month. Building a recommendation system that can handle big data requires applying scalable techniques and platforms [3]. Moreover, despite the availability of such a huge volume of historical data, most interactions belong to unsubscribed users (i.e., unknown visitors). Therefore, user profiling and collaborative filtering techniques such as matrix factorization [4] are not effective recommendation approaches in the news domain, because such approaches need to identify users and collect their reading histories to discover their interests and the similarity among them.

  • 4.

    News/Users Cold Start Problems. Articles are generated continuously and unboundedly at a high speed. This makes recommending new articles much harder than recommending new items (e.g., new products, new travel packages [5]) in other e-commerce domains. Thus, the ability to handle the cold start problem is essential for newspapers. There are two types of cold start problems: user cold start (when a new or unknown user visits the portal) and item cold start (when a new article is published). Despite different solutions [6, 7] to this major issue, this problem is still challenging, particularly in the news domain.
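
The gap between clicks and engagement described in the second challenge can be illustrated with a small sketch. The click log, the article identifiers, and the use of mean dwell time as an engagement proxy are all illustrative assumptions, not the data or the measure used in this paper.

```python
from collections import defaultdict

# Hypothetical click log: one (article_id, dwell_time_seconds) pair per page view.
views = [
    ("a1", 5), ("a1", 4), ("a1", 6), ("a1", 3),  # appealing title, quickly abandoned
    ("a2", 120), ("a2", 95),                     # fewer clicks, read in depth
    ("a3", 40), ("a3", 35), ("a3", 45),
]

def rank_by_clicks(views):
    """Rank articles by number of clicks (the conventional implicit signal)."""
    clicks = defaultdict(int)
    for article, _ in views:
        clicks[article] += 1
    return sorted(clicks, key=lambda a: clicks[a], reverse=True)

def rank_by_mean_dwell(views):
    """Rank articles by average dwell time, used here as a simple engagement proxy."""
    total = defaultdict(float)
    count = defaultdict(int)
    for article, dwell in views:
        total[article] += dwell
        count[article] += 1
    return sorted(total, key=lambda a: total[a] / count[a], reverse=True)

print(rank_by_clicks(views))      # most-clicked article first
print(rank_by_mean_dwell(views))  # most-engaging article first
```

In this toy log the most-clicked article is the least engaging one, so a click-based recommender and an engagement-based recommender would promote different articles.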

To address the aforementioned challenges, this paper presents a Utility-based News Recommendation SYStem, called URecSYS, which is built on a news utility model. The idea of the news utility model is inspired by the concept of utility in intelligent agents. In intelligent agents and machine learning, goal-based agents and utility-based agents are two common classes of agents [8]. A goal-based agent can only differentiate between goal and non-goal states in the environment. However, it is important to measure how desirable a particular state is. To do so, a utility function is defined that measures how satisfied the agent (e.g., a visitor) will be if it moves to a particular state (e.g., an article). This helps agents move toward a goal state (e.g., increased user engagement) with the highest satisfaction. It is important to note that the concept of utility in our model is very similar to its common usage in economics, where the term utility describes the usefulness and satisfaction that a consumer obtains from a good [9]. Utility is not a characteristic of a particular good (e.g., an article), but rather of each consumer's reaction (e.g., a user's engagement) to that good. In this paper, we argue that utility can be defined in the context of recommendation. We model the utility to represent a desirable domain-specific objective and recommend articles with respect to this objective. In other words, we design a utility model to recommend those articles that persuade the user to move toward a higher value of the objective. For example, if the objective is to increase user engagement, a utility model can be designed such that the recommender suggests articles that lead to a highly engaged visit.
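
The difference between a goal test and a utility function can be made concrete with a minimal sketch. The candidate articles, the estimated dwell times, and the threshold `ENGAGED_THRESHOLD` are hypothetical values chosen for illustration; the utility function used by URecSYS is not specified in this excerpt.

```python
# Hypothetical candidate articles with an estimated dwell time (seconds)
# that a visitor would spend if the article were recommended.
candidates = {"a1": 20.0, "a2": 90.0, "a3": 150.0}

ENGAGED_THRESHOLD = 60.0  # assumed cut-off for an "engaged" read

def is_goal(article):
    """Goal-based view: a state either satisfies the goal or it does not."""
    return candidates[article] >= ENGAGED_THRESHOLD

def utility(article):
    """Utility-based view: a real-valued score of how desirable a state is."""
    return candidates[article]

# The goal test cannot separate a2 from a3; both are simply "goal states".
goal_states = [a for a in candidates if is_goal(a)]

# The utility function orders all states, so the agent can pick the best one.
best = max(candidates, key=utility)

print(goal_states, best)
```

Both `a2` and `a3` pass the goal test, but only the utility function indicates that `a3` is the more desirable state.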

The news utility model is designed based on two broad types of attributes: article attributes and user-article interaction attributes. Importantly, the model can be engineered to address one or more objectives, even if those objectives are in conflict with one another. URecSYS recommends news articles by discovering article-level rules based on the news utility model. Moreover, by leveraging topic modeling and a probabilistic framework, it generalizes rules from the article level to the topic level. This addresses the news cold start problem, as newly published articles can be recommended to a user by matching them to the topic-level rules. Furthermore, because the news utility model is built from the reading sessions of all users, the recommendation rules (at either the article level or the topic level) can be used for new users, which also resolves the user cold start problem.
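
A minimal sketch of how topic-level rules could rank newly published articles that have no click history is given below. The scoring formula, the rule weights, and the topic distributions are assumptions made for illustration only; the probabilistic formulation actually used by URecSYS is not reproduced in this excerpt.

```python
def topic_rule_score(current_topics, candidate_topics, rules):
    """Score a candidate article against topic-level rules.

    current_topics / candidate_topics: dicts mapping topic -> probability
    (e.g., as produced by a topic model).
    rules: dict mapping (source_topic, target_topic) -> utility weight.
    The score sums P(source | current) * P(target | candidate) * weight,
    which is an assumed formula, not the one defined in the paper.
    """
    score = 0.0
    for (src, dst), weight in rules.items():
        score += current_topics.get(src, 0.0) * candidate_topics.get(dst, 0.0) * weight
    return score

# Hypothetical topic-level rules learned from historical sessions.
rules = {
    ("politics", "economy"): 0.8,
    ("politics", "sports"): 0.1,
    ("sports", "sports"): 0.9,
}

# Topic mixture of the article the visitor is reading now.
current = {"politics": 0.7, "sports": 0.3}

# Newly published articles: no clicks yet, but topics can be inferred from text.
new_articles = {
    "n1": {"economy": 0.9, "sports": 0.1},
    "n2": {"sports": 1.0},
}

scores = {a: topic_rule_score(current, t, rules) for a, t in new_articles.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Because the score depends only on topic distributions, it applies equally to an article published a minute ago and to a visitor with no profile.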

To the best of our knowledge, this study is the first step toward exploring the impact of both article and user-article interaction attributes in a unified framework. Our contributions are summarized as follows:

  • 1.

    We define a novel model, called News Utility Model, to simultaneously consider both article attributes (e.g., the recency of the article) and user-article interaction attributes (e.g., DwellTime). We argue that it is more beneficial to recommend news articles based on the utility of articles rather than the browsing frequency of articles.

  • 2.

    We propose a novel and scalable rule engine to discover article-level recommendation rules based on the news utility model. At its heart, the rule engine uses multiple MapReduce-like steps to discover article-level recommendation rules in parallel.

  • 3.

    We propose a novel probabilistic approach on top of topic-based models to generalize article-level news recommendation rules. The output is a topic-level recommendation rule engine that links topics of articles based on the news utility model, and thus on the domain-specific objective. Such rules recommend newly published articles based on topics of interest to users.

  • 4.

    We apply the proposed framework to a dataset of two billion records collected from a major Canadian news agency (The Globe and Mail) and demonstrate the effectiveness of the recommended articles by comparing the proposed framework with other state-of-the-art recommendation systems used in practice.
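
The MapReduce-like discovery of article-level rules mentioned in the second contribution can be sketched in plain Python. The session data, the per-view utility values, and the choice of consecutive article pairs as rule candidates are illustrative assumptions; the actual rule engine is not described in this excerpt. In Spark, the two steps would correspond roughly to `flatMap` followed by `reduceByKey`.

```python
from collections import defaultdict

# Hypothetical sessions: ordered lists of (article_id, utility of that view).
sessions = [
    [("a1", 10.0), ("a2", 60.0), ("a3", 30.0)],
    [("a1", 5.0), ("a2", 40.0)],
    [("a1", 8.0), ("a3", 20.0)],
]

def map_session(session):
    """Map step: emit ((antecedent, consequent), utility of the consequent)
    for every pair of consecutively viewed articles."""
    for (prev, _), (nxt, u) in zip(session, session[1:]):
        yield (prev, nxt), u

def reduce_by_key(pairs):
    """Reduce step: sum the emitted utilities per (antecedent, consequent) key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def best_rules(rule_utility):
    """Keep, for each antecedent, the consequent with the highest total utility."""
    best = {}
    for (prev, nxt), u in rule_utility.items():
        if prev not in best or u > best[prev][1]:
            best[prev] = (nxt, u)
    return best

pairs = (kv for s in sessions for kv in map_session(s))
rule_utility = reduce_by_key(pairs)
rules = best_rules(rule_utility)
print(rules)
```

Each session is mapped independently, so the map step parallelizes across sessions, and the reduce step only needs to combine values that share a key.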

The rest of the paper is organized as follows. In Section 2, we provide a review of prior work relevant to news recommenders. In Section 3, we introduce terms and concepts used in this paper. In Section 4, we present the proposed framework. In Section 5, we outline the experimental settings and discuss the results. Finally, in Section 6 we provide conclusions and point out limitations and future research directions.

Literature review

The underlying techniques used in recommender systems can be categorized into three broad classes: content-based [10, 11], collaborative filtering [4, 12, 13, 14, 15] and hybrid [7, 16, 17] approaches. Several studies have been conducted on content-based news recommendation systems [10]. For example, Liang et al. [18] propose a time-aware content recommendation system. In another work, Agrawal et al. [19] take activity freshness into account and significantly outperform click-through rate

Notations and definitions

For convenience, Table 1 summarizes the concepts and notations we define in this paper.

Let $N = \{nw_1, nw_2, \ldots, nw_n\}$ be a set of distinct articles. A clickstream dataset consists of several user sessions. A user session $S$ (or session for short) is defined as an ordered list of viewed articles $\langle nw_1, nw_2, \ldots, nw_z \rangle$ within a visit. Each article is represented by different attributes such as popularity, topics, and published date. These attributes are called article attributes. Once an article is

URecSYS: a utility-based news recommendation system

Given the proposed news utility model to represent a domain-specific objective, the most important challenge is how to incorporate it into the recommendation process. To address this challenge, we design a Utility-based news Recommendation SYStem (URecSYS). URecSYS is a rule-based recommendation system designed and developed using Apache Spark and the MapReduce framework. URecSYS first finds recommendation rules from the clickstream dataset and then applies the discovered rules to recommend

Experimental results and discussions

The experimental environment consists of one master node and six worker nodes. Each node is equipped with a 2.6 GHz Intel Xeon processor (12 cores each) and 128 GB of main memory. The framework is implemented on Spark 2.3.0.

Conclusions, limitations and future work

In the news recommendation context, the most challenging problem for any online news publisher is to strike a balance between different business objectives (e.g., increasing user engagement through free content delivery on the one hand and revenue maximization through subscription on the other hand). Such objectives are usually in contention with one another. Moreover, most existing news recommendation systems only consider the click (e.g., CTR) as implicit feedback, which is quite problematic as

Acknowledgments

This work is funded by the Natural Sciences and Engineering Research Council of Canada (NSERC), The Globe and Mail, and the Big Data Research, Analytics and Information Network (BRAIN) Alliance established by the Ontario Research Fund - Research Excellence Program (ORF-RE).

References

  • V. Karnowski et al., From incidental news exposure to news engagement. How perceptions of the news post and news usage patterns influence engagement with news articles encountered on Facebook, Computers in Human Behavior (2017)
  • D.-R. Liu et al., A hybrid of sequential rules and collaborative filtering for product recommendation, Information Sciences (2009)
  • H. Davoudi et al., Time-aware Subscription Prediction Model for User Acquisition in Digital News Media
  • R. Agarwal, V. Dhar, Big data, data science, and analytics: the opportunity and challenge for IS research,...
  • D.D. Lee et al., Algorithms for non-negative matrix factorization
  • S.J. Russell et al., Artificial Intelligence: A Modern Approach (2016)
  • F.W. Taussig, Principles of Economics (2013)
  • P. Lops et al., Content-based recommender systems: state of the art and trends

    Morteza Zihayat has been an assistant professor at the School of Information Technology Management (ITM) of Ryerson University since 2016 and an IBM CAS Faculty Fellow since 2018. Before joining ITM, he was a Postdoctoral Fellow at the University of Toronto (2015-2016). He was also a research fellow at IBM Cloud Analytics as a member of the BRAIN Alliance (Big Data Research, Analytics, and Information Network). His research concerns Big Data analytics and machine learning. He was recently awarded multiple research grants, including an NSERC Discovery Grant, NSERC Engage and MITACS. He has ongoing collaborations with industry, including IBM Canada, The Globe and Mail and AT&T Labs Research. Morteza obtained his PhD from York University, where he worked on designing scalable frameworks to discover actionable knowledge from Big Data streams and social networks. His research has been published in top-tier data mining and data management venues such as Information Sciences, Machine Learning, SIGKDD, SIAM SDM, PKDD, and EDBT.

    Anteneh Ayanso is Professor of Information Systems and founding director of the Centre for Business Analytics at the Goodman School of Business, Brock University. He teaches Business Analytics, Database Design and Management, Data Mining Techniques & Applications and Management of IS/IT. He received his PhD in Information Systems from the University of Connecticut and his MBA from Syracuse University. His research interests focus primarily on data management and information retrieval, Big Data analytics, electronic commerce, and electronic government. His articles are published in leading journals such as Decision Sciences, Decision Support Systems, European Journal of Operational Research, Journal of Database Management, International Journal of Electronic Commerce, and Government Information Quarterly, among others. His research has been funded by government grants, including an NSERC Discovery Research Grant, an NSERC Engage Grant and a Voucher for Innovation and Productivity (VIP) by Ontario Centres of Excellence (OCE). He is currently serving as an Associate Editor at Decision Support Systems and as a review board member at Journal of Database Management and International Journal of Convergence Computing.

    Aijun An is a Professor in the Department of Electrical Engineering and Computer Science at York University. She is currently leading the Big Data Research, Analytics and Information Network (BRAIN) Alliance, an Ontario-based research network that involves four universities and a dozen private and public sector partners. Her main research area is data mining. She has worked on various research topics in data mining, including classification, clustering, data stream mining, high utility pattern mining, sentiment and emotion analysis from text, topic detection, parallel and distributed deep learning, graph mining and bioinformatics. She has published extensively in various well-respected journals and conferences in data mining, databases, optimization, and intelligent information systems. Her research has been supported by NSERC, SSHRC, and ORF-RE.

    Heidar Davoudi received his PhD in Computer Science from the Department of Electrical Engineering and Computer Science at York University, Canada. His research interests include data mining and machine learning and, in particular, user modeling for acquisition, engagement, and recommendation.

    Xing Zhao is a Master's student in the Data Mining Lab at the Department of Electrical Engineering and Computer Science at York University. His research areas of interest are machine learning and Big Data. He received a B.Sc., Spec. Hons. in Computer Science from York University in 2017.
