A utility-based news recommendation system
Introduction
Recent advancements in Internet-based technologies and the production of digital content have shifted news consumption from reading physical newspapers to visiting online news websites. Understanding and modeling users' interests is a critical task for the value proposition and prosperity of any online service provider. Moreover, in the digital age, value creation has become value co-creation between companies and customers [1]. Thus, many companies mine their large volumes of historical data to model users' behaviors and boost profits. Similarly, online newspaper agencies aim to provide personalized content to improve the visit experience. This not only increases the frequency of visits, which boosts advertising revenue, but also improves user engagement, leading to more subscriptions [2]. In fact, from an online news provider's business perspective, increasing revenue through advertisements and subscriptions is a major objective. Therefore, for most online newspapers, developing effective recommendation systems that help users find interesting articles and keep them engaged is of paramount importance.
Building an effective and efficient recommendation system for the news domain is much more challenging than in other domains:
1. Business Objectives Trade-off. As the business model evolves, many online news publishers are struggling to balance the availability of high-value advertising inventory with the need for content that maximizes opportunities for subscriber acquisition and retention. Thus, in many cases, a recommendation system is needed in which recommended articles satisfy more than one business objective, even if those objectives are in contention with one another.
2. Beyond Click-through Rates. Conventional news recommendation systems consider only clicks as implicit feedback. This is quite problematic because a user might click on an article but may not be interested in reading it (e.g., the title may be appealing to her while the content may not meet her expectations). Therefore, the number of clicks may not help to address an intended domain-specific objective. For example, it is very likely that user engagement increases the number of subscriptions, so a news agency might be interested in a type of recommendation that increases engagement (e.g., the dwell time of visited articles) and thus the subscription rate. However, click analysis does not necessarily maximize this objective.
3. Big Data of Unknown Visitors. Newspapers are rich repositories of visitors' historical data. In 2016, the number of average monthly unique visitors for the top 50 U.S. newspapers was more than 11 billion. The user interaction data of The Globe and Mail, Canada's foremost news media company, is about 30 million sessions in one month. Building a recommendation system that can handle big data requires applying scalable techniques and platforms [3]. Moreover, despite the availability of such a huge volume of historical data, most interactions belong to unsubscribed users (i.e., unknown visitors). Therefore, user profiling and collaborative filtering techniques such as matrix factorization [4] are not effective recommendation approaches in the news domain, because such approaches need to identify users and collect their reading histories to discover their interests and the similarity among them.
4. News/Users Cold Start Problems. Articles are generated continuously and unboundedly at a high speed. This makes recommending new articles much harder than recommending new items (e.g., new products, new travel packages [5]) in other e-commerce domains. Thus, the ability to handle the cold start problem is essential for newspapers. There are two types of cold start problems: user cold start (when a new or unknown user visits the portal) and item cold start (when a new article is published). Despite different solutions [6, 7] to this major issue, the problem is still challenging, particularly in the news domain.
To address the aforementioned challenges, this paper presents a Utility-based News Recommendation SYStem, called URecSYS, which works based on a news utility model. The idea of the news utility model is inspired by the concept of utility in intelligent agents. In intelligent agents and machine learning, goal-based agents and utility-based agents are two common classes of agents [8]. A goal-based agent can only differentiate between goal and non-goal states in the environment. However, it is important to measure how desirable a particular state is. To do so, a utility function is defined such that it measures how satisfied the agent (e.g., a visitor) will be if it moves to a particular state (e.g., an article). This helps agents move toward a goal state (e.g., increased user engagement) with the highest satisfaction. It is important to note that the concept of utility in our model is very similar to its common usage in economics. In economics, the term utility is used to describe the measurement of usefulness and satisfaction that a consumer obtains from any good [9]. The utility is not a characteristic of a particular good (e.g., an article), but rather of each consumer's reactions (e.g., a user's engagement) to that good. In this paper, we argue that the utility can be defined in the context of recommendation. We model the utility to represent a desirable domain-specific objective and recommend articles with respect to this specific objective. In other words, we design a utility model to recommend those articles that persuade the user to move toward a higher value of the objective. For example, if the objective is to increase user engagement, a utility model can be designed such that a recommender suggests articles that lead to a highly engaged visit.
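To make the contrast between a goal test and a utility function concrete, the following Python sketch may help. It is only an illustration under stated assumptions: the `Visit` fields, the linear form of `utility`, the weights, and the caps are all hypothetical and are not the news utility model defined in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """One hypothetical user-article interaction (field names are assumptions)."""
    article_id: str
    age_hours: float      # article attribute: time since publication
    dwell_seconds: float  # interaction attribute: time spent on the article


def reaches_goal(visit: Visit, min_dwell_seconds: float = 60.0) -> bool:
    """Goal-based view: a visit is either 'engaged' or not, with no ranking."""
    return visit.dwell_seconds >= min_dwell_seconds


def utility(visit: Visit, w_recency: float = 0.4, w_dwell: float = 0.6,
            max_age_hours: float = 48.0, max_dwell_seconds: float = 300.0) -> float:
    """Utility-based view: an assumed linear score in [0, 1] (illustrative only)."""
    recency = max(0.0, 1.0 - visit.age_hours / max_age_hours)
    engagement = min(visit.dwell_seconds, max_dwell_seconds) / max_dwell_seconds
    return w_recency * recency + w_dwell * engagement


visits = [
    Visit("a1", age_hours=2.0, dwell_seconds=15.0),    # fresh, barely read
    Visit("a2", age_hours=24.0, dwell_seconds=240.0),  # older, read in depth
]

# The utility function orders the candidate states; the goal test cannot.
best = max(visits, key=utility)
print(best.article_id, round(utility(best), 3))
```

With these assumed weights, the older but thoroughly read article scores higher than the fresh one that was abandoned quickly, which is the kind of distinction a click count alone would miss.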
The news utility model is designed based on two broad types of attributes: article attributes and user-article interaction attributes. Importantly, the model can be engineered to address one or more objectives even if those objectives are in conflict with one another. URecSYS recommends news articles by discovering article-level rules based on the news utility model. Moreover, by leveraging topic modeling and a probabilistic framework, it generalizes rules from the article level to the topic level. This addresses the news cold start problem, as newly published articles can be recommended to a user by matching them to the topic-level rules. In addition, because the news utility model is built from the reading sessions of all users, the recommendation rules (at either the article level or the topic level) can be used for new users, thus also resolving the user cold start problem.
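The step from article-level to topic-level rules can be sketched in a few lines of Python. This is a hedged illustration, not the probabilistic framework proposed in the paper: it assumes that each article already has a topic distribution P(topic | article) from some topic model, and it lifts a rule by weighting its utility with the product of the two articles' topic probabilities. All names and numbers below are hypothetical.

```python
from collections import defaultdict

# Hypothetical article-level rules: (current, next) -> accumulated utility.
article_rules = {("a1", "a2"): 0.9, ("a1", "a3"): 0.2}

# Hypothetical topic distributions P(topic | article), e.g. from a topic model.
topics = {
    "a1": {"politics": 1.0},
    "a2": {"economy": 1.0},
    "a3": {"sports": 1.0},
    "new1": {"economy": 0.5, "sports": 0.5},  # newly published, no clicks yet
    "new2": {"sports": 1.0},                  # newly published, no clicks yet
}


def lift_to_topics(rules, topics):
    """Spread each article-level utility over the topic pairs it supports."""
    topic_rules = defaultdict(float)
    for (src, dst), u in rules.items():
        for t_src, p_src in topics[src].items():
            for t_dst, p_dst in topics[dst].items():
                topic_rules[(t_src, t_dst)] += p_src * p_dst * u
    return dict(topic_rules)


def score(current, candidate, topic_rules, topics):
    """Expected topic-level utility of moving from `current` to `candidate`."""
    return sum(
        p_src * p_dst * topic_rules.get((t_src, t_dst), 0.0)
        for t_src, p_src in topics[current].items()
        for t_dst, p_dst in topics[candidate].items()
    )


topic_rules = lift_to_topics(article_rules, topics)

# Rank unseen articles for a reader currently on "a1" using topic-level rules.
ranked = sorted(["new1", "new2"],
                key=lambda a: score("a1", a, topic_rules, topics),
                reverse=True)
print(ranked)
```

Because the score depends only on topic distributions, an article with no click history can still be ranked, which is the property the cold start discussion above relies on.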
To the best of our knowledge, this study is the first step toward exploring the impact of both article and user-article interaction attributes in a unified framework. Our contributions are summarized as follows:
1. We define a novel model, called the News Utility Model, to simultaneously consider both article attributes (e.g., the recency of the article) and user-article interaction attributes (e.g., DwellTime). We argue that it is more beneficial to recommend news articles based on the utility of articles rather than the browsing frequency of articles.
2. We propose a novel and scalable rule engine to discover article-level recommendation rules based on the news utility model. At its heart, the rule engine uses multiple MapReduce-like steps to discover article-level recommendation rules in parallel.
3. We propose a novel probabilistic approach on top of topic-based models to generalize article-level news recommendation rules. The output is a topic-level recommendation rule engine that links topics of articles based on the news utility model, and thus on the domain-specific objective. Such rules recommend newly published articles based on topics of interest to users.
4. We apply the proposed framework to a dataset of two billion records collected from a major Canadian news agency (The Globe and Mail) and demonstrate the effectiveness of the recommended articles by comparing the proposed framework to other state-of-the-art recommendation systems in practice.
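The MapReduce-like rule discovery named in the second contribution can be illustrated with a small, single-machine Python sketch. It is an assumption-laden toy rather than the URecSYS rule engine: sessions are taken to be ordered lists of (article, utility) pairs, a rule is a pair of consecutively viewed articles, and a rule is kept when its summed utility reaches a hypothetical threshold `min_utility`.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sessions: ordered (article_id, utility of that visit) pairs.
sessions = [
    [("a1", 0.2), ("a2", 0.8), ("a3", 0.1)],
    [("a1", 0.3), ("a2", 0.6)],
    [("a1", 0.5), ("a3", 0.2)],
]


def map_session(session):
    """Map step: emit ((current, next), utility of next) for consecutive views."""
    return [((cur[0], nxt[0]), nxt[1]) for cur, nxt in zip(session, session[1:])]


def reduce_by_key(pairs):
    """Reduce step: sum the emitted utilities for each candidate rule."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def discover_rules(sessions, min_utility):
    """Keep rules whose total utility reaches the (assumed) threshold."""
    pairs = chain.from_iterable(map_session(s) for s in sessions)
    totals = reduce_by_key(pairs)
    return {rule: u for rule, u in totals.items() if u >= min_utility}


rules = discover_rules(sessions, min_utility=1.0)
print(rules)
```

Each session is mapped independently and the reduction is a per-key sum, so the same two steps could in principle be expressed with Spark's `flatMap` and `reduceByKey` to run in parallel over a large clickstream.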
The rest of the paper is organized as follows. In Section 2, we provide a review of prior work relevant to news recommenders. In Section 3, we introduce terms and concepts used in this paper. In Section 4, we present the proposed framework. In Section 5, we outline the experimental settings and discuss the results. Finally, in Section 6 we provide conclusions and point out limitations and future research directions.
Literature review
The underlying techniques used in recommender systems can be categorized into three broad classes: content-based [10, 11], collaborative filtering [4, 12, 13, 14, 15], and hybrid [7, 16, 17] approaches. Several studies have been conducted on content-based news recommendation systems [10]. For example, Liang et al. [18] propose a time-aware content recommendation system. In another work, Agrawal et al. [19] take activity freshness into account and significantly outperform click-through rate
Notations and definitions
For convenience, Table 1 summarizes the concepts and notations we define in this paper.
Let be a set of distinct articles. A clickstream dataset consists of several user sessions. A user session S (or session for short) is defined as an ordered list of viewed articles $\langle nw_1, nw_2, \ldots, nw_z \rangle$ within a visit. Each article is represented by different attributes such as popularity, topics, and publication date. These attributes are called article attributes. Once an article is
URecSYS: a utility-based news recommendation system
Given the proposed news utility model to represent a domain-specific objective, the most important challenge is how to incorporate it into the recommendation process. To address this challenge, we design a Utility-based news Recommendation SYStem (URecSYS). URecSYS is a rule-based recommendation system that is designed and developed using Apache Spark and the MapReduce framework. URecSYS first finds recommendation rules from the clickstream dataset and then applies the discovered rules to recommend
Experimental results and discussions
The experimental environment consists of one master node and six worker nodes. Each node is equipped with a 2.6 GHz Intel Xeon processor (12 cores each) and 128 GB of main memory. The framework is implemented on Spark 2.3.0.
Conclusions, limitations and future work
In the news recommendation context, the most challenging problem for any online news publisher is to strike a balance between different business objectives (e.g., increasing user engagement through free content delivery on the one hand and revenue maximization through subscription on the other hand). Such objectives are usually in contention with one another. Moreover, most existing news recommendation systems consider only clicks (e.g., CTR) as implicit feedback, which is quite problematic as
Acknowledgments
This work is funded by the Natural Sciences and Engineering Research Council of Canada (NSERC), The Globe and Mail, and the Big Data Research, Analytics and Information Network (BRAIN) Alliance established by the Ontario Research Fund - Research Excellence Program (ORF-RE).
References (37)
- et al. Value co-creation between firms and customers: the role of big data-based cooperative assets. Information and Management (2016)
- et al. SocoTraveler: Travel-package recommendations leveraging social influence of different relationship types. Information and Management (2016)
- et al. A web recommendation system considering sequential information. Decision Support Systems (2015)
- et al. A novel recommendation model with Google similarity. Decision Support Systems (2016)
- et al. Bayesian probabilistic matrix factorization with social relations and item contents for recommendation. Decision Support Systems (2013)
- et al. Collaborative error-reflected models for cold-start recommender systems. Decision Support Systems (2011)
- et al. A social recommender mechanism for e-commerce: combining similarity, trust, and relationship. Decision Support Systems (2013)
- et al. Maximizing customer satisfaction through an online recommendation system: a novel associative classification model. Decision Support Systems (2010)
- et al. Measuring consumers' willingness to pay with utility-based recommendation systems. Decision Support Systems (2015)
- et al. A survey on feature selection methods. Computers & Electrical Engineering (2014)
- From incidental news exposure to news engagement. How perceptions of the news post and news usage patterns influence engagement with news articles encountered on Facebook. Computers in Human Behavior
- A hybrid of sequential rules and collaborative filtering for product recommendation. Information Sciences
- Time-aware Subscription Prediction Model for User Acquisition in Digital News Media
- Algorithms for non-negative matrix factorization
- Artificial Intelligence: A Modern Approach
- Principles of Economics
- Content-based recommender systems: state of the art and trends
Morteza Zihayat has been an assistant professor at the School of Information Technology Management (ITM) of Ryerson University since 2016 and an IBM CAS Faculty Fellow since 2018. Before joining ITM, he was a Postdoctoral Fellow at the University of Toronto (2015-2016). He was also a research fellow in IBM Cloud Analytics as a member of the BRAIN Alliance (Big Data Research, Analytics, and Information Network). His research concerns Big Data analytics and machine learning. He was recently awarded multiple research grants, including an NSERC Discovery Grant, NSERC Engage and MITACS. He has ongoing collaborations with industry, including IBM Canada, The Globe and Mail and AT&T Labs Research. Morteza obtained his PhD from York University, where he worked on designing scalable frameworks to discover actionable knowledge from Big Data streams and social networks. His research has been published in top-tier data mining and data management venues such as Information Sciences, Machine Learning, SIGKDD, SIAM SDM, PKDD, and EDBT.
Anteneh Ayanso is Professor of Information Systems and founding director of the Centre for Business Analytics at the Goodman School of Business, Brock University. He teaches Business Analytics, Database Design and Management, Data Mining Techniques & Applications and Management of IS/IT. He received a PhD in Information Systems from the University of Connecticut and an MBA from Syracuse University. His research interests focus primarily on data management and information retrieval, Big Data analytics, electronic commerce, and electronic government. His articles are published in leading journals such as Decision Sciences, Decision Support Systems, European Journal of Operational Research, Journal of Database Management, International Journal of Electronic Commerce, and Government Information Quarterly, among others. His research has been funded by government grants, including an NSERC Discovery Research Grant, an NSERC Engage Grant and a Voucher for Innovation and Productivity (VIP) from Ontario Centres of Excellence (OCE). He is currently serving as an Associate Editor at the journal Decision Support Systems and as a review board member at the Journal of Database Management and the International Journal of Convergence Computing.
Aijun An is a Professor in the Department of Electrical Engineering and Computer Science at York University. She is currently leading the Big Data Research, Analytics and Information Network (BRAIN) Alliance, an Ontario-based research network that involves four universities and a dozen private and public sector partners. Her main research area is data mining. She has worked on various research topics in data mining, including classification, clustering, data stream mining, high utility pattern mining, sentiment and emotion analysis from text, topic detection, parallel and distributed deep learning, graph mining and bioinformatics. She has published extensively in various well-respected journals and conferences in data mining, databases, optimization, and intelligent information systems. Her research has been supported by NSERC, SSHRC, and ORF-RE.
Heidar Davoudi received his PhD in Computer Science from the Department of Electrical Engineering and Computer Science at York University, Canada. His research interests include data mining and machine learning and, in particular, user modeling for acquisition, engagement, and recommendation.
Xing Zhao is a Master's student in Data Mining Lab at the Department of Electrical Engineering and Computer Science at York University. His research areas of interest are machine learning and Big Data. He received B.Sc., Spec. Hons. in Computer Science from York University in 2017.