A utility-based news recommendation system

doi:10.1016/j.dss.2018.12.001

Decision Support Systems

Volume 117, February 2019, Pages 14-27

https://doi.org/10.1016/j.dss.2018.12.001

Highlights

• The news utility model goes beyond click-through rate analysis for news recommendations.
• The system is designed on the MapReduce framework to analyze huge volumes of data in parallel.
• The news cold start problem is addressed effectively by a novel probabilistic approach.
• Experiments on 2 billion records demonstrate the effectiveness and efficiency of the system.

Abstract

News platforms present both challenges and opportunities for enhancing the functionality of recommendation systems in today's big data environment. Novel use of big data storage and programming models can improve news recommendation systems through efficient handling and analysis of clickstream data and a better understanding of users' interests. Most existing approaches to news recommendation treat users' clicks as implicit feedback for understanding user behavior. However, clicks may not be an effective indicator of real user interests. We address this problem by developing a novel news recommendation system based on a news utility model. Given the news utility model, we propose a two-stage news recommendation framework. The framework first generates article-level recommendation rules based on the utility model, then integrates the notion of utility with probabilistic topic models to generate topic-level recommendation rules. We argue that the proposed utility-based news recommendation system also addresses the news cold start problem, which is one of the most challenging obstacles for news agencies. We evaluate the framework on a massive real dataset (two billion records) obtained from a major newspaper in Canada (The Globe and Mail) and show that it outperforms existing methods.

Introduction

Recent advancements in Internet-based technologies and the production of digital content have shifted news consumption from reading physical newspapers to visiting online news websites. Understanding and modeling users' interests is a critical task for the value proposition and prosperity of any online service provider. Moreover, in the digital age, value creation has become value co-creation between companies and customers [1]. Thus, many companies mine their large stores of historical data to model users' behaviors and boost profits. Similarly, online newspaper agencies aim to provide personalized content to improve the visit experience. This not only increases the frequency of visits, which boosts advertising revenue, but also improves user engagement, leading to more subscriptions [2]. In fact, from an online news provider's business perspective, increasing revenue through advertisements and subscriptions is a major objective. Therefore, for most online newspapers, developing effective recommendation systems that help users find interesting articles and keep them engaged is of paramount importance.

Building an effective and efficient recommendation system for the news domain is much more challenging than for other domains:

1. Business Objectives Trade-off. As the business model evolves, many online news publishers are struggling to balance the availability of high-value advertising inventory with the need for content that maximizes opportunities for subscriber acquisition and retention. Thus, in many cases, a recommendation system is needed in which recommended articles satisfy more than one business objective, even if those objectives are in contention with one another.
2. Beyond Click-through Rates. Conventional news recommendation systems consider only a click as implicit feedback. This is problematic because a user might click on an article without being interested in reading it (e.g., the title may be appealing to her while the content may not meet her expectations). Therefore, the number of clicks may not help address an intended domain-specific objective. For example, user engagement is very likely to increase the number of subscriptions, so a news agency might be interested in recommendations that increase engagement (e.g., dwell time on visited articles) and thus the subscription rate. However, click analysis does not necessarily maximize this objective.
3. Big Data of Unknown Visitors. Newspapers are rich repositories of visitors' historical data. In 2016, the number of average monthly unique visitors for top 50 U.S. newspapers was more than 11 billion¹. The user interaction data of The Globe and Mail², Canada's foremost news media company, amounts to about 30 million sessions in one month. Building a recommendation system that can handle big data requires scalable techniques and platforms [3]. Moreover, despite the availability of such a huge volume of historical data, most interactions belong to unsubscribed users (i.e., unknown visitors). Therefore, user profiling and collaborative filtering techniques such as matrix factorization [4] are not effective recommendation approaches in the news domain, because such approaches need to identify users and collect their reading histories to discover their interests and the similarities among them.
4. News/Users Cold Start Problems. Articles are generated continuously and unboundedly at high speed. This makes recommending new articles much harder than recommending new items (e.g., new products, new travel packages [5]) in other e-commerce domains. Thus, the ability to handle the cold start problem is essential for newspapers. There are two types of cold start problems: user cold start (when a new or unknown user visits the portal) and item cold start (when a new article is published). Despite the various solutions proposed [6, 7], the problem remains challenging, particularly in the news domain.

To address the aforementioned challenges, this paper presents a Utility-based News Recommendation SYStem, called URecSYS, which is built on a news utility model. The idea of the news utility model is inspired by the concept of utility in intelligent agents. In intelligent agents and machine learning, goal-based agents and utility-based agents are two common classes of agents [8]. A goal-based agent can only differentiate between goal and non-goal states in the environment. However, it is important to measure how desirable a particular state is. To do so, a utility function is defined that measures how satisfied the agent (e.g., a visitor) will be if it moves to a particular state (e.g., an article). This helps agents move toward a goal state (e.g., increased user engagement) with the highest satisfaction. It is important to note that the concept of utility in our model is very similar to its common usage in economics, where the term utility describes the measurement of usefulness and satisfaction that a consumer obtains from any good [9]. Utility is not a characteristic of a particular good (e.g., an article), but rather of each consumer's reaction (e.g., a user's engagement) to that good. In this paper, we argue that utility can be defined in the context of recommendation. We model the utility to represent a desirable domain-specific objective and recommend articles with respect to this objective. In other words, we design a utility model to recommend those articles that persuade the user to move toward a higher value of the objective. For example, if the objective is to increase user engagement, a utility model can be designed such that a recommender suggests articles that lead to a highly engaged visit.

The news utility model is designed based on two broad types of attributes: article attributes and user-article interaction attributes. Importantly, the model can be engineered to address one or more objectives even if those objectives conflict with one another. URecSYS recommends news articles by discovering article-level rules based on the news utility model. Moreover, by leveraging topic modeling and a probabilistic framework, it generalizes rules from the article level to the topic level. This addresses the news cold start problem, as newly published articles can be recommended to a user by matching them to the topic-level rules. In addition, because the news utility model is built on the reading sessions of all users, the recommendation rules (at either the article level or the topic level) can be applied to new users, which also resolves the user cold start problem.
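As an illustration of how such a model might combine the two attribute types, the following is a minimal sketch that scores a view from one article attribute (recency) and one user-article interaction attribute (dwell time). The record layout, the exponential-decay form, the dwell-time cap, and the weights are all assumptions made for this example; the text above does not specify the actual utility definition used by URecSYS.

```python
from dataclasses import dataclass


@dataclass
class View:
    """One article view inside a session (hypothetical record layout)."""
    article_id: str
    age_hours: float      # article attribute: time since publication
    dwell_seconds: float  # user-article interaction attribute


def recency_score(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay: 1.0 for a brand-new article, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def engagement_score(dwell_seconds: float, cap_seconds: float = 300.0) -> float:
    """Dwell time clipped to [0, cap] and scaled to [0, 1]."""
    return min(max(dwell_seconds, 0.0), cap_seconds) / cap_seconds


def utility(view: View, w_recency: float = 0.3, w_engagement: float = 0.7) -> float:
    """Weighted sum of the two scores; the weights stand in for the business objective."""
    return (w_recency * recency_score(view.age_hours)
            + w_engagement * engagement_score(view.dwell_seconds))


views = [
    View("a1", age_hours=0.0, dwell_seconds=5.0),     # fresh, but barely read
    View("a2", age_hours=24.0, dwell_seconds=300.0),  # a day old, read in full
]
# Rank by utility rather than by click count (both articles have one click each).
ranked = sorted(views, key=utility, reverse=True)
```

Under these assumed weights the older but fully read article ranks first, even though a click count alone would not distinguish the two. Shifting the weights is one simple way to trade off competing objectives.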

To the best of our knowledge, this study is the first step toward exploring the impact of both article and user-article interaction attributes in a unified framework. Our contributions are summarized as follows:

1. We define a novel model, called the News Utility Model, to simultaneously consider both article attributes (e.g., the recency of the article) and user-article interaction attributes (e.g., DwellTime). We argue that it is more beneficial to recommend news articles based on the utility of articles than on their browsing frequency.
2. We propose a novel and scalable rule engine to discover article-level recommendation rules based on the news utility model. At its heart, the rule engine uses multiple MapReduce-like steps to discover article-level recommendation rules in parallel.
3. We propose a novel probabilistic approach on top of topic-based models to generalize article-level news recommendation rules. The output is a topic-level recommendation rule engine that links topics of articles based on the news utility model, and thus on the domain-specific objective. Such rules recommend newly published articles based on topics of interest to users.
4. We apply the proposed framework to a dataset of two billion records collected from a major Canadian news agency (The Globe and Mail³) and demonstrate the effectiveness of the recommended articles by comparing the proposed framework to other state-of-the-art recommendation systems used in practice.
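The MapReduce-like rule discovery mentioned in the second contribution can be sketched in plain Python as a map step that emits candidate rules per session and a reduce step that accumulates utility per rule. This is only a sketch under stated assumptions: the rule form (one article leading to the next), the per-view utility values, and the `min_utility` threshold are hypothetical, and the code runs on a single machine rather than on Spark.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# A session is an ordered list of (article_id, utility) pairs (hypothetical layout).
Session = List[Tuple[str, float]]
Rule = Tuple[str, str]  # (antecedent article, consequent article)


def map_session(session: Session) -> Iterator[Tuple[Rule, float]]:
    """Map step: for each consecutive pair, emit the rule and the consequent's utility."""
    for (prev_id, _), (next_id, next_utility) in zip(session, session[1:]):
        yield (prev_id, next_id), next_utility


def reduce_by_key(pairs: Iterable[Tuple[Rule, float]]) -> Dict[Rule, float]:
    """Reduce step: sum the utilities emitted for each rule key."""
    totals: Dict[Rule, float] = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def discover_rules(sessions: Iterable[Session], min_utility: float) -> Dict[Rule, float]:
    """Keep only rules whose accumulated utility reaches the threshold."""
    emitted = chain.from_iterable(map_session(s) for s in sessions)
    totals = reduce_by_key(emitted)
    return {rule: u for rule, u in totals.items() if u >= min_utility}


sessions = [
    [("a", 0.2), ("b", 0.9), ("c", 0.1)],
    [("a", 0.4), ("b", 0.8)],
    [("a", 0.3), ("c", 0.2)],
]
rules = discover_rules(sessions, min_utility=1.0)
```

Because each session is mapped independently and the reduce step only sums per key, the same structure could be distributed across workers. A topic-level variant could be obtained by replacing article identifiers with topic labels before the map step, which is one plausible reading of the generalization described in the third contribution.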

The rest of the paper is organized as follows. In Section 2, we provide a review of prior work relevant to news recommenders. In Section 3, we introduce terms and concepts used in this paper. In Section 4, we present the proposed framework. In Section 5, we outline the experimental settings and discuss the results. Finally, in Section 6 we provide conclusions and point out limitations and future research directions.

Section snippets

Literature review

The underlying techniques used in recommender systems can be categorized into three broad classes: content-based [10, 11], collaborative filtering [4, 12, 13, 14, 15], and hybrid [7, 16, 17] approaches. Several studies have been conducted on content-based news recommendation systems [10]. For example, Liang et al. [18] propose a time-aware content recommendation system. In another work, Agrawal et al. [19] take activity freshness into account and significantly outperform click-through rate

Notations and definitions

For convenience, Table 1 summarizes the concepts and notations we define in this paper.

Let $N = \{nw_{1}, nw_{2}, \dots, nw_{n}\}$ be a set of distinct articles. A clickstream dataset consists of several user sessions. A user session $S$ (or session for short) is defined as an ordered list of viewed articles $\langle nw_{1}, nw_{2}, \dots, nw_{z} \rangle$ within a visit. Each article is represented by different attributes such as popularity, topics, and published date. These attributes are called article attributes. Once an article is

URecSYS: a utility-based news recommendation system

Given the proposed news utility model to represent a domain-specific objective, the most important challenge is how to incorporate it into the recommendation process. To address this challenge, we design a Utility-based news Recommendation SYStem (URecSYS). URecSYS is a rule-based recommendation system designed and developed using Apache Spark and the MapReduce framework. URecSYS first finds recommendation rules from the clickstream dataset and then applies the discovered rules to recommend

Experimental results and discussions

The experimental environment consists of one master node and six worker nodes. Each node is equipped with an Intel Xeon 2.6 GHz processor (12 cores each) and 128 GB of main memory. The framework is implemented on Spark 2.3.0.

Conclusions, limitations and future work

In the news recommendation context, the most challenging problem for any online news publisher is to strike a balance between different business objectives (e.g., increasing user engagement through free content delivery on the one hand and revenue maximization through subscription on the other). Such objectives are usually in contention with one another. Moreover, most existing news recommendation systems only consider the click (e.g., CTR) as implicit feedback, which is quite problematic as

Acknowledgments

This work is funded by the Natural Sciences and Engineering Research Council of Canada (NSERC), The Globe and Mail, and the Big Data Research, Analytics and Information Network (BRAIN) Alliance established by the Ontario Research Fund - Research Excellence Program (ORF-RE).

References (37)

K. Xie et al. Value co-creation between firms and customers: the role of big data-based cooperative assets. Information and Management (2016)
J. He et al. SocoTraveler: Travel-package recommendations leveraging social influence of different relationship types. Information and Management (2016)
R. Mishra et al. A web recommendation system considering sequential information. Decision Support Systems (2015)
T.C.-K. Huang et al. A novel recommendation model with Google similarity. Decision Support Systems (2016)
J. Liu et al. Bayesian probabilistic matrix factorization with social relations and item contents for recommendation. Decision Support Systems (2013)
H.-N. Kim et al. Collaborative error-reflected models for cold-start recommender systems. Decision Support Systems (2011)
Y.-M. Li et al. A social recommender mechanism for e-commerce: combining similarity, trust, and relationship. Decision Support Systems (2013)
Y. Jiang et al. Maximizing customer satisfaction through an online recommendation system: a novel associative classification model. Decision Support Systems (2010)
M. Scholz et al. Measuring consumers' willingness to pay with utility-based recommendation systems. Decision Support Systems (2015)
G. Chandrashekar et al. A survey on feature selection methods. Computers & Electrical Engineering (2014)
V. Karnowski et al. From incidental news exposure to news engagement. How perceptions of the news post and news usage patterns influence engagement with news articles encountered on Facebook. Computers in Human Behavior (2017)
D.-R. Liu et al. A hybrid of sequential rules and collaborative filtering for product recommendation. Information Sciences (2009)
H. Davoudi et al. Time-aware Subscription Prediction Model for User Acquisition in Digital News Media
R. Agarwal, V. Dhar, Big data, data science, and analytics: the opportunity and challenge for IS research,...
D.D. Lee et al. Algorithms for non-negative matrix factorization
S.J. Russell et al. Artificial Intelligence: A Modern Approach (2016)
F.W. Taussig. Principles of Economics (2013)
P. Lops et al. Content-based recommender systems: state of the art and trends


Morteza Zihayat has been an assistant professor at the School of Information Technology Management (ITM) of Ryerson University since 2016 and an IBM CAS Faculty Fellow since 2018. Before joining ITM, he was a Postdoctoral Fellow at the University of Toronto (2015-2016). He was also a research fellow in the IBM Cloud Analytics as a member of the BRAIN Alliance (Big Data Research, Analytics, and Information Network). His research concerns Big Data analytics and machine learning. He was recently awarded multiple research grants, including an NSERC Discovery Grant, NSERC Engage and MITACS. He has ongoing collaborations with industry, including IBM Canada, The Globe and Mail and AT&T Labs Research. Morteza obtained his PhD from York University, where he worked on designing scalable frameworks to discover actionable knowledge from Big Data streams and social networks. His research has been published in top-tier data mining and data management venues such as Information Sciences, Machine Learning, SIGKDD, SIAM SDM, PKDD, and EDBT.

Anteneh Ayanso is Professor of Information Systems and founding director of the Centre for Business Analytics at the Goodman School of Business, Brock University. He teaches Business Analytics, Database Design and Management, Data Mining Techniques & Applications and Management of IS/IT. He received a PhD in Information Systems from the University of Connecticut and an MBA from Syracuse University. His research interests focus primarily on data management and information retrieval, Big Data analytics, electronic commerce, and electronic government. His articles are published in leading journals such as Decision Sciences, Decision Support Systems, European Journal of Operational Research, Journal of Database Management, International Journal of Electronic Commerce, and Government Information Quarterly, among others. His research has been funded by government grants, including the NSERC Discovery Research Grant, the NSERC Engage Grant and the Voucher for Innovation and Productivity (VIP) by Ontario Centres of Excellence (OCE). He is currently serving as an Associate Editor at Decision Support Systems and as a review board member at Journal of Database Management and International Journal of Convergence Computing.

Aijun An is a Professor in the Department of Electrical Engineering and Computer Science at York University. She is currently leading the Big Data Research, Analytics and Information Network (BRAIN) Alliance, an Ontario-based research network that involves four universities and a dozen private and public sector partners. Her main research area is data mining. She has worked on various research topics in data mining, including classification, clustering, data stream mining, high utility pattern mining, sentiment and emotion analysis from text, topic detection, parallel and distributed deep learning, graph mining and bioinformatics. She has published extensively in various well-respected journals and conferences in data mining, databases, optimization, and intelligent information systems. Her research has been supported by NSERC, SSHRC, and ORF-RE.

Heidar Davoudi received his PhD in Computer Science from the Department of Electrical Engineering and Computer Science at York University, Canada. His research interests include data mining and machine learning and, in particular, user modeling for acquisition, engagement, and recommendation.

Xing Zhao is a Master's student in the Data Mining Lab at the Department of Electrical Engineering and Computer Science at York University. His research areas of interest are machine learning and Big Data. He received a B.Sc. (Spec. Hons.) in Computer Science from York University in 2017.
