Expert Systems with Applications

Volume 133, 1 November 2019, Pages 59-74
Early prediction of the future popularity of uploaded videos

https://doi.org/10.1016/j.eswa.2019.05.015

Highlights

  • We propose a new approach to early prediction of new video popularity.

  • Our approach does not require historical popularity data to generate forecasts.

  • The proposed method shows better predictive performance than other methods.

  • The proposed method can effectively predict the future popularity of video.

Abstract

Predicting the popularity of videos on video sharing sites is important for the formulation of online advertising strategies and commercial marketing. The predicted popularity value can help the system decide which videos to recommend to users and determine the recommended order of related videos. However, the weakness of most previous methods has been that the prediction of video popularity is based on past usage data after upload. Early predictions cannot be made because they are dependent on past access data. To solve this problem, we propose a new method for the early prediction of video popularity based on the data available at the initial time of video upload. This study first uses a supervised learning approach to develop a video popularity prediction model. Then, to further improve the overall accuracy of the prediction, an ensemble model is developed for integration of these classification results to get the most accurate predictions. Empirical evaluation shows that these models can effectively predict the future popularity of videos at the time of uploading.

Introduction

Social Networking Sites (SNWs) offer highly interactive platforms that allow users to build social relationships with like-minded individuals, organize social events, share interests, information, and ideas, communicate awareness, build real-life connections, and organize and manage social groups in virtual communities (Choi and Burnes, 2016, De Salve et al., 2016, Nambi and Prasad, 2016, Sicilia et al., 2018, Wang et al., 2017). As reported in a recent survey by the Search Engine Journal (SEJ, 2016), most Internet users actively participate in social networks such as YouTube. In addition, SNWs help to create collective knowledge and are now the primary source of information gathering in the purchase decision process (Bilgihan et al., 2016, Hall et al., 2017, Nguyen et al., 2017, Wei et al., 2017). According to a survey by Socialnomics (2016), 93% of internet shoppers’ buying decisions are influenced by SNWs. Moreover, many companies have developed online marketing strategies and upload their advertisements to video-sharing sites, such as YouTube, to attract consumer attention and encourage purchase intentions (Belanche et al., 2017, McDuff et al., 2015).

YouTube is one of the most influential SNWs (Ahmad et al., 2017, Pecay, 2017, Social impact of YouTube. Wikipedia 2018). It was established in 2005 and has become the world's largest online video website (Wikipedia, 2018). According to YouTube statistics, it has more than a billion users, which represents nearly one-third of all Internet users, with people watching hundreds of millions of hours of video on YouTube every day and generating billions of views (YouTube, 2018). According to eMarketer (2016), 72% of marketers plan to invest in digital video ads on YouTube in the subsequent 12 months. Additionally, approximately three-quarters of leading U.S. marketing brands intend to use digital video advertising on the YouTube website.

Recently, social network mining (Chorley et al., 2015, Ditrich and Sassenberg, 2017, Han et al., 2015, Li et al., 2016) has become a widely studied research topic. A key task in social network mining is predicting the popularity of content on social networks (Bandari et al., 2012, Kekolahti et al., 2016). For example, numerous studies have focused on YouTube, trying to discover the characteristics of video popularity (Ahmad et al., 2017, Figueiredo et al., 2016, Susarla et al., 2016). Understanding what makes videos popular is of the utmost importance for SNWs like YouTube. For instance, it could help them to provide more suitable services and reduce information overload. Website operators could place more popular videos in prominent positions on web pages. For advertisers, precise keywords, tags, and descriptions might help increase the exposure rate of videos on YouTube (Roy et al., 2013, Toderici et al., 2010).

Previous studies on the popularity of videos on YouTube have generally used one of two approaches. The first approach is to subject profile and behavior data to statistical analysis to understand the various distributions and behavioral properties. Cheng, Dale, and Liu (2013) analyzed the characteristics of YouTube videos. They reported that the view counts of most videos grow slowly, and that links to related videos generated by the uploaders’ choices form a small-world network. Cheng, Dale, and Liu (2008) studied the distribution and temporal patterns of view counts. They designed a power law curve to fit the growth trend of view counts. Welch, Cho, and Chang (2010) suggested using the text from the video content to obtain high-quality keywords suitable for matching with advertisements. The popularity of related keywords has increased significantly, indicating their benefit for advertising strategies. Rodrigues, Benevenuto, Almeida, Almeida, and Gonçalves (2010) analyzed duplicates in video-sharing systems. They focused on understanding the social perception of duplicate content represented by videos marked as duplicates by YouTube algorithms. Brodersen, Scellato, and Wattenhofer (2012) indicated that geographic relevance is a key factor influencing video popularity. Figueiredo, Almeida, Gonçalves, and Benevenuto (2014) maintained that top videos and copyright-protected videos achieve most of their views in early bursts, usually within a week or a day. They claimed that search engines and related videos are the most crucial mechanisms that drive users to view them.

In contrast, the second approach uses prediction models to predict the popularity of YouTube videos. Chatzopoulou, Sheng, and Faloutsos (2010) conducted an in-depth study of the fundamental properties of video popularity on YouTube. They proposed a linear regression model for predicting view counts and indicated that related video graphs could be characterized as a small-world network. Szabo and Huberman (2010) used a linear regression model to predict the long-term popularity of online content from early measurements of user access patterns. Wu, Timmers, De Vleeschauwer, and Van Leekwijck (2010) proposed a neural network model for predicting the near-future popularity of videos on the basis of popularity data from previous days. Gürsun, Crovella, and Matta (2011) proposed an autoregressive moving average model to describe the daily access patterns of rarely accessed and frequently accessed videos and to predict the number of hits that a video will have in the near future. Roy et al. (2013) proposed a social transfer learning model for identifying sudden bursts of popularity of videos. Pinto, Almeida, and Gonçalves (2013) presented multivariate linear regression (ML) and multivariate radial basis function models for predicting the future popularity of web content on the basis of historical information provided by early popularity measures.

Previous studies that take the model-based approach share a common weakness: future popularity is predicted from the popularity data observed in previous periods. This dependence makes it impossible for these models to generate predictions at the time of upload, because no historical popularity data exists yet. Hence, our study aims to predict the future popularity of videos upon uploading. To this end, our model is built not on past popularity data but on variables extracted from the data available when the video is uploaded.

In our model, variables are extracted from three sources, namely videos related to the uploaded video, the author (uploader) of the video, and the important keywords appearing in the user-generated content for the uploaded video. For each source, or perspective, we collect four features from the Top K videos ranked according to three different ranking methods. All these 36 variables (3 perspectives × 4 features × 3 ranking methods) are known at the moment when the video is uploaded.
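The count of 36 variables can be checked by enumerating the combinations. The following is only a minimal sketch: the three perspective names follow the text, but the feature names and ranking-method names are hypothetical placeholders, since this excerpt does not list them.

```python
from itertools import product

# Perspectives are named in the text; the feature and ranking names below
# are hypothetical placeholders (the excerpt does not list them).
PERSPECTIVES = ["related_videos", "author", "keywords"]
FEATURES = ["feature_1", "feature_2", "feature_3", "feature_4"]
RANKINGS = ["ranking_1", "ranking_2", "ranking_3"]


def variable_names(perspectives=PERSPECTIVES, features=FEATURES, rankings=RANKINGS):
    """Return one variable name per (perspective, feature, ranking) combination."""
    return [f"{p}__{f}__{r}" for p, f, r in product(perspectives, features, rankings)]


names = variable_names()
print(len(names))  # 3 * 4 * 3 = 36
print(names[0])
```

Each name stands for one value computed from the Top K videos of that perspective under that ranking method; how those values are computed is described in the full paper, not here.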

This is the first attempt to predict the future popularity of videos at the moment of uploading. Such predictions bring several benefits. (1) They expand the basis of recommendation. Unlike previous models, where only old videos are recommended, new videos can now be recommended. (2) Because the recommendation base is larger, the videos recommended by the website will be more in line with the user's preferences. (3) Obtaining the popularity prediction earlier makes it possible to recommend an influential video earlier. (4) The popularity of a video is indicative of its importance, just as PageRank values measure the importance of web pages. Combining the importance of the video with other factors can help determine the most appropriate position for a video in the ranking of search results. (5) Understanding the factors that influence the future popularity of new videos can provide clues on how to design influential videos.

The rest of the paper is organized as follows. We first review the literature on video popularity prediction. We then introduce the variables and models used in this study. Subsequently, a series of experiments are described that involve verifying the effectiveness of the framework in predicting the popularity of YouTube videos. Finally, some conclusions and directions of future research are presented.

Related work

In recent years, numerous studies have thoroughly explored the popularity of online content (Bai et al., 2018, Bandari et al., 2012). Analysis of the popularity patterns of videos on YouTube has shown “related video” to be one of the key factors for predicting view counts (Chatzopoulou et al., 2010, Figueiredo et al., 2014, Figueiredo et al., 2011, Zhou et al., 2016). “Keywords” has been found to be an important feature for searching for videos in internal search engines (Welch et al., 2010,

Variables and models

The objective of this study is to develop models for predicting the future popularity of a video using the predicting variables obtained at the moment when the video is uploaded. To this end, we first describe how we determine the popularity of a video. We then introduce the predicting variables used to build the models before explaining how these predicting variables are extracted from the raw data collected from the YouTube site. Finally, we describe the models in detail.
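The abstract mentions an ensemble model that integrates the results of several classifiers. How the paper combines them is not stated in this excerpt, so the sketch below substitutes plain majority voting as a stand-in, with hypothetical class labels ("popular", "unpopular"). It illustrates the general idea only, not the authors' method.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine the class labels predicted by several classifiers for one video.

    Assumption: simple majority voting, which is a stand-in and not
    necessarily the combination rule used in the paper. Ties are broken in
    favour of the label that appears first in `predictions`.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of five individual classifiers for one uploaded video.
votes = ["popular", "unpopular", "popular", "popular", "unpopular"]
print(majority_vote(votes))  # popular
```

Using an odd number of classifiers avoids ties in the two-class case, so the tie-breaking rule only matters with more classes or an even number of voters.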

Evaluation

In this section, we conduct four empirical evaluations to investigate the effectiveness of the features for predicting a video's popularity. We used the features of three perspectives (i.e., related videos, author, and keywords) to predict the popularity. In study 1, we adopted all features for prediction and comparison of the performance of the ensemble model with those of the five individual classifiers. To comprehend the influence of each individual category of features for popularity

Conclusions

The aim of this work was to develop a model for predicting the future popularity of a video at the earliest possible time. Accordingly, the prediction model we adopted uses only the data collected when the video is initially uploaded as the prediction variables. Consequently, the prediction variables applied in our approach differ significantly from those used in previous approaches. The benefit of our approach is that in the past, popularity prediction could only be made for some videos

Conflict of interest

None.

CRediT authorship contribution statement

Chen Yen-Liang: Conceptualization, Methodology, Writing - original draft, Writing - review & editing, Supervision. Chang Chia-Ling: Conceptualization, Formal analysis, Software, Data curation, Writing - original draft, Writing - review & editing.

References (70)

  • X. Han et al.

    Alike people, alike interests? Inferring interest similarity in online social networks

    Decision Support Systems

    (2015)
  • P. Kekolahti et al.

    Features as predictors of phone popularity: An analysis of trends and structural breaks

    Telematics and Informatics

    (2016)
  • J.G. Lee et al.

    An approach to model and predict the popularity of online contents with explanatory factors

  • C.-T. Li et al.

    Exploiting concept drift to predict popularity of social multimedia in microblogs

    Information Sciences

    (2016)
  • V.-D. Nguyen et al.

    Using community preference for overcoming sparsity and cold-start problems in collaborative filtering system offering soft ratings

    Electronic Commerce Research and Applications

    (2017)
  • R. Sicilia et al.

    Twitter Rumour Detection in the Health Domain

    Expert Systems with Applications

    (2018)
  • J. Wei et al.

    Collaborative filtering and deep learning based recommendation system for cold start items

    Expert Systems with Applications

    (2017)
  • R. Zhou et al.

    Boosting video popularity through keyword suggestion and recommendation systems

    Neurocomputing

    (2016)
  • T. Bai et al.

    Characterizing and predicting early reviewers for effective product marketing on e-commerce websites

    IEEE Transactions on Knowledge and Data Engineering

    (2018)
  • R. Bandari et al.

    The pulse of news in social media: Forecasting popularity

    ICWSM

    (2012)
  • S.A. Baumgarten

    The innovative communicator in the diffusion process

    Journal of Marketing Research

    (1975)
  • J.A. Bilmes

    A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models

    International Computer Science Institute

    (1998)
  • A. Brodersen et al.

    Youtube around the world: Geographic popularity of videos

  • M. Cha et al.

    I tube, you tube, everybody tubes: Analyzing the world's largest user generated content video system

  • G. Chatzopoulou et al.

    A first step towards understanding popularity in YouTube

  • X. Cheng et al.

    Statistics and social network of youtube videos

  • X. Cheng et al.

    Understanding the characteristics of internet short video sharing: A youtube-based measurement study

    IEEE Transactions on Multimedia

    (2013)
  • H. Choi et al.

    How consumers contribute to the development and continuity of a cultural market

    Consumption Markets & Culture

    (2016)
  • Class REPTree. WEKA. (2017). http://weka.sourceforge.net/doc.dev/weka/classifiers/trees/REPTree.html Accessed 10...
  • P.T. Dalvi et al.

    Anemia detection using ensemble learning techniques and statistical models

  • T. Dietterich

    Machine learning research: Four current directions

    Artificial Intelligence Magazine

    (1997)
  • S.T. Dumais

    Latent semantic indexing (LSI) and TREC-2

  • S. Dzeroski et al.

    Is combining classifiers better than selecting the best one

    Machine Learning

    (2004)
  • Digital Video Advertising to Grow at Annual Double-Digit Rates. eMarketer. (2016)....
  • F. Figueiredo et al.

    On the dynamics of social media popularity: A youtube case study

    ACM Transactions on Internet Technology (TOIT)

    (2014)