Early prediction of the future popularity of uploaded videos
Introduction
Social Networking Sites (SNWs) offer highly interactive platforms that allow users to build social relationships with like-minded individuals, organize social events, share interests, information, and ideas, communicate awareness, build real-life connections, and organize and manage social groups in virtual communities (Choi and Burnes, 2016, De Salve et al., 2016, Nambi and Prasad, 2016, Sicilia et al., 2018, Wang et al., 2017). As reported in a recent survey by the Search Engine Journal (SEJ, 2016), most Internet users actively participate in social networks such as YouTube. In addition, SNWs help to create collective knowledge and are now the primary source of information in the purchase decision process (Bilgihan et al., 2016, Hall et al., 2017, Nguyen et al., 2017, Wei et al., 2017). According to a survey by Socialnomics (2016), 93% of Internet shoppers’ buying decisions are influenced by SNWs. Moreover, many companies have developed online marketing strategies and upload their advertisements to video-sharing sites, such as YouTube, to attract consumer attention and encourage purchase intentions (Belanche et al., 2017, McDuff et al., 2015).
YouTube is one of the most influential SNWs (Ahmad et al., 2017, Pecay, 2017, Social impact of YouTube. Wikipedia 2018). It was established in 2005 and has become the world's largest online video website (Wikipedia, 2018). According to YouTube statistics, it has more than a billion users, which represents nearly one-third of all Internet users, with people watching hundreds of millions of hours of video on YouTube every day and generating billions of views (YouTube, 2018). According to eMarketer (2016), 72% of marketers plan to invest in digital video ads on YouTube in the subsequent 12 months. Additionally, approximately three-quarters of leading U.S. marketing brands intend to use digital video advertising on the YouTube website.
Recently, social network mining (Chorley et al., 2015, Ditrich and Sassenberg, 2017, Han et al., 2015, Li et al., 2016) has become a widely studied research topic. A key task in social network information mining is predicting the popularity of content on social networks (Bandari et al., 2012, Kekolahti et al., 2016). For example, numerous studies of YouTube have tried to discover the characteristics of video popularity (Ahmad et al., 2017, Figueiredo et al., 2016, Susarla et al., 2016). Understanding what makes videos popular is of the utmost importance for SNWs like YouTube. For instance, it could help them to provide more suitable services and reduce information overload. Website operators could place more popular videos in prominent positions on web pages. For advertisers, precise keywords, tags, and descriptions might help increase the exposure rate of videos on YouTube (Roy et al., 2013, Toderici et al., 2010).
Previous studies on the popularity of videos on YouTube have generally used one of two approaches. The first approach is to subject profile and behavior data to statistical analysis to understand the various distributions and behavioral properties. Cheng, Dale, and Liu (2013) analyzed the characteristics of YouTube videos. They reported that the view counts of most videos grow slowly, and that links to related videos generated by the uploaders’ choices form a small-world network. Cheng, Dale, and Liu (2008) studied the distribution and temporal patterns of view counts. They designed a power law curve to fit the growth trend of view counts. Welch, Cho, and Chang (2010) suggested using the text from the video content to obtain high-quality keywords suitable for matching with advertisements. The popularity of related keywords has increased significantly, indicating their benefit for advertising strategies. Rodrigues, Benevenuto, Almeida, Almeida, and Gonçalves (2010) analyzed duplicates in video-sharing systems. They focused on understanding the social perception of duplicate content represented by videos marked as duplicates by YouTube algorithms. Brodersen, Scellato, and Wattenhofer (2012) indicated that geographic relevance is a key factor influencing video popularity. Figueiredo, Almeida, Gonçalves, and Benevenuto (2014) maintained that top videos and copyright-protected videos achieve most of their views in early bursts, usually within a week or a day. They claimed that search engines and related videos are the most crucial mechanisms that drive users to view them.
In contrast, the second approach uses prediction models to predict the popularity of YouTube videos. Chatzopoulou, Sheng, and Faloutsos (2010) conducted an in-depth study of the fundamental properties of video popularity on YouTube. They proposed a linear regression model for predicting view counts and indicated that related video graphs could be characterized as a small-world network. Szabo and Huberman (2010) used a linear regression model to predict the long-term popularity of online content from early measurements of user access patterns. Wu, Timmers, De Vleeschauwer, and Van Leekwijck (2010) proposed a neural network model for predicting the near-future popularity of videos on the basis of popularity data from previous days. Gürsun, Crovella, and Matta (2011) proposed an autoregressive moving average model to describe the daily access patterns of rarely accessed and frequently accessed videos and to predict the number of hits that a video will have in the near future. Roy et al. (2013) proposed a social transfer learning model for identifying sudden bursts in the popularity of videos. Pinto, Almeida, and Gonçalves (2013) presented multivariate linear regression (ML) and multivariate radial basis function models for predicting the future popularity of web content on the basis of historical information provided by early popularity measures.
Previous studies using the model-based approach share a common weakness: future popularity is predicted from the popularity data observed in previous periods. This dependence makes it impossible for these models to generate predictions at the time of upload, because no historical popularity data exist at that point. Hence, our study aims to predict the future popularity of videos upon uploading. To this end, our model is built not on past popularity data but on variables extracted from the data available when the video is uploaded.
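The dependence on early popularity data can be illustrated with a minimal sketch of a log-linear regression, loosely in the spirit of the early-measurement models cited above. It is not the implementation of any of those studies: the function names, the view counts, and the choice of ordinary least squares on log-transformed counts are all assumptions made for illustration.

```python
import math


def fit_log_linear(early_views, later_views):
    """Least-squares fit of ln(later) = intercept + slope * ln(early)."""
    xs = [math.log(v) for v in early_views]
    ys = [math.log(v) for v in later_views]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_later_views(model, early_view_count):
    """Predict later views; fails when no early popularity has been observed."""
    if early_view_count is None or early_view_count <= 0:
        raise ValueError("no early popularity data: cannot predict at upload time")
    intercept, slope = model
    return math.exp(intercept + slope * math.log(early_view_count))


# Made-up training data: view counts shortly after upload and some time later.
early = [100, 400, 1600, 6400]
later = [1000, 4000, 16000, 64000]
model = fit_log_linear(early, later)

print(round(predict_later_views(model, 250)))
```

A call such as `predict_later_views(model, None)` raises `ValueError`, which mirrors the limitation described above: a model of this kind has no input to work with at the moment of upload.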
In our model, variables are extracted from three sources, namely videos related to the uploaded video, the author (uploader) of the video, and the important keywords appearing in the user-generated content for the uploaded video. For each source, or perspective, we collect four features from the Top K videos ranked according to three different ranking methods. All these 36 variables (3 perspectives × 4 features × 3 ranking methods) are known at the moment when the video is uploaded.
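The count of 36 variables follows from crossing the three perspectives with the four features and the three ranking methods. The sketch below enumerates that grid. The perspective names come from the text, but the feature and ranking-method names are placeholders, since this section does not name them, and the naming scheme is likewise an assumption.

```python
from itertools import product

# Perspectives are named in the text; the other two lists are placeholders.
PERSPECTIVES = ["related_videos", "author", "keywords"]
FEATURES = ["feature_1", "feature_2", "feature_3", "feature_4"]  # hypothetical names
RANKINGS = ["ranking_1", "ranking_2", "ranking_3"]  # hypothetical names


def variable_names():
    """Return one variable name per (perspective, feature, ranking) combination."""
    return [
        f"{perspective}__{feature}__{ranking}"
        for perspective, feature, ranking in product(PERSPECTIVES, FEATURES, RANKINGS)
    ]


names = variable_names()
print(len(names))  # 3 * 4 * 3 = 36
```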
This is the first attempt to predict the future popularity of videos at the moment of uploading. Such predictions bring several benefits. (1) They expand the basis of recommendation. Unlike previous models, where only old videos are recommended, new videos can now be recommended. (2) Because the recommendation base is larger, the videos recommended by the website will be more in line with the user's preferences. (3) Obtaining the popularity prediction earlier makes it possible to recommend an influential video earlier. (4) The popularity of a video is indicative of its importance, just as PageRank values measure the importance of web pages. Combining the importance of the video with other factors can help us determine the most appropriate position for a video in the ranking of search results. (5) Understanding the factors that influence the popularity of new videos can provide us with clues on how to design influential videos.
The rest of the paper is organized as follows. We first review the literature on video popularity prediction. We then introduce the variables and models used in this study. Subsequently, a series of experiments are described that involve verifying the effectiveness of the framework in predicting the popularity of YouTube videos. Finally, some conclusions and directions of future research are presented.
Section snippets
Related work
In recent years, numerous studies have thoroughly explored the popularity of online content (Bai et al., 2018, Bandari et al., 2012). Analysis of the popularity patterns of videos on YouTube has shown “related video” to be one of the key factors for predicting view counts (Chatzopoulou et al., 2010, Figueiredo et al., 2014, Figueiredo et al., 2011, Zhou et al., 2016). “Keywords” has been found to be an important feature for searching for videos in internal search engines (Welch et al., 2010,
Variables and models
The objective of this study is to develop models for predicting the future popularity of a video using predictor variables obtained at the moment when the video is uploaded. To this end, we first describe how we determine the popularity of a video. We then introduce the predictor variables used to build the models and explain how they are extracted from the raw data collected from the YouTube site. Finally, we describe the models in detail.
Evaluation
In this section, we conduct four empirical evaluations to investigate the effectiveness of the features for predicting a video's popularity. We used the features of three perspectives (i.e., related videos, author, and keywords) to predict the popularity. In study 1, we adopted all features for prediction and compared the performance of the ensemble model with that of the five individual classifiers. To comprehend the influence of each individual category of features for popularity
Conclusions
The aim of this work was to develop a model for predicting the future popularity of a video at the earliest possible time. Accordingly, the prediction model we adopted uses only the data collected when the video is initially uploaded as the prediction variables. Consequently, the prediction variables applied in our approach differ significantly from those used in previous approaches. The benefit of our approach is that in the past, popularity prediction could only be made for some videos
Conflict of interest
None.
CRediT authorship contribution statement
Chen Yen-Liang: Conceptualization, Methodology, Writing - original draft, Writing - review & editing, Supervision. Chang Chia-Ling: Conceptualization, Formal analysis, Software, Data curation, Writing - original draft, Writing - review & editing.
References (70)
- et al. (2017). HarVis: An integrated social media content analysis framework for YouTube platform. Information Systems.
- et al. (2017). A personalized recommender system for pervasive social networks. Pervasive and Mobile Computing.
- et al. (2017). Understanding interactive online advertising: congruence and product involvement in highly and lowly arousing, skippable video ads. Journal of Interactive Marketing.
- et al. (2016). Consumer perception of knowledge-sharing in travel-related online social networks. Tourism Management.
- et al. (2015). Personality and location-based social networks. Computers in Human Behavior.
- et al. (2016). The impact of user's availability on On-line Ego Networks: A Facebook analysis. Computer Communications.
- et al. (2017). Kicking out the trolls–Antecedents of social exclusion intentions in Facebook groups. Computers in Human Behavior.
- et al. (2016). Bagging ensemble models for bank profitability: An emprical research on Turkish development and investment banks. Applied Soft Computing.
- et al. (2016). TrendLearner: Early prediction of popularity trends of user generated content. Information Sciences.
- et al. (2017). A new Dirichlet process for mining dynamic patterns in functional data. Information Sciences.
- Alike people, alike interests? Inferring interest similarity in online social networks. Decision Support Systems.
- Features as predictors of phone popularity: An analysis of trends and structural breaks. Telematics and Informatics.
- An approach to model and predict the popularity of online contents with explanatory factors.
- Exploiting concept drift to predict popularity of social multimedia in microblogs. Information Sciences.
- Using community preference for overcoming sparsity and cold-start problems in collaborative filtering system offering soft ratings. Electronic Commerce Research and Applications.
- Twitter Rumour Detection in the Health Domain. Expert Systems with Applications.
- Collaborative filtering and deep learning based recommendation system for cold start items. Expert Systems with Applications.
- Boosting video popularity through keyword suggestion and recommendation systems. Neurocomputing.
- Characterizing and predicting early reviewers for effective product marketing on e-commerce websites. IEEE Transactions on Knowledge and Data Engineering.
- The pulse of news in social media: Forecasting popularity. ICWSM.
- The innovative communicator in the diffusion process. Journal of Marketing Research.
- A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. International Computer Science Institute.
- Youtube around the world: Geographic popularity of videos.
- I tube, you tube, everybody tubes: Analyzing the world's largest user generated content video system.
- A first step towards understanding popularity in YouTube.
- Statistics and social network of youtube videos.
- Understanding the characteristics of internet short video sharing: A youtube-based measurement study. IEEE Transactions on multimedia.
- How consumers contribute to the development and continuity of a cultural market. Consumption Markets & Culture.
- Anemia detection using ensemble learning techniques and statistical models.
- Machine learning research: Four current directions. Artificial Intelligence Magazine.
- Latent semantic indexing (LSI) and TREC-2.
- Is combining classifiers better than selecting the best one. Machine learning.
- On the dynamics of social media popularity: A youtube case study. ACM Transactions on Internet Technology (TOIT).
Cited by (18)
Identifying content unaware features influencing popularity of videos on YouTube: A study based on seven regions
2022, Expert Systems with Applications. Citation excerpt: For example, Rui et al. (2019) present a regression-based model to predict YouTube videos' view count. Similarly, the works of (Chen and Chang, 2019) and (Trzciński and Rokita, 2017) introduce a method to predict a video's future popularity without considering the historical popularity statistics. Sangwan and Bhatnagar (2020) focus on the popularity identification of education/training related contents on YouTube.
Prediction of information cascades via content and structure proximity preserved graph level embedding
2021, Information Sciences. Citation excerpt: Among these, network topological structure-oriented features of the cascades are shown powerful for the prediction task by multiple studies [15]. Besides the design of different features, other issues in classification are also considered, such as concept drift problem [27] and model ensemble [9]. Also, some works [26] try to forecast who will involved in the diffusion of online information on social networks.
Hybrid machine learning approach for popularity prediction of newly released contents of online video streaming services
2020, Technological Forecasting and Social Change. Citation excerpt: The researches require the user’s information, historical usage data, and metadata for contents. In reference to video contents, many recent researches utilize time-series based log data with content metadata, or external information such as text data written on social network applications (Chen and Chang, 2019; Fukushima et al., 2016; Mestyán et al., 2013; Trzciński and Rokita, 2017; Wu et al., 2016; Zhu et al., 2017). Initially, many researchers focused on the analysis of tree series based on meta data of contents and QOE data.
RL-OPRA: Reinforcement Learning for Online and Proactive Resource Allocation of crowdsourced live videos
2020, Future Generation Computer Systems. Citation excerpt: Furthermore, most of the efforts, including [12] and [13], have adopted a strategy of renting servers on cloud sites, where the video is popular. Meanwhile, existing efforts [14,15] predict this popularity after receiving the feedback of viewers (e.g., reacts, joining viewers, etc.) and not at the instant of the broadcasting. Also, because of the dynamics of viewership, the popularity could change over-time, which causes allocation inefficiency and requires alteration of video location.
Discovering popular and persistent tags from YouTube trending video big dataset
2024, Multimedia Tools and Applications