1 Introduction

In recent years, with the diffusion of recorders and the spread of the Internet, both the functions of television and the purposes for which people watch it have been changing [1], and this is altering the circumstances of the TV industry. TV viewing is shifting from long to short viewing times, and positive attitudes toward television are weakening; as a result, overall viewing time is thought to be decreasing. Predicting customers' preferences and latent needs has therefore become an important issue. In this study, we predict what kinds of customers become preference-oriented, loyal viewers of what kinds of programs, in response to the growing need to predict customers' TV viewing behavior.

Treating TV viewing time as a measure of customer loyalty, we should pay attention to how long customers watch TV on weekdays. Kimura reported statistics on long-term change in Japanese TV viewing from the viewpoints of behavior and consciousness [1]. He gathered TV viewing data from 2,400 monitors in both 2010 and 2015 and found that many monitors had reduced their TV viewing. With viewing time classified as short (30 min to 2 h), normal (3 h), and long (4 h), the five-year trend shows that short-time viewing increased (35% to 38%), while ordinary viewing (21% to 19%) and long-time viewing (40% to 37%) decreased (see Fig. 1).

Fig. 1. Japanese viewing time change in 2010 and 2015 (from [1])

2 Dataset

In this study, we use the following data, obtained from VR-CUBIC of Video Research Ltd. and provided for the Data Analysis Competition 2018 by the Joint Association Research Group of Management Sciences. The data were collected in the Kanto district and consist mainly of detailed media contact records. A summary of the data is given below:

  • Source of data: Television viewing data (data set on television from April 2017 to April 2018)

  • Contents of data: TV contact log, TV play log, web site browsing log, program information, sample information.

In this study, we randomly selected 1,500 customers from the data covering September 2017 to December 2017.

3 Method

In this section we explain our analysis procedure.

3.1 Data Summary

First, 1,500 respondent monitors were randomly extracted from the customer data for the period from 03/Sep./2017 to 01/Dec./2017. The data items are listed in Table 1:

Table 1. Data item

Based on the sample information and the customers' personal information, we calculate summary statistics of the 1,500 monitors by generational/housewife code, gender, marital status, and age (see Figs. 2, 3, 4 and 5).

Fig. 2. Monitor attributes

Fig. 3. Age ratio

Fig. 4. Distribution by sex

Fig. 5. Marital status (unmarried or married)

As these figures show, household heads and monitors in their 30s to 50s are overrepresented in this sample relative to the population. There is hardly any difference between the numbers of men and women, although men are slightly more numerous. Married monitors are about twice as numerous as unmarried ones.
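The sampling and tabulation step described in this subsection can be sketched as follows. This is only an illustration under stated assumptions: the records are synthetic, and the field names (`id`, `sex`, `age_band`, `married`) are hypothetical and do not reflect the actual VR-CUBIC schema.

```python
import random
from collections import Counter

# Synthetic stand-in for the monitor master data (hypothetical fields).
monitors = [
    {
        "id": i,
        "sex": "M" if i % 2 else "F",
        "age_band": f"{10 * (2 + i % 5)}s",
        "married": i % 3 != 0,
    }
    for i in range(5000)
]


def sample_monitors(records, n, seed=0):
    """Draw n records without replacement, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(records, n)


def attribute_shares(records, key):
    """Share of each value of `key` among the given records."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


sample = sample_monitors(monitors, 1500)
sex_shares = attribute_shares(sample, "sex")
age_shares = attribute_shares(sample, "age_band")
married_shares = attribute_shares(sample, "married")
```

Fixing the seed keeps the extracted sample reproducible, which matters when the same 1,500 monitors are reused in later steps.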

3.2 Analysis of TV Viewing

First, we analyze the characteristics of consumers' TV viewing behavior. We divide all target monitors into groups by TV viewing method, of which there are three: time shift, real time, and web site. The relationship among these three methods forms a hierarchical structure, shown in Fig. 6.

Fig. 6. Customer split by TV viewing method

Next, we calculate how much time each monitor spends on each method and the proportion of total viewing time accounted for by each of the three methods, using Eq. (1).

$$ \text{Real time viewing time ratio} = \frac{\text{Real time}}{\text{Real time} + \text{Time shift} + \text{Web site}} $$
(1)

The time shift and web site browsing ratios are obtained in the same manner as Eq. (1). Using these three ratios, we draw a triangle graph, shown in Fig. 7.

Fig. 7. Viewing time ratio of the three viewing methods (\( X \) is real time TV, \( Y \) is time shift TV and \( Z \) is web site)
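As a rough illustration of Eq. (1) and the triangle graph, the sketch below computes the three ratios for one monitor and maps them to 2-D plotting coordinates. The function names and the placement of the triangle corners are assumptions made for this example, not details taken from the study.

```python
import math


def viewing_ratios(real_time, time_shift, web_site):
    """Ratios of Eq. (1) for the three viewing methods.

    Inputs are viewing times in any common unit (e.g. minutes).
    Returns None when the monitor has no viewing time at all.
    """
    total = real_time + time_shift + web_site
    if total == 0:
        return None
    return (real_time / total, time_shift / total, web_site / total)


def ternary_xy(x, y, z):
    """Map ratios (x, y, z) that sum to 1 onto an equilateral triangle.

    Assumed corner layout: X at (0, 0), Y at (1, 0), Z at (0.5, sqrt(3)/2).
    """
    return (y + 0.5 * z, math.sqrt(3) / 2 * z)


# Hypothetical monitor: 300 min real time, 60 min time shift, 40 min web site.
ratios = viewing_ratios(300, 60, 40)
point = ternary_xy(*ratios)
```

A monitor who watches only in real time lands on the X corner, so a dense cloud of points near that corner corresponds to a sample dominated by real time viewing.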

As this figure shows, most monitors watch TV mainly in real time. Time shift viewing is presumably motivated by wanting to watch at a different time or to watch repeatedly, yet many customers still watch television in real time.

The time shift ratio is not high, and the web site ratio is at a similar level. We therefore think that time shift viewing is limited to particular situations of the consumer or particular content of TV programs.

3.3 Cluster Analysis for Segment

In this study, we divide customers into several segments whose members can be considered to have homogeneous features. We use a latent class model [2], also known as a mixture model, estimated with the EM (expectation-maximization) algorithm. Several common latent classes are assumed, and each case belongs to each class with some probability. Differences in these membership probabilities express the heterogeneity among monitors. Because the number of parameters, including the probabilities, is large, it is very difficult to obtain the optimal parameters all at once; the EM algorithm, which repeats simple algebraic update steps, is therefore used.

We use time, sex, age, etc. as explanatory variables. We decide the number of clusters based on the Bayesian information criterion (BIC) [2], and a 2-cluster model was chosen. We can then interpret the segments and obtain representative characteristics of the customers [3]. A summary of the results is shown in Table 2.
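To make the estimation procedure concrete, the following is a minimal sketch of EM for a latent class model, with the number of classes chosen by BIC. It assumes that every indicator is binary and that the data are synthetic; the study's actual variables (time, sex, age, etc.) and software are not specified at this level of detail, so the function names and the data here are hypothetical.

```python
import math
import random


def e_step(data, weights, probs):
    """Return per-row class responsibilities and the total log-likelihood."""
    resp, loglik = [], 0.0
    for row in data:
        joint = []
        for w, p_k in zip(weights, probs):
            lik = w
            for x, p in zip(row, p_k):
                lik *= p if x else 1.0 - p
            joint.append(lik)
        total = sum(joint)
        loglik += math.log(total)
        resp.append([v / total for v in joint])
    return resp, loglik


def fit_latent_class(data, n_classes, n_iter=100, seed=0):
    """EM for a latent class model with binary indicators (an assumption)."""
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    weights = [1.0 / n_classes] * n_classes
    probs = [[rng.uniform(0.25, 0.75) for _ in range(m)] for _ in range(n_classes)]
    for _ in range(n_iter):
        resp, _loglik = e_step(data, weights, probs)
        # M-step: re-estimate class sizes and item probabilities.
        for k in range(n_classes):
            n_k = sum(r[k] for r in resp)
            if n_k == 0:
                continue  # leave an empty class unchanged
            weights[k] = n_k / n
            for j in range(m):
                num = sum(r[k] * row[j] for r, row in zip(resp, data))
                # Clamp so that no probability reaches exactly 0 or 1.
                probs[k][j] = min(max(num / n_k, 1e-6), 1.0 - 1e-6)
    resp, loglik = e_step(data, weights, probs)
    return weights, probs, loglik, resp


def bic(loglik, n_classes, n_items, n_rows):
    """BIC = -2 log L + (number of free parameters) * ln(n)."""
    n_params = (n_classes - 1) + n_classes * n_items
    return -2.0 * loglik + n_params * math.log(n_rows)


# Synthetic data: two groups of monitors with different indicator rates.
rng = random.Random(1)
draw = lambda p: [1 if rng.random() < p else 0 for _ in range(5)]
data = [draw(0.9) for _ in range(200)] + [draw(0.1) for _ in range(200)]

scores = {}
for k in (1, 2, 3):
    _w, _p, ll, _r = fit_latent_class(data, k)
    scores[k] = bic(ll, k, 5, len(data))
best_k = min(scores, key=scores.get)
```

Each row of the returned responsibilities gives the membership probabilities of one monitor, which is what expresses heterogeneity among monitors in the model described above.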

Table 2. Cluster Summary

As shown in Table 2, comparing Class 1 with Class 2, monitors in their 20s and 50s more often show short-time viewing in real time, while for time shift viewing there are more medium-time monitors. Consumers who belong to Class 2 tend to watch TV for a long time, and most of them are older.

Box plots of the real time viewing time of Class 1 and Class 2 are shown in Figs. 8 and 9. The statistics are summarized in Table 3.

Fig. 8. Real time viewing in Class 1

Fig. 9. Real time viewing in Class 2

Table 3. Real time viewing summary (min.)

These box plots show real time viewing by customer class, and it can be seen that viewing time tends to be longer in Class 2. The box plots of time shift viewing are shown in Figs. 10 and 11. The statistics are summarized in Table 4.

Fig. 10. Time shift viewing in Class 1

Fig. 11. Time shift viewing in Class 2

Table 4. Time shift viewing summary (min.)

Here too, Class 2 tends to have longer viewing time.
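The statistics behind box plots such as those in this subsection can be computed as in the sketch below. The viewing times are made-up values for illustration, and the choice of the "inclusive" quartile method is an assumption, since the study does not state which convention was used.

```python
import statistics


def box_plot_summary(minutes):
    """Five-number summary (in minutes) underlying a box plot."""
    q1, median, q3 = statistics.quantiles(minutes, n=4, method="inclusive")
    return {
        "min": min(minutes),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(minutes),
    }


# Hypothetical daily real time viewing minutes for two classes.
viewing_by_class = {
    "Class 1": [0, 30, 60, 90, 120],
    "Class 2": [60, 120, 180, 240, 300],
}
summaries = {name: box_plot_summary(v) for name, v in viewing_by_class.items()}
```

Comparing the medians and quartiles of the two summaries is the numerical counterpart of reading the two box plots side by side.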

3.4 Prediction Model

In this section, we construct a behavior prediction model for each customer segment and clarify the viewing trends of customers. Taking a given day as t, we summarize the viewing data for the previous seven days. By entering the real time viewing, time shift viewing, and overall TV viewing over that window into a logistic regression analysis, we can estimate the parameters that affect viewing on the day. The objective variables, which indicate whether the monitor watches TV in real time or by time shift on the next day, are shown in Table 5. We perform four analyses, one for each combination of objective variable and segment.

Table 5. Objective variables

The explanatory variables are shown in Table 6.

Table 6. Explanatory variables

The summary of the results is shown in Table 7.

Table 7. The summary of the results

As shown in Table 7, Class 1 tends toward short-time viewing, and the longer the viewing time over the previous seven days, the higher the probability of viewing on that day. Class 2 tends toward medium- and long-time viewing, and likewise the longer the viewing time over the previous seven days, the higher the probability of viewing on that day. If customer personal attribute data and TV program data are added as explanatory variables, a more accurate model could be obtained.
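The seven-day setup described in this section can be sketched as follows. This is a simplified illustration, not the model actually fitted in the study: the features (mean daily hours of real time and time shift viewing over the previous seven days), the target (any real time viewing on day t), the plain gradient-descent fit, and the sample data are all assumptions made for the example.

```python
import math


def build_dataset(real_time, time_shift):
    """Build one row per day t >= 7 from daily viewing minutes.

    Features: mean daily hours over days t-7 .. t-1 for each viewing method.
    Label: 1 if there is any real time viewing on day t, else 0.
    """
    X, y = [], []
    for t in range(7, len(real_time)):
        X.append([
            sum(real_time[t - 7:t]) / 7 / 60.0,
            sum(time_shift[t - 7:t]) / 7 / 60.0,
        ])
        y.append(1 if real_time[t] > 0 else 0)
    return X, y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by batch gradient descent.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)))
            err = p - label
            grad[0] += err
            for j, xi in enumerate(row):
                grad[j + 1] += err * xi
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def predict_proba(w, row):
    """Predicted probability of viewing for one feature row."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)))


# Hypothetical daily minutes for one monitor over two weeks.
real_time = [120, 0, 90, 150, 0, 60, 200, 30, 0, 180, 240, 0, 45, 100]
time_shift = [0, 30, 0, 0, 60, 0, 0, 0, 45, 0, 0, 20, 0, 0]
X, y = build_dataset(real_time, time_shift)
weights = fit_logistic(X, y)
```

In the setting of this section, the same fit would be repeated for each combination of objective variable (real time or time shift) and customer class, giving four models; a positive coefficient on a seven-day feature corresponds to the finding that longer recent viewing raises the probability of viewing.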

4 Discussion

We discuss the classification results of the previous section from several aspects. By evaluating the features, we were able to learn which programs each person tends to watch. By forecasting which programs are likely to be watched, it is possible to predict the viewing tendencies of customers and to categorize their viewing characteristics. This makes it possible to grasp customers' future viewing trends.

5 Conclusion

In this study, using television viewing data, we identified several customer clusters and gave an interpretation to each cluster. The results confirmed the customers' lifestyles with respect to television. Then, from viewing data for the previous seven days, we estimated the viewing tendency of customers in each class using simple explanatory variables and clarified those tendencies.

Our future work is to find better variables for clustering, to add data such as program information to the model of customers' viewing tendencies, and to grasp which programs customers tend to watch.