1 Introduction

In this study, we analyze the Japan Professional Soccer League (J League). The J League has 54 clubs, divided into three leagues, the top one of which is called the J1 League. Each league is played as a home-and-away round-robin tournament, and the club with the most points at the end wins the league. Currently, 18 clubs belong to the J1 League.

In the J League, attendance has been increasing year by year since 2011 (Fig. 1). Past data also show that attendance increased in the year following each World Cup. Since the Japanese national team is participating in the 2018 World Cup in Russia, the J League market can be expected to become even more active.

Fig. 1. Trends in number of attendants (upper-left)

Before the analysis, we compared the percentages of the types of play immediately preceding each goal for “Urawa Red Diamonds” and “Yokohama F-Marinos” (Table 1). The two teams showed several contrasting results, which suggests that each team in the J League has a different scoring tendency. The Japanese national team, on the other hand, shows a strong tendency to score from a “through-pass” (22%) or a “cross” (24%) (Fig. 2).

Table 1. Start point percentage of scoring play (upper-right)

Fig. 2. Scoring trend of Japan national team (left)

Against this background, the purpose of this study is to propose ways to develop players and team tactics by analyzing the tendencies of plays related to scoring in the J League and identifying the plays characteristic of Japanese soccer.

2 Data Definition and Cleaning

The data handled in this analysis are categorical data recording the entire play history of the last 45 games of the 2016 league season. Every record carries location information obtained by dividing the field along the X and Y axes (Fig. 3).

Fig. 3. The coordinates of the field

Fig. 4. Area name near the goal

Before analyzing the data, we defined a “Vital area” and a “Primary area” near the goal (Fig. 4). The “Vital area” is the area in front of the penalty area, from which plays easily lead to a score. The “Primary area” is a position even closer to the goal than the Vital area. In other words, both are areas where scoring chances are easily created. Table 2 shows the main variables used.

Table 2. Variables mainly used

The penalty area, Vital area, and Primary area are all included in the area within 30 m of the goal.

Consequently, whenever “Enter in Vital area” was “1”, “Enter the area 30 m from goal” was also “1”.

For records with the same “Attack start history No”, when more than one of these variables (e.g. “Enter in ~”) had been set to “1”, we changed the value for the area farther from the enemy goal from “1” to “0”.

“Through-pass”, “Back-pass”, and “Center” are kinds of pass, so whenever “Through-pass” was “1”, “Pass” was also “1”. For records with the same “Attack start history No”, when any of these variables (e.g. “~-pass”) had been set to “1”, we likewise changed the overlapping value from “1” to “0”.
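The rule for nested areas can be illustrated with a minimal sketch. It assumes that each play is a record with 0/1 flags and an attack number; the field names (`attack_no`, `enter_30m`, `enter_vital`, `enter_primary`) and the sample records are hypothetical and are not the variable names of the original data set.

```python
from collections import defaultdict


def keep_closest_area(plays, attack_key, ordered_flags):
    """Within each attack, keep 1 only for the area closest to the goal.

    `ordered_flags` lists the flag names from farthest to closest to the
    enemy goal. Returns new records; the input list is not modified.
    """
    # For each attack, find the index of the closest area that was entered.
    closest = defaultdict(lambda: -1)
    for play in plays:
        for idx, flag in enumerate(ordered_flags):
            if play[flag] == 1:
                key = play[attack_key]
                closest[key] = max(closest[key], idx)

    cleaned = []
    for play in plays:
        row = dict(play)
        for idx, flag in enumerate(ordered_flags):
            # A farther area is reset to 0 when a closer one was entered.
            if idx < closest[row[attack_key]]:
                row[flag] = 0
        cleaned.append(row)
    return cleaned


# Hypothetical sample: attack 1 reaches the vital area, attack 2 does not.
plays = [
    {"attack_no": 1, "enter_30m": 1, "enter_vital": 0, "enter_primary": 0},
    {"attack_no": 1, "enter_30m": 1, "enter_vital": 1, "enter_primary": 0},
    {"attack_no": 2, "enter_30m": 1, "enter_vital": 0, "enter_primary": 0},
]
cleaned = keep_closest_area(
    plays, "attack_no", ["enter_30m", "enter_vital", "enter_primary"]
)
for row in cleaned:
    print(row)
```

In this sketch, the 30 m flag of attack 1 is reset to 0 because the same attack also entered the vital area, while attack 2 keeps its 30 m flag.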

3 Cluster Analysis

The start positions of plays that led to a score were segmented by cluster analysis. First, the first play of each attack that led to a score was extracted from all the play data. Cluster analysis was then carried out on the location variables (the X and Y coordinates) of the extracted data. Ward’s method was used for clustering, with squared Euclidean distance as the distance measure (Figs. 5 and 6).
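As an illustration of this clustering step, the following is a minimal sketch of agglomerative clustering with Ward’s criterion on 2-D coordinates. At each step it merges the two clusters whose merger increases the within-cluster sum of squares the least, which is the squared Euclidean distance between centroids weighted by cluster sizes. The function name `ward_clusters` and the sample coordinates are assumptions made for the example; they are not the authors’ implementation or data, and a statistics package would normally be used instead.

```python
from itertools import combinations


def ward_clusters(points, n_clusters):
    """Group 2-D points into `n_clusters` clusters using Ward's criterion.

    Returns a list of clusters, each a sorted list of indices into `points`.
    """
    # Every point starts as its own cluster: (member indices, centroid).
    clusters = [([i], (float(x), float(y))) for i, (x, y) in enumerate(points)]

    def merge_cost(a, b):
        # Increase in the within-cluster sum of squares if a and b merge:
        # n_a * n_b / (n_a + n_b) * squared Euclidean centroid distance.
        (ma, (ax, ay)), (mb, (bx, by)) = a, b
        sq_dist = (ax - bx) ** 2 + (ay - by) ** 2
        return len(ma) * len(mb) / (len(ma) + len(mb)) * sq_dist

    while len(clusters) > n_clusters:
        # Pick the pair of clusters that is cheapest to merge.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        (ma, (ax, ay)), (mb, (bx, by)) = clusters[i], clusters[j]
        n = len(ma) + len(mb)
        centroid = (
            (len(ma) * ax + len(mb) * bx) / n,
            (len(ma) * ay + len(mb) * by) / n,
        )
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [(ma + mb, centroid)]
    return [sorted(members) for members, _ in clusters]


# Hypothetical (X, Y) start positions of scoring attacks.
starts = [(5, 30), (8, 35), (6, 28), (45, 10), (48, 12), (44, 60), (47, 62)]
result = ward_clusters(starts, 3)
print(sorted(result))  # [[0, 1, 2], [3, 4], [5, 6]]
```

Recording the merge cost at every step would give the heights of a dendrogram, from which the number of clusters can be chosen.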

Fig. 5. Dendrogram

Fig. 6. Plot of “Starting point of scoring play”

As a result of the cluster analysis, the starting points of plays that led to a score were divided into 6 clusters. In this study, we used the 3 clusters that showed good results; these were located mainly in own-field.

Based on this result, we analyzed the tendencies of plays from own-field that led to a score in the J League.

4 Factor Analysis and Covariance Structure Analysis

We used factor analysis to find the tendencies of scoring strategies, and then used covariance structure analysis to examine the relationships among them. Using the variable “Attack_start_history_No”, we restored the play data related to scoring for all starting points contained in the three clusters. Factor analysis was performed on each set of play data, and the extracted latent variables were defined as “tactics involved in scoring”. The maximum likelihood method was used for the factor analysis, and factors were extracted using the Guttman-Kaiser criterion, which adopts only factors whose eigenvalues are 1 or higher (Table 3).
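The Guttman-Kaiser criterion itself amounts to a simple count, as the following minimal sketch shows. The eigenvalues are hypothetical (they are not taken from the eigenvalue tables of this study) and are assumed to be those of the correlation matrix of six play variables, so they sum to 6.

```python
def kaiser_factor_count(eigenvalues, threshold=1.0):
    """Count the factors whose eigenvalue is at least `threshold`."""
    return sum(1 for value in eigenvalues if value >= threshold)


# Hypothetical eigenvalues of a 6-variable correlation matrix.
example_eigenvalues = [2.1, 1.4, 1.1, 0.7, 0.4, 0.3]
n_factors = kaiser_factor_count(example_eigenvalues)
print(n_factors)  # 3
```

With these values, the first three factors would be adopted. A scree plot shows the same eigenvalues in descending order.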

Table 3. Evaluation index of Covariance structure analysis

After that, we carried out covariance structure analysis and analyzed the relationships between the “tactics”. “Tactics” (latent variables) with low associations between them were regarded as independent tactics, and we analyzed which play data (explanatory variables) influenced each tactic.

4.1 Cluster1 (from Near Own Goal)

Using the Guttman-Kaiser criterion, the first three factors were adopted as latent factors and were defined as tactics based on their constituent elements. The tactics affecting scoring in plays starting from Cluster 1 were found to be “side attack”, “Breakthrough a wing by dribbling”, and “Final ball to the vital area”. Covariance structure analysis was carried out with reference to this result (Figs. 7 and 8).

Fig. 7. Scree plot of cluster1 (upper-right)

Fig. 8. Covariance structure model of cluster1

Elliptical objects are latent variables, which were defined as “tactics”. Rectangular objects are the endogenous variables that explain a latent variable; they are the variables recorded during data collection. The numerical values on the arrows from latent variables to endogenous variables indicate the strength of the relationship between each endogenous variable and the latent variable. In this covariance structure analysis, “AGFI”, “CFI”, and “RMSEA” all showed ideal values (Tables 4 and 5).
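For reference, two of these fit indices can be computed directly from chi-square statistics. The sketch below uses the commonly cited definitions of RMSEA and CFI. The paper does not state which formulas or software were used, so these definitions and all numerical inputs are assumptions made for illustration. AGFI is omitted because it depends on GFI, which requires the fitted and sample covariance matrices.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA from the model chi-square, degrees of freedom (> 0), sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """CFI of a model relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


# Hypothetical values, not results from this study.
example_rmsea = rmsea(chi2=30.0, df=24, n=101)
example_cfi = cfi(chi2_model=30.0, df_model=24, chi2_base=336.0, df_base=36)
print(round(example_rmsea, 3), round(example_cfi, 3))
```

A smaller RMSEA and a CFI closer to 1 indicate a better fit; the reference values used in this study are those listed in Table 3.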

Table 4. Eigenvalue of cluster1 (upper-left)
Table 5. Factor structure of cluster1 (left)

The results show that, in plays from this cluster, “Side attack using dribbling and pass” is strongly related to scoring. It can be judged that “center from the flank” and “breakthrough the wing using dribbling” are performed frequently.

4.2 Cluster2 (from Right Flank of Own-Field Near Halfway Line)

Using the Guttman-Kaiser criterion, the first three factors were adopted as latent factors. The tactics affecting scoring in attacks from Cluster 2 were found to be “side attack”, “Make chance using dribbling”, and “Final ball to the vital area”. Covariance structure analysis was carried out with reference to this result (Figs. 9 and 10).

Fig. 9. Scree plot of cluster2 (upper-right)

Fig. 10. Covariance structure model of cluster2

In this covariance structure analysis, “AGFI” and “CFI” showed ideal values, but “RMSEA” indicated that the fit of the model was slightly poor.

In this cluster, as in Cluster 1, “Side attack using dribbling and pass” was found to be strongly related to scoring.

Furthermore, “Dribbling” and “Through-pass” were frequently used in scoring plays from this cluster. Scoring plays from the right wing near the halfway line therefore appear to tend toward “attacks targeting spaces between or behind enemies” (Tables 6 and 7).

Table 6. Eigenvalue of cluster2 (upper-left)
Table 7. Factor structure of cluster2 (left)

4.3 Cluster3 (from Left Flank of Own-Field Near Halfway Line)

Using the Guttman-Kaiser criterion, the first two factors were adopted as latent factors. The tactics affecting scoring in attacks from Cluster 3 were found to be “side attack” and “Final ball to the vital area”. Covariance structure analysis was carried out with reference to this result.

In this covariance structure analysis, “CFI” showed an ideal value, but “AGFI” indicated only a marginally acceptable fit, and “RMSEA” indicated that the fit of the model was poor. Judging from these evaluation indices, the reliability of this model is not high (Figs. 11 and 12).

Fig. 11. Scree plot of cluster3 (upper-right)

Fig. 12. Covariance structure model of cluster3

In this cluster, as in the other clusters, “Side attack using pass” was found to be strongly related to scoring.

On the other hand, no relationship between scoring and “Side attack using dribbling” was found. Although the fit of the model is not good for this cluster, it at least appears that “attacks from the left wing near the halfway line hardly use dribbling” (Tables 8 and 9).

Table 8. Eigenvalue of cluster3 (upper-left)
Table 9. Factor structure of cluster3 (left)

5 Discussion/Summary

In the recent J League, attacks from own-field that aim at vacant spaces (e.g. a space between enemy defenses) were found to lead to a score easily. From this result, we can propose “developing flank players who are superior in physical strength and speed” as a direction for future player development. Furthermore, “tactics that use space or counter-attacks” can be proposed as a strategy for scoring from own-field.

In addition, although “pass” and “dribble” were found to affect scoring in Cluster 1, no influence of tactics using “dribble” was found in Cluster 3. Furthermore, the RMSEA value of Cluster 3 is considered to have become larger because “the number of classified samples or play types (variables) was smaller than in the other clusters”. From these interpretations, we suppose that the J League has the problem that “there are few left-footed Japanese players”.

Future analysis tasks are “Analysis of the tendency of scoring plays starting from the enemy-field” and “Analysis of the time from the play start position to the goal”. Furthermore, we want to tally “the number of left-footed players in the J League”, as noted in the summary above, and explore its influence on scoring.