
1 Introduction

In recent years, the EC (electronic commerce) market has grown rapidly with the spread of the internet [1]. In 2017, the B to C EC market in Japan reached 16,555.4 billion yen (Table 1). Compared with the other sectors, the service sector shows a higher growth rate. Among the services provided by EC sites, user-contributed review services have been introduced on many EC platforms. In such a service, users post reviews based on their own experience after purchasing products. This is commonly called “word of mouth” (WOM). In the internet era, many electronic WOMs (e-WOMs) are posted on EC sites, information sites and SNS. Consumers who are considering a purchase use these reviews to make decisions [2]. By reading reviews, they can obtain useful information without using the products or visiting the shops themselves, so reviews strongly influence consumers’ decision making. In addition, because user reviews are based on users’ experiences and opinions, they are considered to reveal the features of the items being reviewed.

Table 1. The market size of B to C - EC in Japan

In this study, we focus on a golf portal site in Japan and clarify the features of golf courses from user reviews. By using user reviews, we attempt to clarify the characteristics of golf courses from the consumer’s point of view. Moreover, we examine how the evaluation of golf courses differs with the player’s golf skill, because the contents of reviews differ depending on the user’s golf skill.

2 Data Summary

In this study, we used data provided by a Japanese golf portal site. Specifically, we used customer data, golf course attribute data and review data for golf courses.

These data are summarized below.

  • Customer data: Customer data contains information such as customers’ sex, age, golf score and income.

  • Golf Course Attribute Data: Golf course attribute data contains information such as the postal code and location of the golf course, the price range, and whether a professional tour is held there.

  • Review Data: Review data is information on reviews about the golf courses. It contains the content of each review (text), the golf score and golfer type of the customer who posted the review, and so on. The data period is three years, from August 2015 to August 2018.

3 Materials and Method

3.1 The Purposes of Analysis

We identify the characteristics of golf courses from customer data and review data and clarify how the evaluation of golf courses differs with the customer’s golf skill. Figure 1 shows the outline of our analysis. First, we create a dataset from the provided data. Second, we classify the golf courses by hierarchical cluster analysis. Third, we apply natural language processing to identify the characteristics of the reviews posted on the golf courses belonging to each cluster, also taking the reviewer’s skill level into account. Finally, we clarify the characteristics of the golf courses from the reviews and propose measures.

Fig. 1. Outline of our analysis

3.2 Dataset

First, we extracted the customers who have a customer ID from the customer data. We then selected the customers who wrote reviews of golf courses more than 10 times during the data period. As a result, we obtained a dataset of 3,563 customers and the 1,918 golf courses used by these customers.
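The selection steps can be sketched in Python as follows. This is a minimal illustration, not the authors’ pipeline: the record layout (dicts with hypothetical keys `customer_id` and `course_id`) is assumed, and the threshold follows the text’s “more than 10 times” literally (strictly greater than 10).

```python
from collections import Counter


def select_dataset(reviews, min_reviews=10):
    """Keep customers with an ID who posted more than `min_reviews` reviews.

    `reviews` is a list of dicts with hypothetical keys "customer_id"
    (None when missing) and "course_id".
    Returns (selected customer IDs, IDs of the courses they reviewed).
    """
    # Step 1: drop reviews whose customer has no ID.
    with_id = [r for r in reviews if r["customer_id"] is not None]
    # Step 2: count reviews per customer over the data period.
    counts = Counter(r["customer_id"] for r in with_id)
    customers = {c for c, n in counts.items() if n > min_reviews}
    # Step 3: collect the golf courses reviewed by the selected customers.
    courses = {r["course_id"] for r in with_id if r["customer_id"] in customers}
    return customers, courses


# Toy data: A wrote 11 reviews, B wrote exactly 10, and 20 reviews lack an ID.
reviews = [{"customer_id": "A", "course_id": i % 3} for i in range(11)]
reviews += [{"customer_id": "B", "course_id": 5} for _ in range(10)]
reviews += [{"customer_id": None, "course_id": 7} for _ in range(20)]
customers, courses = select_dataset(reviews)
print(sorted(customers), sorted(courses))  # ['A'] [0, 1, 2]
```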

3.3 Classification of Golf Courses Using Hierarchical Cluster Analysis

Next, we performed hierarchical cluster analysis using the golf course attribute data in order to group golf courses with similar attributes. Grouping golf courses in this way should make it possible to extract features common to multiple golf courses.

Here, we used six variables to perform the hierarchical cluster analysis: whether a caddy is provided, the minimum course fee, the maximum course fee, the number of practice facilities, whether a large-scale golf tour is hosted, and the course land price. We used the Manhattan distance as the distance between data points and the Ward method as the distance between clusters. The result is shown in Fig. 2.
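A minimal pure-Python sketch of this clustering step follows. It assumes the six variables are already encoded as numbers (0/1 for the caddy and tour flags) and z-score standardizes them, which the text does not state; between clusters it applies Ward’s Lance-Williams update to the Manhattan distances. The function names `standardize` and `ward_manhattan` and the toy course values are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so fees and 0/1 flags are on comparable scales (assumption)."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def ward_manhattan(points, n_clusters):
    """Agglomerative clustering with Manhattan distance between points and
    Ward's Lance-Williams update between clusters. Returns sorted index lists."""
    members = {i: [i] for i in range(len(points))}
    dist = {(i, j): manhattan(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}

    def d(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    next_id = len(points)
    while len(members) > n_clusters:
        # Merge the closest pair of current clusters.
        i, j = min(combinations(members, 2), key=lambda p: d(*p))
        ni, nj, dij = len(members[i]), len(members[j]), d(i, j)
        merged = members.pop(i) + members.pop(j)
        # Ward update: distance from every remaining cluster m to the merged one.
        for m, mem in members.items():
            nm = len(mem)
            dist[(m, next_id)] = ((ni + nm) * d(m, i) + (nj + nm) * d(m, j)
                                  - nm * dij) / (ni + nj + nm)
        members[next_id] = merged
        next_id += 1
    return sorted(sorted(c) for c in members.values())


# Toy courses: caddy, min fee, max fee, practice facilities, tour, land price.
courses = [
    [0, 5000, 9000, 1, 0, 10],
    [0, 5200, 9500, 1, 0, 12],
    [1, 15000, 30000, 2, 1, 50],
    [1, 16000, 32000, 2, 1, 55],
]
print(ward_manhattan(standardize(courses), 2))  # [[0, 1], [2, 3]]
```

The naive loop is roughly cubic in the number of courses, so for the 1,918 courses here a library routine would be more practical. SciPy’s `scipy.cluster.hierarchy.linkage` accepts a condensed distance matrix such as `scipy.spatial.distance.pdist(X, "cityblock")`, although its documentation notes that Ward linkage is only well defined for Euclidean distances.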

Fig. 2. Result of hierarchical cluster analysis using golf course attribute data

From this result, we divided all golf courses into four clusters.

Summary statistics for each cluster are shown in Table 2.

Table 2. Summary statistics for each cluster

Based on Table 2, we named cluster 1 “General public course,” cluster 2 “Large high price course,” cluster 3 “Practice course” and cluster 4 “Professional course.”

3.4 Analysis of Customer Review by Natural Language Processing

Next, we performed natural language processing to identify the characteristics of the reviews posted on the golf courses belonging to each cluster. Natural language processing is used to analyze text data, and many studies have applied it to reviews on EC sites [3].

First, we combined the review data on the golf courses belonging to each cluster into one document and performed morphological analysis on each document. We used KH Coder, free software for quantitative content analysis and text mining [4].
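Combining the reviews into one document per cluster can be sketched as below. The input format, a list of `(course_id, text)` pairs plus a course-to-cluster mapping, is an assumption for illustration, and the English snippets stand in for the Japanese reviews; the morphological analysis itself was done in KH Coder and is not reproduced here.

```python
from collections import defaultdict


def cluster_documents(reviews, course_cluster):
    """Concatenate all review texts of the courses in each cluster into one document.

    reviews: list of (course_id, text) pairs (hypothetical format).
    course_cluster: mapping course_id -> cluster number.
    Reviews of courses without a cluster label are skipped.
    """
    texts = defaultdict(list)
    for course_id, text in reviews:
        if course_id in course_cluster:
            texts[course_cluster[course_id]].append(text)
    return {c: "\n".join(t) for c, t in sorted(texts.items())}


docs = cluster_documents(
    [(101, "Greens were fast."), (102, "Good lunch."), (101, "Tricky course.")],
    {101: 1, 102: 4},
)
print(docs[1])
```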

In this study, we extracted nouns, verbs, adjectives, proper nouns, place names, organization names and other proper-noun parts of speech. Among these, nouns, verbs and adjectives appeared frequently. The frequencies of these parts of speech (nouns, verbs and adjectives) in each cluster are summarized in Table 3.
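The counting behind Table 3 can be illustrated with the sketch below. It assumes the morphological analyzer’s output is available as `(word, pos)` pairs per cluster; the tag strings `"noun"`, `"verb"` and `"adjective"` are hypothetical placeholders, since KH Coder uses its own Japanese part-of-speech labels.

```python
from collections import Counter

# Hypothetical tag names for the three parts of speech summarized in Table 3.
TARGET_POS = ("noun", "verb", "adjective")


def pos_frequencies(tagged_docs):
    """tagged_docs: cluster -> list of (word, pos) pairs.
    Returns cluster -> Counter of the target parts of speech."""
    return {c: Counter(pos for _, pos in tokens if pos in TARGET_POS)
            for c, tokens in tagged_docs.items()}


def content_words(tokens):
    """Keep only words whose part of speech is in TARGET_POS (input for TF-IDF)."""
    return [w for w, pos in tokens if pos in TARGET_POS]


tagged = {1: [("green", "noun"), ("fast", "adjective"), ("play", "verb"),
              ("the", "determiner"), ("score", "noun")]}
print(pos_frequencies(tagged)[1])
print(content_words(tagged[1]))
```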

Table 3. Summary statistics of part of speech

Next, using the results of the morphological analysis for each cluster, we extracted feature words with the TF-IDF method, computed by Eqs. (1) to (3). The TF-IDF method scores the importance of a word by combining how often it appears in a document with how few other documents it appears in. Weighting words from these two viewpoints improves the accuracy of feature-word extraction [5].

$$ \mathrm{TF\text{-}IDF}_{i,j} = tf_{i,j} \times idf_{i} \qquad (1) $$

$$ tf_{i,j} = \frac{n_{i,j}}{\sum_{S} n_{S,j}} \qquad (2) $$

$$ idf_{i} = \log \frac{|D|}{|\{ d : t_{i} \in d \}|} \qquad (3) $$

Here, \( n_{i,j} \) is the number of occurrences of word \( i \) in document \( j \), \( \sum_{S} n_{S,j} \) is the total number of occurrences of all words in document \( j \), \( |D| \) is the total number of documents, and \( |\{ d : t_{i} \in d \}| \) is the number of documents containing word \( i \) (denoted \( t_{i} \)).
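Equations (1) to (3) can be computed directly, as in the sketch below, where each cluster is one document. The natural logarithm is an assumption (the text does not give the base; changing it rescales all scores by a constant and does not change the ranking), and the function names and toy tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute Eqs. (1)-(3) for every word in every document.

    docs: mapping document id -> list of tokens (e.g. content words of one cluster).
    Returns: mapping document id -> {word: TF-IDF score}.
    """
    counts = {d: Counter(tokens) for d, tokens in docs.items()}
    n_docs = len(counts)  # |D|
    df = Counter()  # number of documents containing each word
    for c in counts.values():
        df.update(c.keys())
    scores = {}
    for d, c in counts.items():
        total = sum(c.values())  # sum over all words in document d
        scores[d] = {w: (n / total) * math.log(n_docs / df[w]) for w, n in c.items()}
    return scores


def top_words(word_scores, k):
    """Return the k highest-scoring words (ties broken alphabetically)."""
    return sorted(word_scores, key=lambda w: (-word_scores[w], w))[:k]


docs = {
    "cluster1": ["score", "play", "green", "green"],
    "cluster2": ["caddy", "green", "caddy", "price"],
}
scores = tf_idf(docs)
print(top_words(scores["cluster2"], 2))  # ['caddy', 'price']
```

With only four documents, a word that appears in every cluster gets \( idf_{i} = \log(4/4) = 0 \), so TF-IDF highlights words that are frequent in one cluster but absent from the others.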

Table 4 shows the words with the highest TF-IDF values in each document (cluster).

Table 4. The highest TF-IDF value in each document (cluster).

Using the feature words with high TF-IDF values, we characterize the user reviews of each cluster.

Cluster 1 is characterized by the words “Play,” “Revenge,” “Score” and “Tricky,” which rank high in Table 4. These feature words relate to golf skill and playing conditions, and it can be inferred that users evaluate these points.

Cluster 2 is characterized by words such as “Caddy” and “High,” which rank high in Table 4. These feature words relate to golf services, and it can be inferred that users evaluate this point. Feature words such as “Attacking,” “Raging” and “Happy” also rank high in Table 4. These words express emotion, and it can be inferred that the difficulty of the golf course is a subject of evaluation.

Cluster 3 is characterized by words such as “Resort,” “Old” and “Cost performance,” which rank high in Table 4. This suggests that the course location and price are subjects of evaluation.

Cluster 4 is characterized by the words “Lunch,” “Atmosphere” and “Girl,” which rank high in Table 4. It can be inferred that users evaluate the golf facilities and the situation at the course.

3.5 Analysis of Customer Review by the Difference of Golfer Skill Using Natural Language Processing

Next, we focus on differences in review contents due to golf skill. First, we classified the customers into three golf skill ranks and performed natural language processing for each skill rank in each cluster. We used the golf score in the customer data to classify golf skill: specifically, 92 or less (expert players), 93 to 100 (intermediate players) and 101 to 131 (beginners). As above, we used the TF-IDF method to extract feature words. The feature words of the three ranks in each cluster are shown in Tables 5, 6, 7 and 8.
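The skill ranks can be assigned with a small helper such as the one below. It is a sketch that reads the expert range as 92 or less, so that no integer score between the ranges is left unassigned; scores above 131 are left unclassified. How the per-rank documents are grouped into corpora for the IDF term is not specified in the text; here each `(cluster, rank)` pair simply becomes one document, and the names are hypothetical.

```python
from collections import defaultdict


def skill_rank(score):
    """Map a golf score to a skill rank (expert range read as 92 or less)."""
    if score <= 92:
        return "expert"
    if score <= 100:
        return "intermediate"
    if score <= 131:
        return "beginner"
    return None  # outside the ranges used in the study


def skill_documents(reviews):
    """reviews: (cluster, score, text) triples -> {(cluster, rank): concatenated text}."""
    texts = defaultdict(list)
    for cluster, score, text in reviews:
        rank = skill_rank(score)
        if rank is not None:
            texts[(cluster, rank)].append(text)
    return {key: "\n".join(t) for key, t in texts.items()}


print(skill_rank(92), skill_rank(93), skill_rank(101))  # expert intermediate beginner
```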

Table 5. The highest TF-IDF values in cluster 1 by golfer skill rank (top 18).
Table 6. The highest TF-IDF values in cluster 2 by golfer skill rank (top 18).
Table 7. The highest TF-IDF values in cluster 3 by golfer skill rank (top 18).
Table 8. The highest TF-IDF values in cluster 4 by golfer skill rank (top 18).

Since Cluster 1 consists of general public courses, the feature words at all skill levels include words related to the quality of golf facilities and services. Words related to play manners and the quality of the greens also appear at each skill level. Among the feature words of experts in cluster 1, “Teahouse,” “Speed,” “Small” and “Complaints” rank high (Table 5). From these words, it can be inferred that experts emphasize green speed, green size and the manners of other players. On the other hand, “Easy to play,” “Best score,” “Golfer,” “Drink” and “Beverage” rank high among the feature words of intermediate players in cluster 1 (Table 5). From these words, it can be inferred that intermediate players emphasize how easy it is to achieve a best score, how easy it is to get around the course, good manners among golfers, and services such as drink offerings. Among the feature words of beginners, “Old,” “Player,” “Crowded,” “Women,” “Be attentive,” “Clubhouse” and “Eat” rank high (Table 5). It can be inferred that beginners emphasize the state of the facilities: whether the course and facilities are adequately maintained and whether the meals are of adequate quality.

Since Cluster 2 consists of high price courses, customers at every skill level pay a high price, and prestige is an important factor for them. Among the feature words of experts in cluster 2, “Professional,” “Tournament,” “Rich,” “Prestigious” and “Speed” rank high (Table 6). From these words, it can be inferred that experts emphasize prestigious golf courses used by professionals, course strategy and green speed. Among the feature words of intermediate players, “Prestige,” “Environment,” “Moderate,” “Best” and “Player” are distinctive. From these words, it can be inferred that intermediate players emphasize prestigious courses, well-equipped practice facilities and whether the course can be played at an affordable price. Among the feature words of beginners, “Price,” “Crowded” and “Maintenance” are distinctive. From these words, it can be inferred that beginners emphasize playing smoothly without crowding and good maintenance at a low price.

Cluster 3 consists of practice courses. In the reviews of Cluster 3, the word “Maintenance” appears at every skill level, so customers of Cluster 3 place high importance on course maintenance. In addition, intermediate and expert players look for strategic and interesting courses. Among the feature words of experts in Cluster 3, “Beginners,” “Maintenance,” “Manner,” “Strategy,” “Interesting,” “Fee” and “Fast” rank high (Table 7). From these words, it can be inferred that experts emphasize the strategic elements and fun of the course; they also focus on maintenance, green speed and play manners. Among the feature words of intermediate players, “Maintenance,” “Beginner,” “Interesting,” “Fast” and “Fee” rank high (Table 7). From these words, it can be inferred that intermediate players emphasize green speed, the state of maintenance, whether beginners can also use the course, and whether the course is interesting. Among the feature words of beginners, “Courses,” “Fairways,” “Maintenance,” “Price” and “Difficulty” rank high (Table 7). From these words, it can be inferred that beginners emphasize the condition of the course, its difficulty and its price.

Cluster 4 consists of professional courses. For intermediate and higher-level customers of Cluster 4, the word “Player” appears commonly, which suggests that player manners are highly important on professional courses. Among the feature words of experts in Cluster 4, “Speed,” “Player,” “Prestige,” “Read” and “To hit” rank high (Table 8). From these words, it can be inferred that experts emphasize prestigious golf courses and green speed, and that they also take technical aspects, such as reading the characteristics of the course, into consideration. Among the feature words of intermediate players in cluster 4, “Player” and “Environment” are characteristic. From these words, it can be inferred that intermediate players worry that the course cannot be played smoothly when too many customers are booked. Intermediate players also emphasize convenience, the practice environment and play manners. Among the feature words of beginners in cluster 4, “Cup,” “Plan,” “Price,” “Latency” and “Women” rank high (Table 8). From these words, it can be inferred that beginners emphasize the position of the hole cup and the availability of usage plans.

Across all clusters, the words “Speed,” “Maintenance,” “Strategy,” “Play” and “Manners” are distinctive for experts. For expert users, maintenance related to green speed and the strategic design of the course itself are highly important. Experts also pay attention to other users’ play manners; advanced players have high standards for the playing environment. Intermediate players in all clusters are commonly characterized by the word “Player,” so play manners are important to intermediate players in every cluster. For beginners in all clusters, words related to “Maintenance” and “Price” are distinctive. Beginners have fewer common requests of golf courses, and for them price and maintenance are the most important factors.
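One simple way to make such a cross-cluster comparison is to count how many clusters’ top lists contain each word for a given skill rank, as sketched below. The text does not describe how this comparison was done, and the word lists in the example are toy values, not taken from Tables 5 to 8.

```python
from collections import Counter


def shared_feature_words(top_words_by_cluster, min_clusters):
    """Words appearing in the top lists of at least `min_clusters` clusters.

    top_words_by_cluster: cluster -> list of top TF-IDF words for one skill rank.
    """
    counts = Counter(w for words in top_words_by_cluster.values() for w in set(words))
    return sorted(w for w, n in counts.items() if n >= min_clusters)


# Toy top lists for one skill rank (illustrative only).
expert_top = {
    1: ["speed", "green"],
    2: ["speed", "caddy"],
    3: ["speed", "maintenance"],
    4: ["lunch", "maintenance"],
}
print(shared_feature_words(expert_top, 2))  # ['maintenance', 'speed']
```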

4 Discussion

Finally, we discuss the feature words of each cluster with a focus on golf skill.

Cluster 1 consists of general public courses, and words concerning the quality of golf facilities and services appear in the reviews at each skill level. Reviews on golf facilities and services therefore need to be monitored carefully, and improvements are necessary if negative evaluations appear. Attention should also be paid to reviews on green maintenance and play manners.

Cluster 2 consists of high price courses. Customers emphasize prestigious courses, course strategy and green speed, so improvement is necessary if reviews contain negative content about course strategy or green speed. Words related to the practice environment and crowding are also important, so reviews related to those words should be monitored carefully.

Cluster 3 consists of practice courses. Customers emphasize green speed, which changes depending on green maintenance. Therefore, for customers of Cluster 3, reviews on green speed and maintenance need careful attention. If negative reviews appear, maintenance of the course surface is required.

Cluster 4 consists of professional courses, and intermediate players pay attention to player manners. If reviews contain negative content, the golf course should manage players’ manners strictly. In addition, it is necessary to raise the difficulty of the course to suit professionals and to restrict the number of customers admitted so that the course is more comfortable.

5 Conclusion

In this study, we used golf course reviews from a golf portal site to clarify the features of golf courses. We also identified how the evaluation of golf courses differs with golf skill. In addition, this method is applicable not only to golf courses but also to reviews in various other domains for identifying their features.

In this study, we divided customer reviews by a single index, golf skill. In future work, we plan to classify customers by demographic attributes such as age, income and residential area. We will then evaluate customer reviews based on differences in demographic attributes and clarify what features emerge.