1 Introduction

Areas of specialty often require a set of expressions that are tailored to meet the need of a specific genre. As these expressions are not used commonly in everyday communication, for people that are not familiar with the specialty terminology or expressions it can be quite baffling and difficult to understand. An area of particular interest to the authors is the language that is used to describe and express complex emotions and senses. A good example of this can be seen in the description of food and beverages that consist of complex aromas, flavors, and many other characteristics as they usually are expressed using specialist terminology used in a subjective manner. Within this area, the descriptions of wine are notorious for the use of specialist terminology and the expression of commonly used words in an uncommon manner. This is formally known as winespeak, and is used by wine reviews/tasters and also in the descriptions on the back label of wine bottles. To the uninitiated, it might be difficult to understand what a wine with “slightly pungent notes of green tomato or crushed tomato leaf” might be like, as used by Joe Czerwinski in his review of a Villa Maria 2009 Sauvignon Blanc [2].

In this paper, we propose a method for the automatic visualization of wine characteristics form viewpoints based on the sense sentiment analysis of a corpus of wine tasting notes. A subset corpus consisting of wine tasting notes that have been manually classified into four sense sentiment viewpoints will be analyzed to train and evaluate Support Vector Machine (SVM) classifiers for sentiment analysis. By analyzing target wine tasting notes with these classifiers, a score will be predicted from each of the sense sentiment viewpoints. These predicted scores will then be visualized in the form of Radar Charts so that the characteristics of wines may be compared.

2 Related Work

There are many papers on research into the language that is used to describe wines, called winespeak. Some of this research is dedicated to analyzing wine tasting notes from different points of view. Paradis and Eeg-Olofsson [5] examined tasting notes to identify expressions and words that are related to the viewpoints of vision, smell, taste, and touch. 39 typical phrases of these sensory expressions were identified. Caballero [1] focused on how manner-of-motion verbs are used from the point of view of describing a wine’s intensity and persistence, and collected 56 typical sentences that contain such verbs. In this paper, wine sentiment analysis is conducted using the four sensory viewpoints defined by Paradis and Eeg-Olofsson [5]. There is also related research into the visualization of wine tasting notes for linguistic analysis. Kerren et al. [4] visualized wine tasting notes using word trees generated from parts of speech and words. Their system enables the analysis of linguistic patterns within single wine reviews or based on regions and varieties. However the system is highly specialized and not intended for general use. In previous research, we examined the relations of Winespeak expressions and visualized these as mindmaps [3]. In this paper, the language used in tasting notes is automatically analyzed from different sensory viewpoints. The results are then visualized as radar charts so that the sensory sentiment content of the wine tasting note can be conveyed without having an understanding of winespeak.

3 Data Collection

In this paper, we propose that tasting notes can be analyzed to predict the classification of wines from various points of view. The target data for analysis is a corpus that consists of 91,010 wine tasting notes, or 255, 966 sentences, that were collected from the Wine Enthusiast website.Footnote 1 The attributes of each wine, such as: winery, region, and grape variety were collected along with the text of the wine tasting notes. This data was then indexed to construct a special use search engine using GETA.Footnote 2 A subset of the data consisting of 992 sentences from wine tasting notes was randomly selected for use in the training, testing and evaluation of sentiment models. This data subset was manually classified by hand into four different sensory category viewpoints, as defined by Paradis and Eeg-Olofsson [5].

4 Sensory Viewpoint Analysis and Prediction

An overview of the analysis in this paper, which involves training SVM models to predict four sense sentiments for visualization as radar charts, is shown on the left in Fig. 1. Firstly, a data subset of 992 manually classified sentences was vectorized, with each feature vector consisting of the words contained within a wine tasting note. The feature weights were normalized at the feature vector for each sentence to ensure that the number of features does not have an influence on model training. An SVM classifier for each sense was initially trained using all the data in the subset. The weights from these models was then extracted and used to score feature words for feature selection. An example of the top 10 positive and negative score feature words for the sense smell are shown in Table 1.

Fig. 1.
figure 1figure 1

Left: an overview of the automatic prediction and visualization of wines from multiple viewpoints; right: prediction performance of feature selection using the top N absolute value score words for the touch sense.

Table 1. Top 10 positive and negative score words for the smell sense

The words are ranked by the absolute value of the weight score, with the top N ranked words selected for training and testing. For each set of N top words, 5 SVM classifiers were trained and tested using 5-fold cross validation with a training/test data ratio of 4:1. Evaluation of the prediction performance for increasingly larger N was calculated, which can be seen on the right in Fig. 1 for the smell sense. The N with the greatest average prediction performance from the 5 SVM models by F-measure is selected as the optimum model. For the smell sense, the baseline prediction performance is an F-measure of 0.59 for a model created by analyzing all feature words. Prediction performance peeked at an F-measure of 0.63 for a model created by analyzing 500 of the top ranking words. This indicates that the top 500 words are representative features for the smell viewpoint.

The feature selection process was applied to all four sense sentiment models. The optimal N for each of the sensory viewpoint model, the evaluation of the model, and the baseline F-measure are shown in Table 2. Models trained on optimal feature selection are used to predict the sense sentiment of wine tasting note visualization.

Table 2. Feature selection: optimal N and evaluation for each of the sensory viewpoints

5 Visualization of Sensory Sentiment as Radar Charts

By reading the descriptions on a wine bottle, or a tasting note for a single wine, we might be able to roughly understand some of the wines characteristics without having a mastery of winespeak. However, it is much harder to grasp the characteristics of a wine region without reading about all the different wines produced. Sensory sentiment analysis from different viewpoints can provide an overview of the characteristics of a wine region, and then be plotted as a Radar Chart for easy comparison with other regions.

5.1 Model Normalization for Characteristic Prediction and Visualization

If a feature vector of a wine tasting note contains many feature words, then the sum of the predicted scores of these feature words would be greater than the sum of the predicted scores of a wine tasting note that only contains a subset of the same feature words. Also, because each of the SVM classifiers for each sensory viewpoint were trained by 5-fold cross validation, the feature weights and therefore the prediction score range is different for each model. As the size of the feature vector and the SVM models that classify the sensory sentiments of wine tasting notes can influence the final score given, both the feature vector and SVM model prediction scores need to be normalized before visualization of the results.

When vectorizing the tasting notes of a region, the weight of each word in the feature vector was determined by Eq. 2,

$$ weight(w_{i} ) = \frac{{DF(w_{i} )}}{{\sqrt {\mathop \sum \nolimits_{{w_{j} \in W}} DF(w_{j} )^{2} } }} $$
(1)

where \( DF(w_{i} ) \) is the document frequency of the word \( w_{i} \) from the search query. This normalization ensures that a feature vector with many terms is not of greater weight that a feature vector that contains only a few terms. Thus, the number of terms does not influence the analysis of the characteristic features.

Also the prediction score from each SVM classifier can be over a different range, and therefore the prediction score needs to be normalized so that a fare comparison can be made. Equation 2 was used to normalize the prediction scores for each feature vector from each SVM model,

$$ norm(v_{i} ,m_{j} ) = \frac{{score(v_{i} ,m_{j} ) - min(m_{j} )}}{{max(m_{j} ) - min(m_{j} )}} $$
(2)

where \( score(v_{i} ,m_{j} ) \) is the predicted score for the feature vector \( v_{i} \) from the SVM model \( m_{j} \), and the maximum and minimum model feature weights are represented by \( max(m_{j} ) \) and \( min(m_{j} ) \) for the model \( m_{j} . \)

5.2 Visualization of Sensory Sentiment by Region

In the data that was collected for analysis there are 4,675 regions, including major and sub-region combinations. The characteristics were calculated based on the wine tasting notes for each region as an example of sensory sentiment analysis. The chart with the largest summed score was from the Pelješac region in Croatia, as seen in Fig. 2 on the left.

Fig. 2.
figure 2figure 2

Example radar charts: regions with the smallest and largest graph area are Pelješac (left) and Primorska (right) respectively.

The chart shows three large positive scores, with the vision sense not scoring highly. The region with the smallest summed score was from the Primorska region in Slovenia. The chart for this region shows that few sense descriptive feature words were used in the wine tasting notes, with only a slight emphasis on the taste and touch senses.

Extreme sense sentiments can be seen in the charts of Fig. 3. These charts represent the highest score values for each of the four sense sentiment viewpoints. The chart for the Sonoma County, Santa Barbara County region in the USA shows a large number of features representing taste were used to describe the wines. This could suggest that the wines from that region have more taste qualities than smell, vision, and touch. In the chart for Alto Adige Valle Isarco region in Italy is the highest scoring region for the vision sense, but it would seem that negative scoring features for the vision sense are also prominent in the wine tasting notes. This would explain why the score is less than seen in other charts. The chart for the Ioannina in Greece scores highly on the smell sense viewpoint. This suggests that aromas play an important point in the description of the wines from that region. Lastly the chart for the Barossa Valley, Clare Valley region in Australia scores highly in sense sentiment for touch, suggesting feature words to do with the texture of the wine are often used.

Fig. 3.
figure 3figure 3

Example radar charts: graphs with strongest sentiment for each sense are Sonoma county, Santa Barbara county for taste (top right), Alto Adige valley Isarco for vision (top right), Ioannina for smell (bottom left), and Barossa valley, Clare valley for touch (bottom right).

6 Conclusion and Future Work

In this paper, we analyzed 992 manually classified sentences of wine tasting notes to create SVM models from four sense sentiment viewpoints: vision, smell, taste, and touch. The models were evaluated and a search was performed to find the optimal feature selection for each model. The optimal models were then used to analyze the four sense sentiment viewpoints of 4,675 regions in a corpus consisting of 91,010 wine tasting notes. The results of the analysis were then normalized for fare comparison between models and sense sentiment viewpoints. Six examples of visualizations by Radar Chart were given representing the largest, smallest, and strongest sentiment for all four of the sense viewpoints.

In future work, we plan to investigate what viewpoints are suitable for understanding and comparing the characteristics of wines from tasting notes. Also, we plan to undertake a formal evaluation of the effectiveness of the visualization.