1 Introduction

Metaphor is a figure of speech that highlights similarities between one thing and another, or between actions. Metaphors are so abundant in any language that their identification and interpretation would benefit Natural Language Processing (NLP) tasks like paraphrasing, summarization, machine translation and language generation.

Before a metaphor can be analysed and interpreted, it has to be identified. Some of the existing computational methods for metaphor detection use a hierarchical organisation of conventional metaphors, conventional mappings of subject-verb, verb-object or subject-object pairs, selectional restrictions as provided in available lexical resources, or domain mapping of a word and its context [17].

We tackle the problem of detecting a metaphor in a given sentence irrespective of its type, and without using lexical resources like WordNet. For this we propose a novel method based on word embeddings, even though embeddings were not designed for this purpose. The method computes similarity metrics over the vector representations of words. With the similarities thus obtained as features, a Decision Tree Classifier decides whether a metaphor is present in the given sentence. Experiments on the VU Amsterdam Metaphor Corpus show better results compared to strong baselines.

2 Related Work

There has been much work on computational metaphor detection, both supervised and unsupervised. Shutova [17] provides a comprehensive review of computational metaphor detection. The following works are most closely related to our approach.

To determine whether a sentence contains a metaphor, Wilks et al. [21] extracted all verbs, along with the subject and direct object arguments of each verb, using the Stanford Parser. For each extracted verb, they checked for preference violations with the help of WordNet [6, 14] and VerbNet [16]. If there is a violation, they mark it as a ‘Preference Violation metaphor’. They also take conventional metaphors into consideration and determine them by the senses in WordNet. Klebanov et al. [9] used a logistic regression classifier to detect metaphor, with unigrams, part of speech, concreteness and topic models as features. To improve on this work, Klebanov et al. [8] tuned the weight parameter representing concreteness information, including the difference of concreteness. Su et al. [20], based on the theory of meaning, presented a metaphor detection technique that considers the difference between the source and target domains at the semantic level rather than the categories of the domains. They extract a subject-object pair with a dependency parser, which they refer to as a ‘concept-pair’. They then compute the cosine similarity of the concept-pair and check in WordNet whether the subject is a hypernym or hyponym of the object. When the cosine similarity is below a particular threshold and the concept-pair does not have a hypernym-hyponym relation, it is categorized as metaphorical, otherwise literal. However, they target only nominal metaphors (‘IS-A’ metaphors), also known as Type I metaphors [11], whereas our method is general and does not look for any particular type of metaphor.

3 Motivation

Many real-world NLP systems treat words as atomic units because of simplicity, robustness and the observation that simple models trained on huge amounts of data outperform complex systems trained on less data [12]. The motivation behind the proposed approach is that while such methods treat words as atomic units, words can in fact have multiple degrees of similarity [13], and many word embeddings capture that fact.

4 Proposed Metaphor Detection Approach

The flow diagram of our approach is shown in Fig. 1.

4.1 Vector Representation of Words

The method proposed in this paper uses vector representations of words that are made available to it. We have used the open-source Google Word2Vec system, and for training it we have used the text corpus from the latest English Wikipedia dump, preprocessed with Matt Mahoney's Perl script.

During training as well as testing, one might come across words for which embeddings are not available. We map such words to a constant vector of the same dimension as the provided word vectors.
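The fallback described above can be sketched as follows. The lookup interface and the particular constant (a small nonzero value, so that cosine similarity stays well defined) are illustrative assumptions, not details fixed by the paper.

```python
import numpy as np

def lookup(word, embeddings, dim=300):
    """Return the embedding for `word`, or a constant fallback vector
    of the same dimension when the word is out of vocabulary."""
    if word in embeddings:
        return embeddings[word]
    # Constant vector for unknown words; any fixed nonzero constant
    # works, and nonzero keeps cosine similarity defined.
    return np.full(dim, 0.01)

# Toy embedding table (dim=3 for brevity).
emb = {"sea": np.array([0.1, 0.2, 0.3])}
print(lookup("sea", emb, dim=3))  # known word: its own vector
print(lookup("zzz", emb, dim=3))  # OOV word: the constant vector
```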

Fig. 1. Flow diagram.

4.2 Feature Extraction

Replacing Named Entities: First we normalize the sentences to Normalization Form KD (NFKD) [3]. This is required because, in the presence of non-ASCII characters, the Stanford NLP software sometimes produces characters that are not originally in the input.
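The normalization step can be sketched with Python's standard library; the paper only specifies NFKD, so the follow-up step of dropping any remaining non-ASCII characters is our assumption about how the input is made safe for the downstream tools.

```python
import unicodedata

def normalize_nfkd(text):
    """Decompose characters with NFKD; then (an assumption beyond the
    paper's NFKD step) drop any codepoints that remain non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(normalize_nfkd("café"))  # -> "cafe"
```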

We replace the Named Entities because vector representations are unavailable for many proper nouns, especially unpopular ones. The replacement is also required for the unification of similar proper nouns under the same category: for example, different companies have different names, but unification is required for them to be treated similarly. We therefore use the Stanford Named Entity Recognizer (NER) [7]. Once the entities are recognized, the names are replaced by their entity labels. Thus “Montenegro’s sudden rehabilitation of Nicholas ’s memory is a popular move” (VU Amsterdam Metaphor Corpus) becomes “LOCATION’s sudden rehabilitation of PERSON ’s memory is a popular move”.
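Given the tagger's output, the replacement itself is a simple substitution. The sketch below assumes (token, tag) pairs with 'O' marking non-entities, which is the usual shape of NER output; a run of several tokens from one entity would produce repeated labels here, a simplification over a full implementation.

```python
def replace_named_entities(tagged_tokens):
    """Replace each token recognized as a named entity by its entity
    label (e.g. PERSON, LOCATION); 'O'-tagged tokens pass through."""
    return " ".join(tag if tag != "O" else tok for tok, tag in tagged_tokens)

tagged = [("Montenegro", "LOCATION"), ("'s", "O"), ("sudden", "O"),
          ("rehabilitation", "O"), ("of", "O"), ("Nicholas", "PERSON"),
          ("'s", "O"), ("memory", "O"), ("is", "O"), ("a", "O"),
          ("popular", "O"), ("move", "O")]
print(replace_named_entities(tagged))
# -> "LOCATION 's sudden rehabilitation of PERSON 's memory is a popular move"
```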

Getting Typed Dependencies: We parse the NER-replaced sentences with the Stanford PCFG Lexical Parser [10] to obtain the parse trees and the typed dependencies (Stanford Typed Dependencies) [4]. Of all the dependencies identified, we keep only a subset of types, chosen such that they may contain a metaphor. Wilks et al. [21] consider agent, nsubj, xsubj, dobj and nsubjpass, as they look for metaphors surrounding a verb. We choose a larger subset; for example, we also consider acomp (adjectival complement) [5], as it may give rise to metaphors as in ‘he looks green’.
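The filtering step can be sketched as follows. The exact relation set below is illustrative: it contains the verb-centred relations named above plus acomp, while the paper's full subset is larger and not enumerated.

```python
# Relations likely to host a metaphor; an illustrative subset only --
# the verb-centred relations of Wilks et al. plus acomp.
KEPT_RELATIONS = {"agent", "nsubj", "xsubj", "dobj", "nsubjpass", "acomp"}

def filter_dependencies(deps):
    """Keep only (relation, governor, dependent) triples whose
    relation type is in the chosen subset."""
    return [(rel, gov, dep) for rel, gov, dep in deps if rel in KEPT_RELATIONS]

deps = [("nsubj", "looks", "he"),
        ("acomp", "looks", "green"),
        ("det", "sky", "the")]
print(filter_dependencies(deps))
# -> [('nsubj', 'looks', 'he'), ('acomp', 'looks', 'green')]
```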

4.3 Training

For training, the system has to be provided with an annotated metaphor corpus: a corpus in which sentences containing metaphors are marked positive and the rest negative. It computes the cosine similarity of each dependent word pair and distributes the similarities according to the class of the sentence they come from, i.e., the cosine similarities of dependent word pairs from metaphor-containing sentences are put in the positive class, and those from sentences not containing any metaphor are put in the negative class.
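The feature computation can be sketched as below; the toy embeddings and the `lookup` callable are invented for illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pair_similarities(pairs, lookup):
    """One cosine-similarity feature per dependent (governor, dependent)
    word pair; `lookup` maps a word to its vector."""
    return [cosine(lookup(g), lookup(d)) for g, d in pairs]

# Toy 2-dimensional embeddings, invented for illustration.
emb = {"looks": np.array([1.0, 0.0]),
       "he":    np.array([1.0, 0.0]),
       "green": np.array([0.0, 1.0])}
pairs = [("looks", "he"), ("looks", "green")]
print(pair_similarities(pairs, emb.get))  # -> [1.0, 0.0]
```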

4.4 Classification

The default class of a sentence is negative. The cosine similarities are then classified one by one with CART [1], a Decision Tree Classifier. If at least one of them is classified as positive, the sentence is marked positive.
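The sentence-level decision rule can be sketched with scikit-learn, whose `DecisionTreeClassifier` is CART-based. The training values below are invented toy data (here, low similarity happens to indicate metaphor); in the paper the tree learns whatever boundary the corpus supports.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy training data: one cosine-similarity feature per dependent pair,
# labelled by the class of the sentence it came from (values invented).
X = np.array([[0.05], [0.10], [0.12], [0.80], [0.85], [0.90]])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = pair from a metaphorical sentence

clf = DecisionTreeClassifier(random_state=0)  # scikit-learn's CART tree
clf.fit(X, y)

def classify_sentence(similarities, clf):
    """Default class is negative; mark the sentence positive if at
    least one of its pair similarities is classified positive."""
    if not similarities:
        return 0
    preds = clf.predict(np.array(similarities).reshape(-1, 1))
    return int(preds.max())

print(classify_sentence([0.88, 0.07], clf))  # one positive pair -> 1
```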

5 Experiments

5.1 Dataset

The VU Amsterdam Metaphor Corpus (VUAMC)Footnote 5 [19] is one of the “largest available corpus hand-annotated for all metaphorical language use, regardless of lexical field or source domain”. It is based on “a systematic and explicit metaphor identification protocol” [18] with inter-annotator reliability of \(\kappa >0.8\).

5.2 Baselines

We compare our method with three baselines, two that do not use word embeddings and one that does, which are described as follows.

Baseline 1 (UPT+CUpDown+DCUpDown model)

As our first baseline, we use the results from Klebanov et al. [8]. Besides their ‘Essay Data’, they also report results on the VUAMC. We consider the average over the VUAMC (VUA in [8]) for comparison, choosing their best reported results, achieved with the UPT+CUpDown+DCUpDown model.

Baseline 2 (CRF (with SF+CF+AF+XF))

As our second baseline, we use the results from Rai et al. [15], who also report on the VUAMC. For comparison, we choose their best reported results, achieved with a CRF using the SF+CF+AF+XF feature set on the overall VUAMC dataset across every genre (Dataset2 in [15]).

Baseline 3 (SVM (with word embeddings))

For each sentence, after replacing named entities, we obtain the typed dependencies. For each (ordered) pair of dependent words, we append the word vector of the second word to that of the first, and thus obtain a feature vector. The feature vectors derived from the dependent pairs of a metaphor-containing sentence are placed in the positive class, and those from sentences not containing a metaphor in the negative class. By default a sentence is classified as negative, and if at least one of its feature vectors is classified as positive by a Support Vector Classifier [2], the sentence is marked positive for metaphor.
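The feature construction for this baseline is simple concatenation, sketched below with invented 2-dimensional toy vectors.

```python
import numpy as np

def pair_feature(vec_first, vec_second):
    """Feature vector for an ordered dependent pair: the second word's
    embedding appended to the first word's embedding."""
    return np.concatenate([vec_first, vec_second])

# Toy 2-dimensional vectors, invented for illustration.
v1 = np.array([0.1, 0.2])
v2 = np.array([0.3, 0.4])
print(pair_feature(v1, v2))  # -> [0.1 0.2 0.3 0.4]
```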

5.3 Evaluation

For training and testing, we consider the VU Amsterdam Metaphor Corpus and perform 10-fold cross validation on it.

We compare our method against the baselines on the basis of precision, recall and F\(_{1}\)-score. For their calculation, sentences containing metaphors are considered to constitute the positive class, irrespective of the number of metaphors in the sentence and sentences not having metaphors constitute the negative class.

Table 1. VU Amsterdam Metaphor Corpus.

6 Results and Discussions

As shown in Table 1, the proposed method outperforms the baselines on each of the criteria considered for comparison. On the VU Amsterdam Metaphor Corpus, Klebanov et al. [8] report an average F\(_{1}\)-score of 0.511 and Rai et al. [15] report an F-measure of 0.609, whereas the proposed approach achieves an F\(_{1}\)-score of 0.758.

Some of the typed dependencies are ignored so as to speed up the process and reduce the volume of data examined during detection. Considering all of them does not improve the results significantly, but increases the overhead.

Analysing the false positives, we found that over-fitting towards the positive class is due to the presence of common pairs in typed dependencies such as dobj (direct object) and nsubj (nominal subject). We observed in our experiments that if we do not consider those dependencies, the F\(_{1}\)-score falls drastically.

Our system gives a larger number of false positives than false negatives, which we believe to be the better trade-off. Metaphor interpretation comes after metaphor recognition: for false negatives, metaphors will be treated literally and interpreted in ways they were not intended, whereas for false positives we search for analogies, and if no analogy is found, we can always fall back to the literal meaning.

7 Conclusion

In this paper we proposed a novel approach for metaphor detection that uses cosine similarity as its main component. We compared our results against strong baselines on a standard dataset and showed superior performance. In future work, we intend to use the proposed method in downstream applications like paraphrasing and summarization.