
1 Introduction

Analogies describe comparative relationships between two sets of concepts. According to the Stanford Encyclopedia of Philosophy, “An analogy is a comparison between two objects, or systems of objects, that highlights respects in which they are thought to be similar” (Bartha 2013). For example, in “Life is like a box of chocolates, you never know what you’re gonna get,” an analogy is made between the uncertainty of what happens in life and the flavor of the chocolate one gets by randomly picking one from a box. Scientific analogies often compare one system to another, such as Rutherford’s atomic model, which compares the atom to the solar system. Making analogies is a common and important rhetorical device. Analogies not only help explain new concepts but also make sentences more interesting and creative.

With the rapid development of computational technologies, creating conversational agents that can communicate with people in task-oriented dialogue or small talk has received increasing interest in recent years. Intelligent conversational agents such as Alexa, Google Assistant, and Siri have become an increasingly integral part of people’s everyday lives. Enabling conversational agents to use analogies in their speech will enhance their ability to communicate and to develop social relationships with users.

1.1 Cognitive Theories on Analogy-Making

Many cognitive theories have been proposed to explain how people form analogies, such as LISA (Kubose et al. 2002), CAB (Larkey and Love 2003), Structure-Mapping Theory (SMT) (Gentner 1983; Gentner and Smith 2012), and others (Winston 1980; Kline 1983; Kedar-Cabelli 1985; Greiner 1988; Holyoak and Thagard 1989; O’Donoghue and Keane 2012; Grootswagers 2013). Common to most existing work, the central idea behind analogy-making is mapping hierarchical relationship structures among the concepts in two different domains. For example, according to SMT, an analogical mapping is created by establishing a structural alignment between the relationships in two sets of concepts; the closer the structural match, the better the inferred analogy. The Structure-Mapping Engine (SME) is a computational system that implements SMT (Falkenhainer et al. 1989). For the analogy between the solar system and the Rutherford model, SME can computationally determine that the maximal structural mapping occurs when the Sun is mapped to the nucleus and the planets are mapped to the electrons. This mapping results from the structural alignment of the relationships among these concepts. In the solar system, the Sun attracts the planets and has a greater mass than the planets; as a result, the planets revolve around the Sun, i.e., the attract and greater-than relationships together cause the revolve relationship. Furthermore, the attract relationship arises because both the Sun and the planets have mass and therefore exert gravity. The same relationship structure explains how the nucleus attracts the electrons and makes the electrons revolve around it. Figure 1 shows SMT’s models of the solar system and the Rutherford model. Such relationship structures are often expressed in predicate calculus.

Fig. 1. Relationship structures of the solar system and the Rutherford model.
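To make structural alignment concrete, the following Python sketch brute-forces the object mapping between two tiny relation sets and keeps the mapping under which the most base relations reappear in the target. It is a toy, not SME: SME builds local match hypotheses and prefers systematic, deeply nested structure, whereas this sketch simply counts preserved relations. The predicate names and the tuple encoding (loosely following Fig. 1, with a nested cause relation) are our assumptions.

```python
from itertools import permutations

# Toy predicate-calculus encoding (assumption): (predicate, arg1, arg2, ...),
# where arguments are object names or nested relations.
SOLAR = {
    ("attracts", "sun", "planet"),
    ("greater_mass", "sun", "planet"),
    ("revolves_around", "planet", "sun"),
    ("cause",
     ("and", ("attracts", "sun", "planet"), ("greater_mass", "sun", "planet")),
     ("revolves_around", "planet", "sun")),
}
ATOM = {
    ("attracts", "nucleus", "electron"),
    ("greater_mass", "nucleus", "electron"),
    ("revolves_around", "electron", "nucleus"),
    ("cause",
     ("and", ("attracts", "nucleus", "electron"),
      ("greater_mass", "nucleus", "electron")),
     ("revolves_around", "electron", "nucleus")),
}


def objects(expr):
    """Object names used as arguments anywhere inside a (possibly nested) relation."""
    if isinstance(expr, str):
        return {expr}
    found = set()
    for arg in expr[1:]:  # expr[0] is the predicate name, not an object
        found |= objects(arg)
    return found


def substitute(expr, mapping):
    """Rename the objects in a relation according to mapping; predicates are kept."""
    if isinstance(expr, str):
        return mapping.get(expr, expr)
    return (expr[0],) + tuple(substitute(arg, mapping) for arg in expr[1:])


def best_mapping(base, target):
    """Try every one-to-one object mapping and keep the one preserving most relations."""
    base_objs = sorted(set().union(*(objects(r) for r in base)))
    target_objs = sorted(set().union(*(objects(r) for r in target)))
    best, best_score = None, -1
    # Yields nothing (best stays None) if the target has fewer objects than the base.
    for image in permutations(target_objs, len(base_objs)):
        mapping = dict(zip(base_objs, image))
        score = sum(substitute(r, mapping) in target for r in base)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score


mapping, score = best_mapping(SOLAR, ATOM)
print(mapping, score)  # {'planet': 'electron', 'sun': 'nucleus'} 4
```

Enumerating permutations grows factorially with the number of objects, which is one reason SME uses local match hypotheses instead of exhaustive search.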

1.2 Creating Analogies Without Hierarchically Structured Data

A major challenge in adapting existing theories of analogy-making to generate dialogue for conversational agents is acquiring appropriate input data. In most cases, existing theories require input data that contain structural information about concept relationships, such as the relationship structures shown in Fig. 1. Automatically gathering data with such hierarchical structure is almost impossible, and manually created input data tend to be small in scale, which significantly limits the amount and variety of analogies a dialogue agent can make.

In this work, we explore automatically creating analogies using content from knowledge graphs. Semantic web resources such as DBpedia (Bizer et al. 2009) and Wikidata (Erxleben et al. 2014) contain massive amounts of structured data, which can easily be used to construct knowledge graphs. In knowledge graphs, concepts are connected by links that represent the relationships among them. Knowledge graphs therefore intuitively provide a good basis for automatically generating analogies. On the other hand, the concept-relationship structure in knowledge graphs is flat: there is no hierarchical relationship structure, so we cannot directly use their content to infer relationship structural mappings for analogy-making. Figure 2 shows an example knowledge graph crawled from Wikipedia, using Sun as the seed node and including only concepts within two steps of the Sun. We generated this graph with the Seealsology tool (Seealsology). For generating analogies in this work, we wrote our own crawler for gathering data, which is explained in Sect. 2.

Fig. 2. Sun and related concepts in Wikipedia.
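Restricting a crawl to concepts within two steps of a seed is a bounded breadth-first search. The sketch below is our own illustration, not the Seealsology tool or our crawler; it assumes links are followed only in their outgoing direction, and the triples are made up.

```python
from collections import deque


def k_hop_neighborhood(triples, seed, max_hops=2):
    """Breadth-first search from seed over outgoing links, stopping at max_hops.

    Returns {concept: hop distance}. Following only outgoing links is an
    assumption; a crawler could also follow incoming links.
    """
    outgoing = {}
    for head, _relation, tail in triples:
        outgoing.setdefault(head, []).append(tail)
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in outgoing.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Made-up "see also"-style links for illustration only.
links = [
    ("Sun", "see also", "Solar System"),
    ("Solar System", "see also", "Planet"),
    ("Planet", "see also", "Exoplanet"),
]
print(k_hop_neighborhood(links, "Sun"))
# {'Sun': 0, 'Solar System': 1, 'Planet': 2}
```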

In our previous work (Si and Carlson 2017), we explored automatically generating analogies using data from DBpedia. Our approach was inspired by the Structure-Mapping Theory. The algorithm strives to find analogous pairs of concept groups; each analogy is composed of a pair of mapping concepts and a set of supporting evidence. For example, Punk Rock is analogous to LPC (a programming language) because “the stylistic origin of Punk Rock is Garage Rock, Glam Rock, and Surf Music, just like LPC is influenced by Lisp, Perl, and C,” and “Punk Rock is a music fusion genre of Celtic Punk, just like LPC influences Pike.” Here, the analogy between Punk Rock and LPC is supported by mapping the “stylistic origin” relationship among music genres to the “influenced by” relationship among programming languages, and the “fusion genre” relationship among music genres to the “influence” relationship among programming languages.

An important step in the algorithm is inferring pairs of analogous relationships. The algorithm computes how analogous two relationships are based on the structural similarity of their adjacent concepts and relationships. For example, suppose that, on average, the concepts a relationship r links from are associated with more relationships than the concepts r links to. Then r is more similar to other relationships that share this pattern than to relationships with a different pattern, e.g., relationships whose linked-from and targeting concepts are associated with the same relationships. In our algorithm, we compute four sets of relationship differences between the linked-from concept and the targeting concept:

  1. Gain – what relationships are associated with the targeting concept but not the linked-from concept;
  2. Loss – what relationships are associated with the linked-from concept but not the targeting concept;
  3. Same – what relationships are associated with both the targeting concept and the linked-from concept;
  4. Diff – the combination of the gain and the loss sets.
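For a single link, the four sets reduce to set operations on the relationships attached to its two endpoint concepts. The Python sketch below assumes that a concept's associated relationships are the relationship types on its outgoing links, which is our simplification; the helper names and triples are hypothetical.

```python
def relation_types(triples, concept):
    """Relationship types on the outgoing links of a concept (assumption: outgoing only)."""
    return {relation for head, relation, _tail in triples if head == concept}


def difference_sets(triples, linked_from, targeting):
    """Gain, loss, same, and diff sets for one link linked_from -> targeting."""
    src = relation_types(triples, linked_from)
    dst = relation_types(triples, targeting)
    gain = dst - src  # only on the targeting concept
    loss = src - dst  # only on the linked-from concept
    return {"gain": gain, "loss": loss, "same": src & dst, "diff": gain | loss}


# Made-up triples for illustration.
triples = [
    ("Punk Rock", "instance of", "music genre"),
    ("Punk Rock", "stylistic origin", "Garage Rock"),
    ("Punk Rock", "fusion genre", "Celtic Punk"),
    ("Garage Rock", "instance of", "music genre"),
    ("Garage Rock", "derivative", "Punk Rock"),
]
sets = difference_sets(triples, "Punk Rock", "Garage Rock")
print(sets["gain"], sets["same"])  # {'derivative'} {'instance of'}
```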

Section 2 provides the details of the algorithm for computing the structural similarity between two relationships. After we have obtained the similarities between each pair of relationships within a domain (or across two domains, if we separate the source and target domains), the relationships are used to construct analogies between concepts. If two concepts have many relationships that are analogous to each other, the two concepts are regarded as analogous.
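One simple way to turn pairwise relationship similarities into a concept-level judgment is to pair up the two concepts' relationships greedily and count how many strong pairs result. This is a hypothetical sketch, not necessarily the procedure of Si and Carlson (2017): the similarity function, the greedy one-to-one matching, and the 0.8 threshold are all assumptions.

```python
def match_relationships(rels_a, rels_b, similarity, threshold=0.8):
    """Greedy one-to-one pairing of two concepts' relationships by similarity.

    Returns (relationship_a, relationship_b, score) triples with score >= threshold,
    strongest first; more pairs means the concepts are treated as more analogous.
    The threshold and greedy strategy are illustrative assumptions.
    """
    candidates = sorted(
        ((similarity(ra, rb), ra, rb) for ra in rels_a for rb in rels_b),
        reverse=True,
    )
    used_a, used_b, pairs = set(), set(), []
    for score, ra, rb in candidates:
        if score < threshold:
            break  # remaining candidates are weaker still
        if ra not in used_a and rb not in used_b:
            used_a.add(ra)
            used_b.add(rb)
            pairs.append((ra, rb, score))
    return pairs


# Made-up similarity values between music-genre and programming-language relationships.
SIM = {
    ("stylistic origin", "influenced by"): 0.9,
    ("fusion genre", "influences"): 0.85,
    ("fusion genre", "influenced by"): 0.4,
    ("stylistic origin", "influences"): 0.3,
}
pairs = match_relationships(
    ["stylistic origin", "fusion genre"],
    ["influenced by", "influences"],
    lambda a, b: SIM.get((a, b), 0.0),
)
print(pairs)
# [('stylistic origin', 'influenced by', 0.9), ('fusion genre', 'influences', 0.85)]
```

Greedy matching is not guaranteed to maximize total similarity; an optimal assignment (e.g., the Hungarian algorithm) is an alternative if that matters.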

1.3 Evaluations of the Generated Analogies

Though the process of computing how analogous two relationships are leverages the idea of structural similarity, the results produced by our previous work (Si and Carlson 2017) differ from those produced by SME or other theories that infer analogies purely from structural similarity. In SME, one relationship is mapped to another, e.g., the revolve relationship in “a planet revolves around the Sun” to the revolve relationship in “an electron revolves around the nucleus,” because of the structural alignment between the two groups of concepts; the mapping has nothing to do with what the relationships are. Two relationships that are both named revolve are no more analogous to each other than two relationships with different names.

In our previous work as well as in this work, we aim at creating analogies where the relationship mapping itself is analogous. In the analogy between programming languages and music genres, we believe Punk Rock is analogous to LPC precisely because “stylistic origin” is analogous to “influenced by” and “fusion genre” is analogous to “influence.” When interpreting analogies created by our system, how analogous people perceive two concepts to be depends on how analogous they think each of the related relationship pairs is. As another example, “influence” and “sub-genre” can be mapping relationships: they are not synonyms but have similar meanings in their respective domains. We can say Python influenced many other programming languages just like Jazz has many sub-genres. Without supporting evidence, Python and Jazz are largely unrelated; however, the influence relationship and the sub-genre relationship may still read as analogous to each other. Further, unlike the analogy itself, which must be presented with its supporting evidence, analogous relationships can be presented alone, without additional supporting information. This property makes analogous relationships very convenient to use in dialogue generation.

In this work, we study how people’s impressions of the analogies, and in particular of the analogous relationships, are affected by how the analogies are presented. Kubose et al. (2002) showed that when multiple propositions jointly provide stronger structural support for an analogy and are presented together, people understand the mappings in the analogy more accurately than when the propositions are presented individually. Inspired by this finding, we hypothesized that the analogy between a pair of relationships will be perceived as stronger when it is presented within a group of analogous relationships that make an analogy between two concepts than when it is presented alone. In addition, we want to find out whether people can distinguish how analogous a mapping between two concepts or two relationships is from how creative it is.

2 Make Analogies Using Knowledge Graphs

2.1 Information from Wikidata

In this work, we explored using content from Wikidata for generating analogies. Knowledge graphs such as DBpedia and Wikidata contain huge sets of connected concepts. In our previous work (Si and Carlson 2017), we used information from DBpedia as the basis for generating analogies. The main benefit of switching to Wikidata is that its relationships are all uniquely identifiable. This eliminates the need to deal with relationships with similar names, such as “influence” and “influences”; deciding which relationships can or should be merged turned out to be a challenging task when using data from DBpedia. To get information from Wikidata, we wrote a web crawler in Python, which stores concepts and their relationships in a network structure that can be used directly by the analogy-building algorithm.
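The network-fetching part of such a crawler is omitted here, but the parsing step can be sketched. The sketch assumes the crawler reads Wikidata's entity JSON (as served by Special:EntityData), where each property ID under claims holds a list of statements with a mainsnak; it keeps only item-valued statements and stores them as an adjacency map. The function names are hypothetical, and the example entity uses placeholder QIDs.

```python
from collections import defaultdict


def item_triples(entity):
    """Yield (subject QID, property PID, object QID) for item-valued statements."""
    subject = entity["id"]
    for pid, statements in entity.get("claims", {}).items():
        for statement in statements:
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue  # "somevalue"/"novalue" snaks carry no concrete target
            datavalue = snak.get("datavalue", {})
            if datavalue.get("type") != "wikibase-entityid":
                continue  # skip strings, dates, quantities, coordinates, ...
            value = datavalue["value"]
            if value.get("entity-type") == "item":
                # Older JSON may only carry "numeric-id", so rebuild the QID if needed.
                yield subject, pid, value.get("id") or f"Q{value['numeric-id']}"


def build_graph(entities):
    """Adjacency map: subject QID -> set of (property PID, object QID)."""
    graph = defaultdict(set)
    for entity in entities:
        for subject, pid, obj in item_triples(entity):
            graph[subject].add((pid, obj))
    return graph


# Hand-made entity in Wikidata's JSON shape; QIDs are placeholders.
# P31 is "instance of" and P17 is "country"; P1476 holds a non-item value.
entity = {
    "id": "Q1001",
    "claims": {
        "P31": [{"mainsnak": {"snaktype": "value", "property": "P31",
                              "datavalue": {"type": "wikibase-entityid",
                                            "value": {"entity-type": "item",
                                                      "numeric-id": 1002, "id": "Q1002"}}}}],
        "P17": [{"mainsnak": {"snaktype": "value", "property": "P17",
                              "datavalue": {"type": "wikibase-entityid",
                                            "value": {"entity-type": "item",
                                                      "numeric-id": 1003}}}}],
        "P1476": [{"mainsnak": {"snaktype": "value", "property": "P1476",
                                "datavalue": {"type": "monolingualtext",
                                              "value": {"text": "x", "language": "fr"}}}}],
    },
}
print(sorted(build_graph([entity])["Q1001"]))  # [('P17', 'Q1003'), ('P31', 'Q1002')]
```

Because properties are keyed by PID rather than by label, relationships such as “influence” and “influences” never need to be reconciled at this stage.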

2.2 Finding Analogous Relationships and Concepts

When computing mapped relationships, we use the topological similarity between the groups of relationships related to the source and target concepts as a multi-dimensional embedding for each relationship. Algorithm 1, taken from our previous work (Si and Carlson 2017), is the main algorithm for computing a unique index for each relationship in a domain. As mentioned in Sect. 1.2, for each pair of concepts connected by a given relationship r, we compute four sets of relationship differences between the linked-from concept and the targeting concept: gain, loss, same, and diff. These four sets are aggregated over all the concept pairs connected by the relationship. We then compute the Jaccard index between each pair of the four sets, which yields a six-dimensional embedding for the relationship (one dimension per pair of sets). These embeddings are used to compute the similarity between two relationships. Because the embedding contains no concept or relationship names, relationships from different domains can be compared with each other, which enables us to generate analogies between concepts from different domains. The details of the analogy-making algorithms can be found in (Si and Carlson 2017).
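The embedding step can be sketched as follows. Four sets give C(4, 2) = 6 unordered pairs, hence six Jaccard indices. How the per-link sets are aggregated and how two embeddings are compared are not specified here, so this sketch assumes aggregation by set union and a similarity of 1 - d/√6, where d is the Euclidean distance between embeddings (each coordinate lies in [0, 1], so d ≤ √6). The function names are hypothetical.

```python
import math
from itertools import combinations

SET_NAMES = ("gain", "loss", "same", "diff")


def jaccard(a, b):
    """|A & B| / |A | B|, taken as 0.0 when both sets are empty (a convention we assume)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relationship_embedding(per_link_sets):
    """Six Jaccard indices over the aggregated gain/loss/same/diff sets of one relationship.

    per_link_sets holds one {"gain", "loss", "same", "diff"} dict per concept pair
    linked by the relationship; aggregating by union is an assumption.
    """
    aggregated = {name: set() for name in SET_NAMES}
    for sets in per_link_sets:
        for name in SET_NAMES:
            aggregated[name] |= sets[name]
    # Pair order: (gain, loss), (gain, same), (gain, diff), (loss, same), (loss, diff), (same, diff)
    return [jaccard(aggregated[x], aggregated[y]) for x, y in combinations(SET_NAMES, 2)]


def embedding_similarity(u, v):
    """1 - Euclidean distance / sqrt(dim): 1.0 for identical embeddings, 0.0 at maximal distance."""
    return 1.0 - math.dist(u, v) / math.sqrt(len(u))


link = {"gain": {"derivative"}, "loss": {"stylistic origin", "fusion genre"},
        "same": {"instance of"}, "diff": {"derivative", "stylistic origin", "fusion genre"}}
emb = relationship_embedding([link])
print([round(x, 3) for x in emb])  # [0.0, 0.0, 0.333, 0.0, 0.667, 0.0]
```

Since the vector holds only set-overlap ratios, two relationships from unrelated domains can be compared directly, which is the property the paragraph above relies on.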

Another difference between this work and our previous work (Si and Carlson 2017) is that we now use much larger domains. Previously, the domains ranged from hundreds to thousands of concepts; in this work, we expanded the domains to more than 20k concepts. As a result, the aggregated gain, loss, same, and diff sets may become very large, so we capped the size of each set at 5000. When a set grows beyond this limit, we downsample it by randomly deleting items.
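The size cap can be sketched as uniform random downsampling: drawing a random subset of 5000 items is equivalent to deleting randomly chosen items until 5000 remain. Whether the cap is applied once after aggregation or repeatedly as the sets grow is not stated, so the helper below, whose name is hypothetical, could be called at either point.

```python
import random

MAX_SET_SIZE = 5000  # cap used in this work


def cap_set(items, limit=MAX_SET_SIZE, rng=None):
    """Return items unchanged if small enough, else a uniform random subset of size limit."""
    if len(items) <= limit:
        return set(items)
    rng = rng or random.Random()
    # random.sample needs a sequence (sets are no longer accepted since Python 3.11).
    return set(rng.sample(list(items), limit))


big = set(range(12000))
capped = cap_set(big, rng=random.Random(0))
print(len(capped), capped <= big)  # 5000 True
```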

Algorithm 1. Computing a unique index for each relationship in a domain (from Si and Carlson 2017).

2.3 Example Outputs

An output from our system consists of a pair of mapping concepts and several pieces of supporting evidence, which are the mapping relationships and their targeting concepts. Below is an example analogy created by our system between The Source (a famous painting) and OS/2 (an operating system):

  • The Source → OS/2

  • instance of painting → instance of operating system

  • country France → language of work or name English

  • location Musée d’Orsay → platform x86

  • genre figure painting → programming language C

This example reads as: “The Source is analogous to OS/2. This is because The Source is an instance of painting just like OS/2 is an instance of operating system. The Source’s country is France just like OS/2’s language of work is English,” and so on. Another example, an analogy between the U.S. state of Vermont and the asteroid (2970) Pestalozzi, is provided below.

  • Vermont → (2970) Pestalozzi

  • instance of state of the United States → instance of asteroid

  • located in time zone Eastern Time Zone → parent astronomical body Sun

  • country United States of America → minor planet group asteroid belt

  • head of government Peter Shumlin → discoverer or inventor Paul Wild

We created these analogies by forcing the system to look for analogous concepts from different domains, i.e., the two concepts are instances of different types. Both examples were used in the evaluation study described in Sect. 3.
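The way the first example reads, together with the cross-domain constraint just described, can be sketched with a small data structure. The Analogy class, its evidence tuple layout (source relationship, source value, target relationship, target value), and the sentence templates are our illustrative assumptions, not the system's actual output code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (source relationship, source value, target relationship, target value)
Evidence = Tuple[str, str, str, str]


def clause(subject, relationship, value):
    """One half of a supporting-evidence sentence (template is an assumption)."""
    if relationship == "instance of":
        return f"{subject} is an instance of {value}"
    return f"{subject}'s {relationship} is {value}"


@dataclass
class Analogy:
    source: str
    target: str
    evidence: List[Evidence] = field(default_factory=list)

    def is_cross_domain(self):
        """True if the two concepts are instances of different types."""
        for rel_s, val_s, rel_t, val_t in self.evidence:
            if rel_s == rel_t == "instance of":
                return val_s != val_t
        return False  # no instance-of evidence: cannot tell

    def to_text(self):
        """Render the analogy as 'X is analogous to Y.' plus one 'just like' sentence per piece of evidence."""
        sentences = [f"{self.source} is analogous to {self.target}."]
        for rel_s, val_s, rel_t, val_t in self.evidence:
            sentences.append(
                f"{clause(self.source, rel_s, val_s)} just like "
                f"{clause(self.target, rel_t, val_t)}."
            )
        return " ".join(sentences)


example = Analogy("The Source", "OS/2", [
    ("instance of", "painting", "instance of", "operating system"),
    ("country", "France", "language of work or name", "English"),
])
print(example.is_cross_domain())  # True
print(example.to_text())
```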

3 Experiment Design and Materials

In this study, we evaluate how the presentation of the analogies affects people’s impressions of them. In particular, we hypothesized that the analogy between a pair of relationships will be perceived as stronger when it is presented within a group of relationship pairs that make an analogy between two concepts than when it is presented alone. In addition, we want to find out whether people can distinguish how analogous a mapping between two concepts or two relationships is from how creative it is.

3.1 Experiment Design

To evaluate the hypothesis, we designed a between-subjects study with three conditions:

  1. Full: the full analogy is presented, with both the concept pair and the supporting evidence, which consists of the corresponding relationships and their targeting concepts, as in the examples given in Sect. 2.3.
  2. R+D: only the relationship and targeting concept pairs are presented, e.g., country France → language of work or name English.
  3. R: only the relationship pairs are presented, e.g., country → language of work.

The study was conducted on Amazon Mechanical Turk (MTurk). We recruited 50 subjects for each condition.

3.2 Materials and Procedure

The experiment material consists of 16 analogies generated by the system. As in the examples in Sect. 2.3, each analogy contains four pieces of supporting evidence, the first of which is always the mapping between the two concepts’ instance-of relationships. Apart from this instance-of mapping, the two relationships in each mapping have different names. In general, our algorithm is capable of generating analogies with more supporting evidence, and mapping relationships are not required to have different names; we enforced these rules only when generating the analogy examples for this study.
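The constraints used for the study materials can be expressed as a filter over candidate analogies. In this sketch, each piece of evidence is a hypothetical (source relationship, source value, target relationship, target value) tuple, and the function name is ours.

```python
def meets_study_constraints(evidence, n_evidence=4):
    """Check the material constraints described in Sect. 3.2.

    evidence: list of (source relationship, source value, target relationship,
    target value) tuples (hypothetical layout).
    """
    if len(evidence) != n_evidence:
        return False  # exactly four pieces of supporting evidence
    first_s, _, first_t, _ = evidence[0]
    if not (first_s == first_t == "instance of"):
        return False  # the first mapping is always instance-of -> instance-of
    # Every other mapping pairs two differently named relationships.
    return all(rel_s != rel_t for rel_s, _, rel_t, _ in evidence[1:])


vermont = [
    ("instance of", "state of the United States", "instance of", "asteroid"),
    ("located in time zone", "Eastern Time Zone", "parent astronomical body", "Sun"),
    ("country", "United States of America", "minor planet group", "asteroid belt"),
    ("head of government", "Peter Shumlin", "discoverer or inventor", "Paul Wild"),
]
print(meets_study_constraints(vermont))  # True
```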

During the study, subjects read the analogies, along with their supporting evidence if provided, and rated how analogous and how creative each item is. If the full analogy is presented, the subject rates each piece of supporting evidence before rating the analogy between the two concepts. Each question uses a 7-point Likert scale (1 = Strongly Disagree, 7 = Strongly Agree). In the R+D and R conditions, the mapping between the two instance-of relationships is not rated. In addition, in the R condition, if the same relationship mapping appears in multiple analogies, subjects rate it only once.

Most subjects completed the study in less than 5 min.

4 Results and Discussion

Figures 3 and 4 compare the ratings of the same relationship mapping when presented within a full analogy (the Full condition), with the targeting concept (the R+D condition), and alone (the R condition). The x-axis shows the question IDs. The first digit of the ID indicates the index of the analogy example the question belongs to. Each analogy contains 10 questions. The first 8 questions are about how analogous and how creative each piece of supporting evidence is: odd-numbered questions ask how analogous an item is, and even-numbered questions ask how creative it is. The 9th and 10th questions are about the concept mapping itself. Even though subjects in the R+D and R conditions do not see the mapping concepts, we keep the same naming convention. So, question ID 13 is the 3rd question about analogy example 1; whether it concerns the relationship and targeting concepts or the relationship alone depends on the subject’s experimental group. Questions 1 and 2 are about the instance-of relationship; because neither the R+D nor the R condition rates this relationship, the answers to these questions are excluded from the plot. Similarly, the answers to the 9th and 10th questions are excluded. We also excluded a few questions for which we did not collect enough data because of technical problems.
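The question-numbering convention can be summarized in a few lines. The sketch assumes the within-analogy number runs from 1 to 10 and that the displayed ID is the analogy index followed by that number, as in the ID 13 example; the function names are hypothetical, and the exclusions for technical problems are not modeled.

```python
def describe_question(q):
    """Decode a within-analogy question number (1-10) as (aspect, item, plotted)."""
    if not 1 <= q <= 10:
        raise ValueError("questions are numbered 1-10 within each analogy")
    aspect = "analogous" if q % 2 == 1 else "creative"  # odd: analogous, even: creative
    if q <= 8:
        item = f"supporting evidence {(q + 1) // 2}"  # evidence 1 is the instance-of mapping
    else:
        item = "concept mapping"
    plotted = 3 <= q <= 8  # questions 1-2 (instance-of) and 9-10 are excluded
    return aspect, item, plotted


def question_id(analogy_index, q):
    """Display ID: analogy index followed by question number, e.g. (1, 3) -> "13"."""
    return f"{analogy_index}{q}"


print(question_id(1, 3), describe_question(3))
# 13 ('analogous', 'supporting evidence 2', True)
```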

Fig. 3. Analogous ratings. The x-axis shows question IDs. (Color figure online)

Fig. 4. Creative ratings. The x-axis shows question IDs.

From Fig. 3, we can see that the analogous ratings from the R+D group (blue bars) are usually the lowest, and the ratings from the R group and the Full group are comparable. The polynomial trend lines indicate a steady decrease in the ratings of the Full group, which is not apparent for the other two groups. We suspect that this trend is caused by fatigue: subjects in the Full group need to rate more items than subjects in the other two groups.

In contrast, for the ratings of how creative the items are, we see no clear differences among the three groups, although the Full group’s ratings again gradually decrease over time.

Table 1 shows the means and standard deviations of the analogous ratings from each group. We report statistics both for the entire sequence and for the first half, i.e., the first 7 questions, to rule out the potential impact of participant fatigue. We performed two-tailed paired-sample t-tests between the ratings from different groups. As Table 1 shows, the R group gave the highest ratings in general. The ratings from the R group and the Full group are significantly higher than those from the R+D group, and the R group and the Full group gave similar ratings when only the first half of the questionnaire is considered.

Table 1. Means, standard deviations, and T-tests for analogous ratings.
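For reference, the paired-sample t statistic is the mean of the paired differences divided by its standard error, with n - 1 degrees of freedom. The sketch below computes it with the standard library only. That ratings were paired per question is our assumption, the numbers are made up rather than study data, and a two-tailed p-value can be obtained from the t distribution, e.g., with scipy.stats.ttest_rel.

```python
import math
import statistics


def paired_t(x, y):
    """Paired-sample t statistic and degrees of freedom for matched ratings x[i], y[i]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # stdev uses the n - 1 denominator
    # Raises ZeroDivisionError if all differences are identical (std_err == 0).
    return statistics.mean(diffs) / std_err, n - 1


# Made-up per-question mean ratings for two groups (not the study's data).
group_a = [5.0, 5.5, 6.0, 5.0]
group_b = [4.0, 5.0, 5.0, 4.5]
t, df = paired_t(group_a, group_b)
print(round(t, 3), df)  # 5.196 3
```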

Table 2 shows the means and standard deviations of the creative ratings from each group. The creative ratings from different groups are more similar to each other than the analogous ratings are. In the first half of the questionnaire, only the ratings from the R group are significantly higher than those from the R+D group at the .05 level.

Table 2. Means, standard deviations, and T-tests for creative ratings.

The results in Tables 1 and 2 suggest that participants rated how analogous and how creative a mapping is in different ways. Based on the results in Fig. 4 and Table 2, we suspect that people could not meaningfully rate the creativity of the mappings generated by our algorithm, although they can tell how analogous the mappings are. Our hypothesis is confirmed in that a relationship mapping is rated as more analogous when it is presented within an analogy than when it is presented with only an example of its targeting concept. We consider the R+D condition as providing an example of the relationship’s targeting concept because a relationship can typically point to multiple targeting concepts. A somewhat surprising result is that the R group gave the highest ratings of how analogous the mappings are. This suggests that supplying examples, as in the R+D condition, does not always help people understand the concepts better; in this case, supplying a concrete example of the targeting concept in fact hurts people’s ability to see the analogy between the two relationships.

The difference between the creative and the analogous ratings is further illustrated in Fig. 5, a heat map of the differences between the ratings from different groups. The first row contains the difference between the Full and R+D groups, obtained by subtracting each R+D rating from the corresponding Full rating. Similarly, the second row contains the difference between the R and R+D groups, and the third row contains the difference between the Full and R groups. The last row is the question ID. We only used results from the first half of the questionnaire. The odd columns in Fig. 5 correspond to the analogous ratings, and the even columns to the creative ratings. There is no clear pattern in the difference between the Full and R groups, nor in the creative ratings. The green cells in the alternating (odd) columns indicate that both the Full and R groups rated the presented items as more analogous than the R+D group did.

Fig. 5. Heat map comparing the creative and analogous ratings. (Color figure online)
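Each heat-map row in Fig. 5 is an element-wise difference of per-question ratings between two groups. The sketch below builds the three rows and marks the sign of each cell; the input layout (question ID to mean rating, one dict per group), the function names, and the numbers are hypothetical.

```python
def difference_rows(question_ids, full, rd, r):
    """Rows of the Fig. 5 heat map: one difference per question for each group pair."""
    return {
        "Full - R+D": [full[q] - rd[q] for q in question_ids],
        "R - R+D": [r[q] - rd[q] for q in question_ids],
        "Full - R": [full[q] - r[q] for q in question_ids],
    }


def sign_pattern(rows, tolerance=1e-9):
    """'+' where the first group rated higher, '-' where lower, '0' for ties."""
    def sign(value):
        if value > tolerance:
            return "+"
        if value < -tolerance:
            return "-"
        return "0"
    return {name: "".join(sign(v) for v in values) for name, values in rows.items()}


# Made-up mean ratings; "13" (odd) is an analogous question, "14" (even) a creative one.
ids = ["13", "14"]
full = {"13": 5.2, "14": 4.0}
rd = {"13": 4.5, "14": 4.1}
r = {"13": 5.0, "14": 3.9}
print(sign_pattern(difference_rows(ids, full, rd, r)))
# {'Full - R+D': '+-', 'R - R+D': '+-', 'Full - R': '++'}
```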

5 Conclusion and Future Work

Making analogies is a common and important rhetorical device. Analogies not only help explain new concepts but also make expressions more interesting and creative. In this work, we explored creating analogies using information crawled from Wikidata. We also conducted an empirical study investigating how people’s impressions of the analogies are affected by how the analogies are presented. Our results show that presenting the analogous relationships either by themselves or within an analogy between two concepts works better than showing the relationships along with a targeting concept the relationship can point to. Our study also shows that how creative people find the analogy between a pair of relationships is not affected by how it is presented. We also suspect that although people can judge how analogous the generated analogies are, it is hard for them to judge how creative the analogies are.

Our future work lies in two main directions. The first is improving the analogy generation algorithm to create more interesting and sound analogies. In particular, even though data from knowledge graphs typically do not contain hierarchical relationships, by leveraging semantic and network analysis tools we may be able to build a hierarchical relationship structure in an ad hoc fashion and use it to aid our reasoning in analogy building. Related to this goal, we also plan to explore creating analogies that are more similar to those appearing in literature, rather than analogies for explaining scientific concepts. Second, we are interested in conducting more experiments like the one presented in this work and studying how the automatically generated analogies can be used more effectively in conversations.