
1 Introduction

While many theories of humor agree that humor often involves the detection of incongruities and their resolution, the details remain vague and there is no agreed-upon theoretical framework describing how these incongruities form and are detected by intelligent agents [12]. This paper demonstrates the use of text visualization for modeling humor in text, a process known as visual text mining, a subset of visual data mining [4, 15]. In particular, our approach visualizes shifts in meaning assignment over time as jokes are processed. While this does not fully solve the problem of modeling the specific mechanisms underlying humor, visualization and visual text mining give us one more data-centric tool for detecting features associated with various natural language phenomena. Furthermore, these approaches can be used to model and detect other forms of humor, in particular sequential physical humor, and other phenomena involving shifts of interpretation.

This paper makes use of three visualization approaches to model, detect, and classify sequential jokes involving shifts in the interpreted meaning of some ambiguous word. This form of sequential joke has been referred to as a ‘garden path’ joke to differentiate it from other sequential jokes involving incongruity and resolution which do not involve a shift from one interpretation to the next [1]. The three approaches use a correlation-based measure to assign meaning to ambiguous words given the context of the ambiguous word in different parts of a surface-level text and the relations associated with different meanings of that word as defined in an ontology at a deeper level.

The first visualization plots meaning correlation scores for two or more opposing meanings as coordinates using an approach known as collocated paired coordinates [2]. This lets us visually compare shifts of meaning in a set of jokes with those in a set of non jokes. The second visualization uses heat maps to color code the differences of meaning correlation scores at different time steps. The heat maps for the set of jokes are distinguishable from those of the non jokes with respect to these meaning correlation differences. Finally, the third visualization displays in two dimensions an entire model space consisting of Boolean vectors describing meaning correlation over time. The sets of jokes and non jokes are plotted in this space, allowing us to see the boundary between what is a joke and what is not. To show the power of this approach we compare the results with traditional data mining approaches, which result in models describing the same key features. This paper describes all three approaches in detail, including the construction of an informal ontology using web mining to identify semantic relations, and shows how these approaches were used to visualize jokes and non jokes to obtain experimental results. While improvements can be made, the results were encouraging.

The visualization strategies used in this paper can be used to model and detect other forms of humor, including sequential physical humor in nonverbal settings. Incongruities and their resolution arise in many other situations involving sensors and analysis where classification occurs. We will briefly discuss the use of the three visualization approaches for modeling and detecting physical humor, with applications from comedy to any situation where incongruities arise given agents and their sensors.

2 Related Work

Both Computational Humor and Text Visualization have seen extensive activity lately but tend to work on separate topics. Computational Humor deals largely with the modeling and detection of incongruities within text, and many recent attempts have been made to detect or generate jokes using computers [6, 7, 10, 13, 16, 17], but no attempt focused on visualization has been made. On the other hand, work on Text Visualization tends to focus on other topics such as identifying the central topic within a text. To our knowledge this paper is the first attempt to visualize incongruities within text.

3 Incongruity Resolution Theory of Humor and Garden Path Jokes

‘Two fish are in a tank. One looks to the other and asks: How do you drive this thing?’

Many predominant theories of verbal humor state that humor is triggered by the detection and resolution of incongruities [12, 14]. The dictionary defines ‘incongruous’ as lacking harmony of parts, discordant, or inconsistent in nature. During the parsing of a text, incongruities form when a reader’s interpretation of some concept conflicts with other possible interpretations as the text is read. This paper will focus on a particular humor subtype where there is a shift from one interpretation to an opposing one. Dynel calls these jokes “garden path” jokes, using the garden path metaphor of being misled [1], while other theorists use the terminology of ‘forced reinterpretation’ and ‘frame shifting’ [12]. These jokes are sequential in nature and describe a certain pattern of incongruity and resolution. With a garden path joke a reader establishes some interpretation A as they read the first part of the joke, the setup, but given new evidence included in the second part, the punchline, they must discard this interpretation and establish a new interpretation B. The fish tank joke displays such an incongruity; the reader initially interprets the tank to be an aquarium, but given additional information the alternative meaning of a vehicle becomes possible and probable. Incongruities often arise when ‘opposing’ or ‘mutually exclusive’ elements occur simultaneously. Different word senses oppose in that when a tank is a vehicle it is not an aquarium. Opposition occurs in many other areas; when it is summer it is not winter, when something is hot it is not cold, and when one is sad one is not happy. To model this we visualize changes of correlation between the contexts established for mutually exclusive meanings of some ambiguous word and the contexts in which the different parts of text containing that ambiguous word are found. Similar meanings will have similar contexts according to the distributional hypothesis.
Before presenting the visualizations let us discuss the approach used to establish meaning and how we identify correlation between meaning representations.

4 Establishing Meanings and Meaning Correlations

The sections on establishing meaning and meaning correlations are in line with previous work by the authors of this paper [5]. We chose a vector representation of meaning based on the frequency at which words occur in the context of some target word. This is a standard approach taken by a number of researchers in the past for dealing with meaning [8]. These vectors of word associations form an informal ontology describing entities and their relations. The material used to build these vectors was retrieved via web search. Below we consider some ambiguous word A with a number of possible meanings \(AM_{1}...AM_{n}\) and different parts of some text \(P_{1}...P_{m}\) containing the ambiguous word A.

4.1 Establishing Vectors of Word Association Frequencies Using a Web Mining Approach

For each meaning \(AM_{x}\) we establish a set of disambiguating keywords \(K(AM_{x})\) which uniquely identify that meaning. While we hand-chose our keyword sets, these could be established using a variety of resources such as WordNet.

Next we use \(K(AM_{x})\) as a query for a search engine to retrieve the top n documents. Let D(q, n) be a search function which retrieves n documents relevant to some query q. The resulting document set for some meaning \(AM_{x}\) is thus designated \(D(K(AM_{x}),n)\).

Finally we compute frequencies of all words occurring within distance j of A given the document set \(D(K(AM_{x}),n)\). We designate this \(F(A, j, D(K(AM_{x}),n))\), where F is a function that returns a vector of word frequencies. In this paper F uses the term frequency-inverse document frequency (TF-IDF) approach to establish word frequencies [11]. \(F(A, j, D(K(AM_{x}),n))\) represents the meaning of \(AM_{x}\) as a set of word association frequencies, in other words its contexts. These frequencies are ordered by the lexicographic order of the words. Note that we include the frequency of the given word A itself, though we have experimented with variants which do not.

In a similar fashion we establish semantics for the ambiguous word A given the different parts \(P_{1}...P_{m}\) of some text containing A. We denote these \(F(A, j, D(P_{1}, n))\)... \(F(A, j, D(P_{m}, n))\).
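The pipeline above can be sketched in code. This is a minimal illustration rather than the authors' implementation: the whitespace tokenizer, the symmetric context window, and the exact IDF variant are assumptions, and the real system computes TF-IDF over web search results rather than an in-memory document list.

```python
from collections import Counter
import math

def context_vector(documents, target, window=5):
    """Build a TF-IDF-weighted word association vector for `target`.

    Term frequency is counted only within `window` tokens of each
    occurrence of `target` (the distance j of Sect. 4.1); document
    frequency is computed over the whole document set.
    Returns a dict mapping word -> weight.
    """
    tf = Counter()   # counts restricted to the target's context windows
    df = Counter()   # document frequency over the retrieved set
    for doc in documents:
        tokens = doc.lower().split()
        for w in set(tokens):
            df[w] += 1
        for i, tok in enumerate(tokens):
            if tok == target:
                lo, hi = max(0, i - window), i + window + 1
                tf.update(tokens[lo:hi])
    n_docs = len(documents)
    # TF-IDF weight; words appearing in every document get weight 0
    return {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
```

In the full approach the `documents` argument would be the result of the search function \(D(K(AM_{x}), n)\) or \(D(P_{i}, n)\); here any list of strings stands in for it.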

4.2 Calculating Correlation Coefficients

We are interested in how the meaning of A given a search for some part of text \(P_{i}\) correlates with the meaning of A given the meaning established for each word sense \(AM_{1}...AM_{n}\).

We compute the correlation coefficient between the vector of word frequencies associated with A given a search for some part of text \(P_{i}\), that is \(F(A, j, D(P_{i}, n))\), and the vector of word frequencies associated with a search for some meaning \(AM_{y}\), that is \(F(A, j, D(K(AM_{y}), n))\), using some function C which returns the correlation. We denote these \(C_{iy} = C(F(A, j, D(P_{i}, n)), F(A, j, D(K(AM_{y}), n)))\).

All of the jokes in our data set are two part jokes in which two meanings are invoked. Given two meanings of some ambiguous word A and some statement with parts \(P_{1}\) and \(P_{2}\) that refer to A, we calculate the following correlation scores:

Given P1 (part one of some text):

\(C_{1x} = C(F(A, j, D(P_{1}, n)), F(A, j, D(K(AM_{x}), n)))\) is a correlation of meaning \(AP_{1}\) with meaning \(AM_{x}\),

\(C_{1y} = C(F(A, j, D(P_{1}, n)), F(A, j, D(K(AM_{y}), n)))\) is a correlation of meaning \(AP_{1}\) with meaning \(AM_{y}\),

Given P2 (part two of some text):

\(C_{2x} = C(F(A, j, D(P_{2}, n)), F(A, j, D(K(AM_{x}), n)))\) is a correlation of meaning \(AP_{2}\) with meaning \(AM_{x}\),

\(C_{2y} = C(F(A, j, D(P_{2}, n)), F(A, j, D(K(AM_{y}), n)))\) is a correlation of meaning \(AP_{2}\) with meaning \(AM_{y}\).
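The correlation function C can be sketched as a Pearson correlation over vectors aligned on a shared vocabulary. This is an illustrative assumption: the paper does not name the specific coefficient, and the dict-based representation here stands in for the lexicographically ordered vectors of Sect. 4.1.

```python
import math

def correlate(vec_a, vec_b):
    """Pearson correlation C between two word association vectors.

    Vectors are dicts of word -> weight; the union of their vocabularies
    defines the aligned dimensions, with missing words counted as 0.0.
    Assumes neither vector is constant over the shared vocabulary.
    """
    vocab = sorted(set(vec_a) | set(vec_b))   # lexicographic alignment
    xs = [vec_a.get(w, 0.0) for w in vocab]
    ys = [vec_b.get(w, 0.0) for w in vocab]
    n = len(vocab)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Applied to the vectors of Sect. 4.1, `correlate(F_P1, F_AMx)` would yield \(C_{1x}\), and similarly for the other three coefficients.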

4.3 Calculating Correlation Coefficient Differences Given Different Parts of Text

Finally we calculate differences between the correlation coefficients, which are useful for joke classification as they describe correlation movement patterns. For example, the difference between \(C_{1x}\) and \(C_{1y}\) tells us which meaning has greater correlation given P1, the first part of the joke, while the difference between \(C_{2x}\) and \(C_{2y}\) tells us which meaning has greater correlation given the second part. If \(C_{1x} - C_{1y} > 0\) then the correlation with meaning x is greater than that with meaning y given part one. On the other hand, the difference between \(C_{1x}\) and \(C_{2x}\) tells us whether the correlation coefficient for some meaning has increased or decreased between part one and part two of some text. If \(C_{1x} - C_{2x} > 0\) then the correlation with meaning x has decreased as the text is read, while if \(C_{1x} - C_{2x} < 0\) then it has increased.

We calculate the differences between \(C_{1x}, C_{1y}, C_{2x}\), and \(C_{2y}\). The difference \(C_{1x} - C_{1y}\) shows which meaning correlates higher given the first part of text, \(C_{2x} - C_{2y}\) shows which meaning correlates higher given the second part of text, \(C_{1x} - C_{2x}\) shows if meaning X correlates higher in the second part of text compared with the first, and \(C_{1y} - C_{2y}\) shows if meaning Y correlates higher in the second part of text compared with the first.

4.4 Building Features from Correlation Coefficient Differences Given Different Time Steps

We then define four Boolean variables \(x_{1}-x_{4}\) using these differences:

\(x_{1} = 1 \) If \(C_{1x} > C_{1y}\), else \(x_{1} = 0 \)

\(x_{1} = 1 \) means the correlation with meaning X is greater than meaning Y given the first part of the text.

\(x_{2} = 1 \) If \(C_{1x} > C_{2x}\), else \(x_{2} = 0 \)

\(x_{2} = 1 \) means the correlation with meaning X decreased going from part one to part two of the text.

\(x_{3} = 1 \) If \(C_{1y} < C_{2y}\), else \(x_{3} = 0 \)

\(x_{3} = 1 \) means the correlation with meaning Y increased going from part one to part two of the text.

\(x_{4} = 1 \) If \(C_{2x} < C_{2y}\), else \(x_{4} = 0 \)

\(x_{4} = 1 \) means the correlation with meaning Y is greater than meaning X given the second part of the text.
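The four feature definitions above translate directly into code; a minimal sketch:

```python
def boolean_features(c1x, c1y, c2x, c2y):
    """Derive the Boolean features x1..x4 of Sect. 4.4 from the four
    meaning correlation coefficients."""
    x1 = int(c1x > c1y)   # meaning X dominates in part one
    x2 = int(c1x > c2x)   # correlation with X decreased over time
    x3 = int(c1y < c2y)   # correlation with Y increased over time
    x4 = int(c2x < c2y)   # meaning Y dominates in part two
    return (x1, x2, x3, x4)
```

A garden path joke should produce the all-true vector (1, 1, 1, 1): meaning X dominates at first, then weakens as meaning Y overtakes it.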

5 Example

Take a two-part garden path joke J with the parts \(P_{1}\) = ‘fish in a tank’ and \(P_{2}\) = ‘they drive the tank’ that contains the ambiguous word \(A = \) ‘tank’. Let \(tankM_{1}\) and \(tankM_{2}\) be the two meanings invoked at different points while reading J, that of an aquarium and that of a vehicle.

\(P_{1}\)= “fish in a tank.”

\(P_{2}\)= “drives the tank”

\(K(tankM_{1})\) = [“aquarium”, “tank”]

\(K(tankM_{2})\) = [“vehicle”, “panzer”, “tank”]

This is a distilled example of the “fish tank” joke presented in Sect. 3. In order to concentrate on the issue at hand, i.e. modeling incongruity, we reduced the jokes to a simplified form.

We establish vectors for the various meanings of ‘tank’ using data from searches for \(P_{1}, P_{2}, K(tankM_{1})\) and \(K(tankM_{2})\) and then calculate the correlation coefficients between these meaning vectors. The meanings of ‘tank’ found in \(P_{1}\) and \(P_{2}\) may or may not be the same as \(tankM_{1}\) and \(tankM_{2}\). According to the distributional hypothesis, which states that similar meanings will have similar contexts, if they are the same then there should be a correlation of context. The correlation of context can be found by comparing the vectors of word associations we extracted via web mining.

Meaning correlation coefficients given \(P_{1}\):

Meaning correlation coefficients given \(P_{2}\):

Over the course of a garden path joke there should be a switch in the dominant meaning correlation coefficient. Given the first part, correlation with meaning X should be greater, and given the second part, correlation with meaning Y should be greater.

6 Data Set Used in Visualizations

We collected two-part jokes of garden path form containing lexical ambiguities and converted them into a simple form by hand, as we want to model incongruity rather than focus on other issues related to parsing text. Algorithmically selecting relevant parts of text \(P_{1}\) and \(P_{2}\) from longer texts that contain a lot of additional material is a valid approach but outside the scope of this research. Thus material not relevant to the interpretation of the ambiguous lexical entity was removed: “Two fish are in a tank” becomes “a fish in a tank,” as the number of fish has little to do with the lexical ambiguity involved in the incongruity we are attempting to model. In order to focus on developing means of visualizing text, we let meaning X be the meaning indicated in the first part of the text and meaning Y be the secondary meaning.

For each joke we created a non joke of similar form. It contains the same first part but a different non-humorous second part. We strove to change as little as possible, usually only a noun or verb, to preserve the structure of the statement. The following are some examples of jokes and non jokes contained in the data set.

Joke1:   

\(P_{1}\): Two fish are in a tank.

\(P_{2}\): They drive the tank.

NonJoke1:

\(P_{1}\): Two fish are in a tank.

\(P_{2}\): They swim in the tank.

Meaning X search query: ‘Aquarium tank’

Meaning Y search query: ‘Panzer tank’

Joke2:    

\(P_{1}\): No charge said the bartender.

\(P_{2}\): To the neutron.

NonJoke2:

\(P_{1}\): No charge said the bartender.

\(P_{2}\): To the customer.

Meaning X search query: ‘Cost charge’

Meaning Y search query: ‘Electron charge’

7 Visualization Approach 1. Collocated Paired Coordinates

Our first visualization uses a technique known as collocated paired coordinates [2]. Given some ambiguous element with multiple possible meanings, we plot the meaning correlation scores established given a part of text and the various meanings as points on a coordinate graph. The Y axis measures the correlation with meaning Y, while the X axis measures the correlation with meaning X. Each part of text in a sequence results in a point, and these points are connected with arrows representing time. This allows us to visualize correlation patterns over time. A garden path joke, which involves a shift from one meaning to the next, should form a line moving away from one axis and towards the other as the meaning correlation score for one meaning decreases and that for the other increases. In our visualization we set the X axis to measure correlation with the meaning invoked by the first part of the text and the Y axis to measure correlation with the second meaning, invoked in the second part of the text, so that the arrows should all move in the same direction as a meaning shift occurs.

Visualization overview:

For \(P_{1}\) and \(P_{2}\) we plot the meaning correlation coefficients given two opposing meanings for some ambiguous word with \(AM_{x}\) and \(AM_{y}\) as points:

The X axis represents correlation with some meaning \(AM_{x}\).

The Y axis represents correlation with some meaning \(AM_{y}\)

  1. Plot a point representing the meaning correlations given \(P_{1}\).

  2. Plot a point representing the meaning correlations given \(P_{2}\).

  3. Connect the points via an arrow indicating time.

  4. Color-code green if humorous, red if not, and black if unknown.
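The steps above can be sketched with matplotlib. This is an illustrative sketch, not the authors' code: the figure size, axis limits, and output filename are assumptions, and each example is passed in as a tuple of the four correlation coefficients plus a class flag.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

def plot_paired_coordinates(examples, path="cpc_plot.png"):
    """Collocated paired coordinates: draw one arrow per text, from the
    part-one point (C1x, C1y) to the part-two point (C2x, C2y).
    Jokes are green, non jokes red."""
    fig, ax = plt.subplots(figsize=(5, 5))
    for c1x, c1y, c2x, c2y, is_joke in examples:
        color = "green" if is_joke else "red"
        ax.annotate("", xy=(c2x, c2y), xytext=(c1x, c1y),
                    arrowprops={"arrowstyle": "->", "color": color})
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.set_xlabel("correlation with meaning X")
    ax.set_ylabel("correlation with meaning Y")
    fig.savefig(path)
    plt.close(fig)
    return path
```

A joke should appear as an arrow leaving the X axis region and heading toward the Y axis, while a non joke's arrow stays near its starting point.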

Fig. 1.

Collocated paired coordinate plot of meaning context correlation over time. The set of jokes and non jokes plotted as meaning correlation over time using collocated paired coordinates. (Color figure online)

Fig. 2.

Second endpoint only. The correlation coefficients given \(P_{2}\).

7.1 Discussion

While there are some examples which fail to match the pattern, it is clear that most jokes involve a shift away from correlation with one meaning and towards the second meaning given part two of the joke. Figure 1 shows this: the green arrows, representing jokes, move from one axis to another, while the red arrows tend to stay closer to the original meaning as there is no meaning change. An analysis of the handful of cases that do not follow this pattern indicates explainable circumstances, such as the web search returning irrelevant documents due to a poor choice of keywords or semantic noise. Dimensionality reduction methods such as latent semantic analysis may help with this. Figure 2 looks only at the meaning correlation coefficients given \(P_{2}\), which clearly shows that there is higher correlation with meaning Y, which opposes the meaning X that was initially established.

8 Visualization 2: Heat Maps

Visualization Overview: In the previous visualization we saw that there is a shift from one meaning correlation being higher to the opposite. To test this intuition we make use of heat maps based on differences in correlation coefficient values given the different meanings and different parts of text. With this approach we can identify potential features that distinguish jokes from non-jokes, assisting in model discovery.

Visualization Algorithm:

  1. Organize the correlation coefficient differences as established in Sect. 4.3 in a data frame along with the classification of being a joke or not.

  2. Color code the correlation score differences based on value.

  3. Sort the rows into groups by classification, that is into two groups of joke and non joke.

  4. Identify regions of the heat map where there is a distinguishable difference between the joke and non joke sections in terms of color.
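The algorithm above can be sketched in a few lines. This is an assumption-laden illustration: the paper's three-color coding is reproduced with hypothetical color names and an assumed near-zero threshold `eps`, and the "data frame" is represented as a plain list of rows.

```python
def color_code(value, eps=0.005):
    """Map a correlation difference to one of three heat-map colors.
    The eps threshold for 'roughly equal' is an assumption."""
    if value > eps:
        return "red"      # first term of the difference correlates higher
    if value < -eps:
        return "blue"     # second term correlates higher
    return "white"        # roughly equal

def heat_map(rows):
    """rows: list of (label, [C1x-C1y, C2x-C2y, C1x-C2x, C1y-C2y]).
    Sorts rows so jokes and non jokes form contiguous bands, then
    color codes every difference."""
    rows = sorted(rows, key=lambda r: r[0] != "joke")  # jokes first
    return [(label, [color_code(v) for v in diffs]) for label, diffs in rows]
```

A distinguishing feature then shows up as a column whose color differs between the joke band and the non joke band.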

Fig. 3.

Heat map for correlation differences. Column A shows that the first meaning X has a higher correlation score than the second given the first part of the joke, while Column B shows that the second meaning Y has a higher correlation than the first meaning X in the second part of the joke.

8.1 Discussion

While this heat map only uses three colors when color coding correlation coefficients by value, we can clearly identify areas where the joke data set differs from the non joke data set. Let's look at the column representing the difference \(C_{2x}-C_{2y}\). In Fig. 3 this is the column indicating the difference between correlation with meaning X and meaning Y given the second part of the joke. If this value is less than 0 then the correlation with meaning Y is greater given \(P_{2}\); if it is greater than 0 then meaning X remains dominant. While we already expected this to happen, the heat map allows us to automatically identify this value as a distinguishing feature between classes.

9 Visualization 3: Visualizing a Model Space Using Monotone Boolean Chain Visualizations

In the last visualization we use a two-dimensional representation of Boolean space based on the plotting of chains of monotonically increasing Boolean vectors [3] to visualize the difference between garden path jokes and non jokes. Vectors are arranged according to their norm, with the all-true Boolean vector at one end of the plot and the all-false vector at the other. The arrangement of the vectors forms chains in which monotonicity is preserved, that is, each succeeding vector in the chain is the same as the last except with an additional bit set to one. Each chain describes the change in features. Together the chains represent a model space based on the Boolean features \(x_{1}...x_{4}\) derived from the meaning correlation coefficient differences described in Sect. 4.4.

Visualization overview:

  1. For each joke/non joke establish a vector of Boolean values as described in Sect. 4.4.

  2. Establish and visualize a 2D Boolean space representation as described in [3].

  3. Plot the vector established for each joke or non joke as a dot on the Boolean plot.

  4. Color code the dot green if humorous, red if not humorous.
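A chain of monotonically increasing Boolean vectors, the building block of this plot, can be generated with a simple greedy sketch. This is illustrative only: the full visualization in [3] arranges all chains of the Boolean lattice in two dimensions, while this sketch builds a single chain from the all-false to the all-true vector.

```python
from itertools import product

def norm(v):
    """Norm of a Boolean vector: the number of bits set to one."""
    return sum(v)

def is_successor(a, b):
    """b follows a in a monotone chain: b agrees with a everywhere
    except for exactly one additional 1-bit."""
    return norm(b) == norm(a) + 1 and all(x <= y for x, y in zip(a, b))

def build_chain(start=(0, 0, 0, 0)):
    """Greedily extend a chain of monotonically increasing Boolean
    vectors until the all-true vector is reached."""
    chain = [start]
    vectors = list(product((0, 1), repeat=len(start)))
    while norm(chain[-1]) < len(start):
        for v in vectors:
            if is_successor(chain[-1], v):
                chain.append(v)
                break
    return chain
```

Each joke or non joke from Sect. 4.4 is a point on some such chain; walking a chain that contains both classes reveals which single bit flips at the class border.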

Figures 4 and 5 show the resulting visualization using our data set.

Fig. 4.

Monotone Boolean plot of jokes and non jokes. Features from the data set of jokes and non jokes describing differences of correlation given different meanings and time steps, plotted as Boolean vectors. (Color figure online)

Fig. 5.

Single chain. Here one chain of monotonically increasing Boolean vectors is isolated to establish a border between humorous and nonhumorous examples in terms of features. (Color figure online)

9.1 Discussion

Jokes and non jokes can be converted to vectors of Boolean values representing the presence or absence of various features in such a way that we can visually distinguish and establish a border between the two classes of humorous and not humorous. By looking at a chain of Boolean vectors that contains examples from each class, each vector containing one additional feature, we can clearly see where non-humorous texts end and humorous ones begin in terms of model features. Looking at the chain shown in Fig. 5, we see that the key difference is the Boolean value which indicates that some second meaning correlated higher than the first given the second part of the text.

10 Comparison with the Results Using a Traditional Decision Tree Based Data Mining Approach

Our analysis of visualizations generated using humorous and non humorous data sets can be compared with the results of traditional data mining approaches. In particular we used a C4.5 decision tree algorithm, which resulted in a model indicating the same key features involving changes in meaning correlation as our visualizations show.

Resulting C4.5 model:

If \(C_{2x}-C_{2y} < 0.0075\) then class = joke (89.4% of 19 examples)

If \(C_{2x}-C_{2y} \ge 0.0075\) then class = nonjoke (100.0% of 15 examples)

The C4.5 decision tree results in one key splitting feature, which is the same one we found through the visual data mining process. Given a two-part garden path joke involving a lexical ambiguity where some meaning for an ambiguous word is implied in the first part of the text, another alternate meaning shows higher correlation given the second.
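The recovered model is simple enough to state directly in code; a minimal sketch of the single-split classifier, using the 0.0075 threshold reported above:

```python
def classify(c2x, c2y, threshold=0.0075):
    """The single-split rule recovered by C4.5: a text is a joke when
    meaning Y overtakes meaning X in the second part, i.e. when the
    difference C2x - C2y falls below the learned threshold."""
    return "joke" if (c2x - c2y) < threshold else "nonjoke"
```

For example, a text whose second part correlates far more strongly with the opposing meaning Y (\(C_{2x}=0.2\), \(C_{2y}=0.7\)) is classified as a joke.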

11 Physical Humor and Humorous Smart Environments

These approaches can be used to detect and classify other forms of humor including physical humor [9]. One situation where these data visualization strategies can be used is in detecting humorous shifts of interpretation given nonverbal scenes. There are many nonverbal variants of the garden path joke where a viewer is given partial information, makes some assumption, and is then given new information resulting in ‘forced reinterpretation’. For example, consider the image of a lemonade stand. The interpreted ‘season’ is ‘Summer’. Additional information provided via a ‘zoom out’ mechanism shows that the lemonade stand is actually in the middle of a snow storm. The viewer must reinterpret the season to be ‘Winter’. ‘Summer’ and ‘Winter’ are mutually exclusive in that they generally do not exist simultaneously in a given area. A computer vision approach to scene analysis can be used to detect these humorous sequences by identifying instances where scene elements must be reinterpreted given the introduction of new data. Ultimately this approach takes nonverbal humor and converts it to verbal humor, as most computer vision algorithms take nonverbal image data and label it with words, as we find with standard bag-of-words approaches. This could result in a predictive model, but one of the main goals of this session is to build generative models which will generate humor. Let us look at how this might work.

One potential future role for smart environments is to introduce humor for the entertainment of their inhabitants [9]. A smart environment would need the following things to generate sequential humor: a model for some variant of sequential humor, an algorithmic approach to using this model, an ontology describing the things which exist in the world and their relations, and finally methods of introducing new information. First, there are many models for sequential humor, though many, including ours, are vague. We hope the results of visual data mining and traditional data mining approaches will help automate this modeling process, resulting in more robust models. Let us look at two of many algorithmic approaches to utilizing such a model. The first algorithm is ‘hide and reveal’. With this algorithm the smart environment identifies an incongruity in an environment, hides elements so that one interpretation is likely, and then reveals those elements so that there will be a shift in interpretation. The second algorithm is ‘introduce and reveal’, where the smart environment identifies new elements it can introduce to trigger the incongruity resolution process. This will take more extensive computational power, as the environment is not limited to what is already in a scene but must consider all the possibilities of what could be in a scene. Search strategies such as Monte Carlo search will help when dealing with the massive number of elements a smart environment can introduce. Third, the smart environment needs an ontology. This can be something extremely robust including many types of relations such as causal relations, or it could be simpler. A limited ontology might result in fewer opportunities for humor introduction but will still work. A smart environment does not need to pass a Turing test to have a sense of humor.
We would like to see smart environments perform their own automated ontology construction, such as we do when web mining relations, so that each might have a slightly different sense of humor. Finally, for sequential humor a smart environment will need a method of introducing new information. There are many ways of introducing new information, ranging from very simple to very difficult. Let us look at three of these.

First are the ‘pan’ and ‘zoom’ mechanisms, where a smart environment can somehow facilitate moving things in and out of view by panning and zooming in or out. In film this may be easy, but to actually move items would be hard, and to move the platform a user is on even harder. There are some possible intermediate steps such as the use of ‘smart windows’ with panning and zooming capabilities. For example, if a window can adjust its magnification it can zoom in or out, hiding and revealing some detail. Second, we have mechanisms based on illumination. Shadows can hide objects and lighting can reveal them. The initial dimmed lighting might show two fish in a tank swimming around until suddenly a light shines on the tank’s wheels and barrel. Finally, amongst many other techniques for introducing information, we have projection-based techniques such as the use of film projectors to project images onto surfaces, lighting to project shadows, such as when a hanging plant seems like a flying saucer or a coat rack like a monster, or even something simple like printing off and attaching stickers to objects. For example, a smart environment may project the image of wheels onto a fish tank and remove them in a sequence designed to trigger potential shifts in interpretation.

Incongruities and their resolution appear in many other places where classification occurs. For example, in the fruit industry incongruities arise and are resolved many times a day as fruit is sorted in a complex sequential process involving classification at many stages. There is a presort, which uses one system to sort fruit but misclassifies a small percentage. When going through the sizer these are knocked onto belts leading them to cull and peeler bins. Then the sizer classifies, but there are still errors, so these are caught by hand selection at the tray filler stations. It is a hybrid computer-human sorting operation which allows for fixing mistakes at several stages. In general, incongruities can arise and be resolved wherever classification occurs based on multiple sources of evidence, for example where multiple sensors are used, or where a sensor takes readings at multiple steps in time. These visualizations should be able to identify some of these nonverbal and non-humorous ‘mistakes’ and their resolution, which can be useful for a number of tasks from process control to sensor management.

12 Conclusion

Overall, the results from this study show that visualization is a valid strategy for approaching the modeling and detection of humor within text. This paper presented three approaches that were all successful in enabling a person to identify key features that distinguish humorous and non-humorous garden path jokes. One future direction is to use these visualization techniques on other joke types to see what they look like in terms of patterns of meaning correlation over time. These techniques can potentially be used to visualize many other forms of incongruity within texts, such as shilling within product review sets, paradigm-level formation of incongruity and resolution within academic document sets over time, and other phenomena involving opposing states and patterns of shifting, such as the writings of a bipolar patient who might shift from one opposing emotion to the other in a cyclic fashion. They also allow for plotting many examples at once, which lets us visualize big data in terms of natural language texts. As discussed, they can also potentially be used to identify nonverbal incongruities and their resolution, such as nonverbal humor. Overall the results are promising, and these methods will be developed beyond the toy level they are currently at.