
1 Introduction

The ability to automate the interpretation of human language descriptions of location, effectively mapping a string of text to a coordinate location, is useful for a number of applications. Social media, blogs, scientific reports and logs are all examples of potential sources of geographic data that are currently untapped. Expressions like “there has been an accident outside the post office” and “the specimen was collected on the hillside above Arthur’s Pass” contain geographic information, but must be processed before that information can be used.

Several research directions are being pursued with the goal of automating the interpretation process, and also of performing the reverse conversion, in which text is generated to describe a location. These directions include, but are not confined to: so-called spatial role labelling, which identifies the key elements of a spatial expression, focusing on the spatial preposition, locatum and relatum (e.g. Kordjamshidi et al. 2011); the development of mathematical models for the interpretation of specific spatial prepositions (e.g. Hall et al. 2015; Shariff et al. 1998); and linguistically based discussions of how spatial prepositions are used (e.g. Talmy 2000). A number of researchers have identified specific factors that may affect the meaning of geospatial natural language (and thus its interpretation and generation), but no broad view has yet been taken across the range of contextual factors relevant to geospatial language at different levels. We aim to address this gap, and to test an empirical methodology for identifying contextual factors that may not yet be well understood. We limit the scope of our work to the use of context in English-language location descriptions; the role and importance of different factors are likely to vary across languages.

2 Geographic Information and Context

In the broader computer science and natural language processing fields, the importance of context is well recognized and has been explored in some detail. For example, Porzel (2010) identifies four types of contextual information (domain, discourse, interlocutionary and situational context) and discusses some of the contextual challenges posed by descriptions of spatial information.

In the geographic information literature, researchers studying semantics and ontologies have addressed the subject, and in one of the most extensive treatments, Cai (2007) defines situation as the circumstances that surround a particular action, and context as the collection of situations that vary in ways that do not impact on the user’s behavior. His schema for context includes: user task; location; expected features of the context; domain ontologies; goals, subgoals and events.

Other broad schemes are proposed by Souza et al. (2006), who identify four types of context (user, data, association and procedure context), and by Brodaric (2007), who represents context using dimensions, which consist of origins, uses and effects, and involve entities with specific roles. These schemes recognize the importance of pragmatics, which is also clearly demonstrated by Herskovits (1985). Attention has also been given to the role of object affordance (the uses to which an object is put) (Sen 2008).

Other researchers have focused on one or a small subset of factors that might be considered part of the broader picture of context, rather than attempting to define a comprehensive scheme. For example, context has been considered closely related to application domain, combining object types of interest, the relations between them and the functions that the user wishes to perform on them (Rodriguez and Egenhofer 2004). In geospatial natural language research, Lautenschütz et al. (2006) explore object liquidity or solidity as one aspect of context, while Kray et al. (2013) discuss the influence of indoor, outdoor and transitional spaces.

There have also been various efforts to define formal and semi-formal structures into which language can be slotted, considering different aspects of context. These include Hornsby and Li (2009) and Zwarts (2005), who define representations of paths, and Kracht (2002), who defines a structure for locative expressions. In addition, SpatialML (2009), ISO-Space (Pustejovsky et al. 2011) and GUM-Space (Bateman et al. 2010) are more comprehensive structures for representing spatial language, and incorporate some contextual components.

Given the varying use of the term context, in this paper, we adopt a wide definition, encompassing anything that is not directly stated in the textual expression itself, including the wider situation and environment, as well as the characteristics of the terms used in the expression (e.g. the relatum and locatum), and potentially incorporating both parts of the two-level semantics discussed by Lang and Maienborn (2011).

3 Contextual Factors Discussed in the Literature

The following paragraphs present a set of contextual factors that have been identified by researchers, either by directly specifying their importance in geospatial natural language interpretation, or indirectly by discussing how they vary across expressions. They are presented in no particular order.

Image schema is one of the most commonly studied contextual factors in geospatial language (Lakoff and Johnson 1980; Mark and Frank 1996), and is particularly connected to the preposition. “The house is on Main Street”, “the house is on the island” and “the house is on the hill” show different interpretations of the preposition “on”, reflecting different image schemas.

Perspectival mode is another commonly discussed factor (Bateman et al. 2010; Coventry and Garrod 2004; Hubona et al. 1998; Talmy 2000), with the interpretation varying depending on whether the viewer takes a survey (bird’s eye) or a route perspective. In “there were some houses in the valley”, the valley is viewed from above and seen as a container, while in “there were houses now and then through the valley”, it is a path viewed as if moving through the scene, and time is used to indicate spatial location.

Frame of reference is well recognized as an important factor: an expression may use an intrinsic frame (anchored in the object itself), an absolute frame (anchored in something external) or a relative frame (anchored in the observer or another object). “The kiosk is in front of the building” may refer to the front relative to the building itself (e.g. its entrance) or to the front relative to the observer (Landau and Jackendoff 1993; Levinson 2003).
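To make the contrast concrete, the following Python sketch checks “the kiosk is in front of the building” under the intrinsic and relative frames. It is a hypothetical illustration, not a model from the cited work: the `Point` class, the `in_front_of` function, the half-plane test and the example coordinates are all assumptions, and the absolute frame is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Point:
    """A 2D point or direction vector (hypothetical helper)."""
    x: float
    y: float

    def __sub__(self, other: "Point") -> "Point":
        return Point(self.x - other.x, self.y - other.y)

    def dot(self, other: "Point") -> float:
        return self.x * other.x + self.y * other.y


def in_front_of(locatum: Point, relatum: Point, frame: str,
                facing: Optional[Point] = None,
                observer: Optional[Point] = None) -> bool:
    """Crude half-plane test: the locatum is 'in front of' the relatum if it
    lies on the side the chosen front axis points to (assumption)."""
    offset = locatum - relatum
    if frame == "intrinsic":
        # Front axis comes from the object itself, e.g. the direction its entrance faces.
        if facing is None:
            raise ValueError("intrinsic frame needs the relatum's facing direction")
        axis = facing
    elif frame == "relative":
        # Front axis points from the relatum towards the observer.
        if observer is None:
            raise ValueError("relative frame needs an observer position")
        axis = observer - relatum
    else:
        raise ValueError(f"unsupported frame: {frame!r}")
    return offset.dot(axis) > 0


building = Point(0, 0)
kiosk = Point(0, -10)          # kiosk south of the building
entrance_facing = Point(0, 1)  # entrance faces north
observer = Point(0, -50)       # observer further south

print(in_front_of(kiosk, building, "intrinsic", facing=entrance_facing))  # False
print(in_front_of(kiosk, building, "relative", observer=observer))        # True
```

With the kiosk south of a north-facing building and the observer further south, the same expression is false under the intrinsic frame and true under the relative frame. A realistic model would also need distance and angular tolerances, which this half-plane test ignores.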

Shape (or geometry type) may affect the way an expression is interpreted (Landau and Jackendoff 1993). Some prepositions only make sense with certain shapes, whilst others are interpreted differently depending on the shape. For example, “the road goes beside the river” implies a degree of alignment, while “the church is beside the river” does not.

Some spatial expressions refer to parts of an object that relate to its axial structure (front, back, left, right, top, bottom). The axis may be inherent in the object (the front of a building is normally the main entrance) or may be contextually imposed (the front of a moving flood is the side facing its direction of travel). Compare “the city is in front of the hurricane” with “the kiosk is in front of the building” (Bateman et al. 2010; Landau and Jackendoff 1993).

Scale has been identified as a factor that may influence the meaning of a spatial expression (Lautenschütz et al. 2006), for example “the road runs across the park” versus “the canal runs across the country”. Lautenschütz et al. (2006) also discuss the importance of whether an object is liquid or solid in interpreting meaning, as in “the river runs to the sea” versus “the road runs to the sea”.

Coventry and Garrod (2004) discuss how the force dynamics between objects (e.g. how they push against each other), along with the nature of the objects and the purpose of the expression, affect the way language is used. For example, they argue that the semantics of the preposition “in” includes the notion of location control (“the car is in a traffic jam”, “the car is in a queue”). GUM-Space classifies spatial modalities in ways that reflect this theory (Bateman et al. 2010).

The domain (or more specifically, the geographic feature types) involved in the spatial expression may affect its interpretation (Klippel et al. 2011). However, the characteristics of the objects in the domain (for example, the factors that have been presented above) and the types of expressions that make sense with particular feature types may explain these domain variations.

Spatial expressions may refer to objects that are bounded or unbounded, and spatial relations may be conceptually bounded or unbounded, for example “the plane crossed the lake in 3 min” versus “the plane crossed the water for 3 min” (Talmy 2000; Zwarts 2005).

Spatial expressions may refer to objects that are single items or collections of items (and thus divided in nature). Factors such as dividedness, quantity and plexity fall into this category (Landau and Jackendoff 1993; Talmy 2000), for example “there were houses in the valley” versus “there was a house in the valley”.

The pattern of distribution of objects in space can affect language interpretation (Talmy 2000), as in “every second shop had flags hung outside”.

4 Empirical Identification of Contextual Factors

While a number of factors have been investigated in these studies, and several broad schemes for representing context have been developed, little attention has been given to the contextual factors that are specific to a given spatial relation. The role of the surrounding environment has been incorporated in defining mathematical models to represent the meaning of relations such as near (Gahegan 1995) and opposite (Bartie et al. 2011), but this has not been extended to other relations, and little work has been done to ensure that we have a clear picture of the range of factors that might determine the meaning of many spatial relations.

In order to address this gap, we developed and tested a methodology to empirically collect information on the factors that might determine whether and why a particular preposition is used by an individual in a given situation. We focus on prepositions as they are an important way of expressing spatial relations, but we acknowledge that spatial relations are often expressed through other parts of speech, and that this approach provides only a partial picture.

We conducted a study in which we asked participants to provide a brief written explanation of the reasons a particular preposition was applicable in a given situation. The data were collected as part of a wider study, in which participants were shown an aerial photograph of part of a city and asked to judge how applicable a preposition was for describing locations relative to a specified reference object (e.g. “photo taken at the Millennium Centre”), by rating a marker on the image on a scale of 1–9. We do not report this first part of the survey here, as it has been reported elsewhere (Hall et al. 2015); instead, we focus on the following step, in which participants were asked to provide qualitative free-text descriptions of how they interpreted the preposition used in the applicability question (“How do you define <spatial preposition>?”). The survey was distributed by email to all staff and students at Cardiff University, with entry into a prize draw for a £50 voucher offered as an incentive to those who completed it. Over a six-week period, 1210 responses were collected. The study investigated four prepositions (at, near, next to, between) and one phrase that combined a preposition and relatum (at the corner). The study was conducted in English only, and all of the location descriptions referred to geographic locations in Cardiff, Wales.

We analyzed a randomly selected sample of 100 responses for each of the given phrases, summarizing and grouping together similar reasons for the selection of a particular applicability measure for a given phrase. The following are some examples of the responses received for next to:
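One way to draw such a per-phrase sample reproducibly is sketched below in Python. The input format (a list of phrase/response pairs), the function name `sample_per_phrase` and the fixed seed are assumptions for illustration; the paper does not describe how the random selection was implemented.

```python
import random
from collections import defaultdict


def sample_per_phrase(responses, k=100, seed=0):
    """Draw up to k responses per phrase, without replacement.

    responses: iterable of (phrase, response_text) pairs (hypothetical format).
    Returns a dict mapping each phrase to its sampled responses.
    """
    by_phrase = defaultdict(list)
    for phrase, text in responses:
        by_phrase[phrase].append(text)

    # A seeded generator makes the sample repeatable across runs.
    rng = random.Random(seed)
    return {
        phrase: rng.sample(texts, min(k, len(texts)))  # never ask for more than exist
        for phrase, texts in by_phrase.items()
    }
```

Stratifying by phrase guarantees that each phrase contributes the same number of responses (when enough exist), so counts for different phrases remain comparable.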

  • “So that there is little or nothing between the photograph and the object photographed”

  • “Very close to the location (no more than 10 metres approx)”

  • “Stood outside a specified building, or with nothing but space between you and the building, and either in sight of a street or on an adjacent/very close street.”

In analyzing the results, we performed a simple count of the number of times each contextual factor (or reason) was identified, both for each phrase and across all phrases, as we were interested in identifying factors that were important for more than a single phrase. The reasons/factors were identified manually, in a bottom-up fashion, by one of the authors. We recognize this as a limitation of the current work; in future work, inter-annotator agreement will be evaluated. Descriptions that used similar text or were synonymous were grouped together, as shown in Table 1, and in some cases similar reasons were merged into a single factor. For example, visibility was identified as important for several phrases, but the nature of visibility varied (from the locatum, to the locatum, etc.). Similarly, immediacy (whether or not there were intervening objects) took several different forms, including whether or not the intervening objects were of the same type, were large, etc.
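The counting step can be sketched as follows. This is a hypothetical illustration: the `GROUPING` table, its reason labels and the factor name "distance" are invented (only visibility and immediacy come from the text), and the sketch assumes that manual annotation yields a list of raw reason labels per response and that a factor is counted at most once per response, which the paper does not state.

```python
from collections import Counter

# Hypothetical map from raw reason labels to grouped factor names.
GROUPING = {
    "nothing in between": "immediacy",
    "no large object between": "immediacy",
    "can see it": "visibility",
    "in sight of": "visibility",
    "within 10 metres": "distance",
}


def count_factors(annotations, grouping):
    """Tally grouped factors per phrase and across all phrases.

    annotations: dict mapping phrase -> list of responses, where each response
    is the list of raw reason labels an annotator assigned to it.
    Labels missing from `grouping` are kept as their own factor.
    Returns (per_phrase, overall), with Counter values.
    """
    per_phrase = {}
    overall = Counter()
    for phrase, responses in annotations.items():
        counts = Counter()
        for labels in responses:
            # A set, so each grouped factor counts once per response.
            factors = {grouping.get(label, label) for label in labels}
            counts.update(factors)
        per_phrase[phrase] = counts
        overall.update(counts)
    return per_phrase, overall
```

Counting at most once per response keeps a single verbose participant from inflating a factor; counting every mention would be an equally plausible reading of the method.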

Table 1 Examples of reasons given for each factor

We ignored clauses within the responses that used the name of the spatial relation being defined (e.g. in “next to or nearby with no large object in between”, “next to” was ignored). Figure 1 shows, for each spatial relation, the number of times each factor was mentioned, restricted to factors mentioned five or more times across all spatial relations. Less frequent factors were excluded because they were mentioned by only one or two people for a given spatial relation, and were therefore considered less critical in determining the meaning of the relation.
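The two filtering rules can be sketched as below. The clause splitter (on commas, semicolons, “or” and “and”) is a crude assumption rather than the authors’ manual procedure, and the function names and example factor labels are hypothetical; only the threshold of five mentions across all relations comes from the text.

```python
import re
from collections import Counter


def drop_self_referential_clauses(response, relation):
    """Split a response into rough clauses and drop those that use the relation
    being defined. Splitting on , ; 'or' 'and' is an assumed heuristic."""
    clauses = re.split(r"\s*(?:[,;]|\bor\b|\band\b)\s*", response)
    mentions_relation = re.compile(r"\b" + re.escape(relation) + r"\b", re.IGNORECASE)
    return [c for c in clauses if c and not mentions_relation.search(c)]


def frequent_factors(per_relation, min_total=5):
    """Keep only factors mentioned at least `min_total` times across all relations.

    per_relation: dict mapping relation -> Counter of factor mentions.
    """
    total = Counter()
    for counts in per_relation.values():
        total.update(counts)
    keep = {factor for factor, n in total.items() if n >= min_total}
    return {
        relation: {f: n for f, n in counts.items() if f in keep}
        for relation, counts in per_relation.items()
    }
```

Applying the threshold to the total across relations, rather than per relation, retains a factor that is moderately common everywhere even if no single relation mentions it often.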

Fig. 1 Counts of factors in reasons given for applicability of spatial phrases

The results indicate that the factors that are important for next to, near and at are much more similar to one another than to those for between. At the corner also has its own distinct profile of factors considered important.

5 A Typology of Contextual Factors

The factors involved in context have been categorized in various ways, as described in Sect. 2. However, the broadest schemes (e.g. Porzel 2010) do not address spatial aspects, and while some of the spatial schemes (e.g. Brodaric 2007; Cai 2007; Souza et al. 2006) incorporate spatial aspects, they do not include the level of detail identified in our empirical study. On the other hand, Talmy (2000) lists factors that relate to specific spatial relations, but does not take the broader view. We therefore propose the typology shown in Fig. 2, which groups together many of the factors identified by previous researchers and in our empirical study. We identify six broad types, ranging from the most general, environment-level factors (such as indoor/outdoor), through the observer and his or her goals and tasks (as identified in the pragmatics research), down to the more detailed levels of specific spatial relation and object (geographic feature) factors. This is a preliminary typology, which requires more detailed development and specification, both within each type and to define the linkages and relationships between types.

Fig. 2 A typology of contextual factors in geospatial natural language

6 Conclusions

Considerations of context are essential for accurately determining the meaning of geospatial location expressions. In this paper, we have defined context very widely, to include both aspects of the broader environment in which a description occurs and characteristics of the observer, the objects involved and the spatial relations between them. Based on a review of the literature and an empirical study, we identify a range of contextual factors that influence the interpretation of spatial expressions and propose a typology that groups the factors into six types.

In future work we propose to apply the empirical approach presented here more widely to gain a more comprehensive picture of the contextual factors that influence the use of spatial language. Additional work is needed to develop the typology, specifying the factors within each type and the relationships between types.