Interpreting spatial language in image captions

  • Research Report
  • Published in Cognitive Processing

Abstract

The map as a tool for accessing data has become very popular in recent years, but a lot of data do not have the spatial meta-data needed to support map-based access. Some data, such as photographs, do however carry spatial information in their captions, and if this could be extracted, they could be made available via map-based interfaces. Towards this goal, we introduce a model and spatio-linguistic reasoner for interpreting the spatial information in image captions, based upon quantitative data about spatial language use acquired directly from people. Spatial language is inherently vague, and both the model and reasoner have been designed to incorporate this vagueness at the quantitative level and not only qualitatively.

Notes

  1. http://www.geograph.org.uk.

  2. http://www.locr.com.

  3. http://www.flickr.com.

  4. A toponym is a place with a recognisable name used in communication, where a “place” is often defined simply as a meaningful geographic location (Goodchild and Hill 2008).

  5. There is also the possibility of using an approach based on supervaluation (see Bennet 2001 or Kulik 2001), which allows for the creation of a set of boundaries that describe the gradual transition from the definite to the definitely not area, similar to the iso-lines used to represent height on conventional maps. The fuzzy methodology seems to be more frequently used and thus is given prominence.

  6. http://www.geograph.org.uk.

  7. To avoid a bias being introduced by a small group of frequent contributors producing most of the captions, only one caption per contributor was considered. This reduces the number of captions from around 350,000 to 580.

  8. See “Appendix” for the technical aspects of the vague field.

  9. See “Appendix” for details on how it is calculated.

  10. As Lodge (1984) phrases it “It’s not so easy, every decoding is a new encoding”.

  11. This can be seen in the evaluation results in “Evaluation experiment” where the low inter-participant agreement indicates the large number of different possible encodings and decodings.

  12. Named Entity Recognition is the task of identifying noun phrases that refer to specific individuals, whether these be people, companies, dates, etc.

  13. http://developer.yahoo.com/geo/geoplanet/.

  14. An example of how this additional information might help is the caption “Flowers near Stackpole Head”, where Stackpole Head is a coastal headland. The knowledge that the photograph is of “Flowers” could restrict the probable area where the photograph was taken to the land-side, whereas if the subject were “Sailing boat” then the sea-facing “near” area would be more likely. This kind of knowledge processing was judged to be beyond the scope of this paper and thus the information is discarded.

  15. The results for “east of” are not reported, as they are analogous to the “north of” results; the “between” results are omitted due to space constraints.

  16. 0 and 1: high agreement; 2: medium agreement; 3 or higher: low agreement.

  17. For north and south this is the vertical axis, for east and west the horizontal axis.

References

  • Ahlqvist O, Keukelaar J, Oukbir K (1998) Using rough classification to represent uncertainty in spatial data. In: Proceedings of the SIRC Colloquium, pp 1–9

  • Altman D (1994) Fuzzy set theoretic approaches for handling imprecision in spatial analysis. Int J Geogr Inf Sci 8(3):271–289

  • Andogah G, Bouma G, Nerbonne J, Koster E (2008) Placename ambiguity resolution. In: LREC workshop on methodologies and resources for processing spatial language

  • Bennet B (2001) Application of supervaluation semantics to vaguely defined spatial concepts. In: Spatial information theory. foundations of geographic information science : international conference, COSIT 2001 Morro Bay, CA, USA, September 19–23, 2001. Proceedings, pp 108–123

  • Bennett B, Agarwal P (2007) Semantic categories underlying the meaning of ’place’. In: Spatial information theory, 8th international conference, COSIT 2007, Melbourne, Australia, September 19–23, 2007, Proceedings, pp 78–95

  • Bittner T, Stell J (2003) Stratified rough sets and vagueness. In: Spatial information theory, Springer, Berlin/Heidelberg, pp 270–286

  • Bowerman M, Choi S (2003) Space under construction: language-specific spatial categorization in first language acquisition. In: Gentner D, Goldin-Meadow S (eds) Language in mind. MIT, Cambridge, pp 387–428

  • Brown P (1994) The ins and ons of Tzeltal locative expressions. Linguistics 32:743–790

  • Burghardt D (2005) Controlled line smoothing by snakes. GeoInformatica 9(3):237–252

  • Buscaldi D, Rosso P (2008) Map-based vs. knowledge-based toponym disambiguation. In: Proceeding of the 2nd international workshop on geographic information retrieval, GIR’08, pp 19–22

  • Chomsky N (1965) Aspects of the theory of syntax. MIT, Cambridge

  • Clementini E, Felice PD (1996) An algebraic model for spatial objects with indeterminate boundaries. In: Geographic objects with indeterminate boundaries. Taylor and Francis, London, pp 155–169

  • Clementini E, Felice PD (1997) Approximate topological relations. Int J Approx Reason 16(2):173–204

  • Cohn A, Gotts N (1996a) The ’egg-yolk’ representation of regions with indeterminate boundaries. In: Proceedings, GISDATA specialist meeting on geographical objects with undetermined boundaries, Taylor and Francis, pp 171–187

  • Cohn A, Gotts N (1996b) Representing spatial vagueness: a mereological approach. In: KR’96: principles of knowledge representation and reasoning. Morgan Kaufmann, San Mateo, pp 230–241

  • Couclelis H (1992) People manipulate objects (but cultivate fields): Beyond the raster-vector debate in gis. In: Theories and methods of spatio-temporal reasoning in geographic space, vol 639/1992, Springer, Berlin/Heidelberg, pp 65–77

  • Couclelis H, Gottsegen J (1997) What maps mean to people: Denotation, connotation, and geographic visualization in land-use debates. In: Spatial information theory: a theoretical basis for GIS (COSIT’97), vol 1329/1997, Springer, Berlin/Heidelberg, pp 151–162

  • Coventry K, Prat-Sala M, Richards L (2001) The interplay between geometry and function in the comprehension of over, under, above and below. J Memory Lang 44(3):376–398

  • Cunningham H, Maynard D, Bontcheva K, Tablan V (2002) Gate: a framework and graphical development environment for robust nlp tools and applications. In: Proceedings of the 40th anniversary meeting of the association for computational linguistics, pp 168–175

  • Edwardes A, Purves R (2007) A theoretical grounding for semantic descriptions of place. Lect Notes Comput Sci 4857:106

  • Egenhofer M (1991) Reasoning about binary topological relations. In: Second symposium on large spatial databases, lecture notes in computer science, vol 525. Springer, pp 143–160

  • Erwig M, Schneider M (1997) Partition and conquer. In: COSIT ’97: Proceedings of the international conference on spatial information theory. Springer, London, pp 389–407

  • Fabrikant S, Buttenfield B (2001) Formalizing semantic spaces for information access. Ann Assoc Am Geogr 91(2):263–280

  • Fisher P (2000) Sorites paradox and vague geographies. Fuzzy Sets Syst 113(1):7–18

  • Fisher P, Wood J, Cheng T (2004) Where is Helvellyn? Fuzziness of multi-scale landscape morphometry. Trans Inst Br Geogr 29(1):106–128

  • Fisher PF, Orf TM (1991) An investigation of the meaning of near and close on a university campus. Comput Environ Urban Syst 15(1-2):23–35. doi:10.1016/0198-9715(91)90043-D

  • Frank AU, Raubal M (1999) Formal specification of image schemata—a step towards interoperability in geographic information systems. Spatial Cogn Comput 1(1):67–101

  • Friedman C, Kra P, Yu H, Krauthammer M, Rzhetsky A (2001) Genies: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics 17(1):74–82

  • Fuhr T, Socher G, Scheering C, Sagerer G (1995) A three-dimensional spatial model for the interpretation of image data. In: IJCAI-95 Workshop on the representation and processing of spatial expressions, pp 93–102

  • Gahegan M (1995) Proximity operators for qualitative spatial reasoning. In: Spatial information theory: a theoretical basis for GIS. Springer, Berlin/Heidelberg, pp 31–44

  • Gapp K (1994) Basic meanings of spatial relations: Computation and evaluation in 3d space. In: National conference on artificial intelligence, pp 1393–1398

  • Gärdenfors P (2000) Conceptual spaces: the geometry of thought. MIT Press, Cambridge

  • Garrod S, Ferrier G, Campbell S (1999) In and on: investigating the functional geometry of spatial prepositions. Cognition 72(2):167–189

  • Goodchild M (1992) Geographical data modeling. Comput Geosci 18(4):401–408

  • Goodchild M, Hill L (2008) Introduction to digital gazetteer research. Int J Geogr Inf Sci 22(10):1039–1044

  • Guo Q, Liu Y, Wieczorek J (2008) Georeferencing locality descriptions and computing associated uncertainty using a probabilistic approach. Int J Geogr Inf Sci 22(10):1067–1090

  • Güting R, Schneider M (1993) Realms: A foundation for spatial data types in database systems. In: Proceedings of the 3rd international symposium on large databases, pp 33–44

  • Hall M, Jones C (2008) Quantifying spatial prepositions: an experimental study. In: Proceedings of the ACM GIS’08, pp 451–454

  • Hall MM, Jones CB (2009) Initialising and terminating active contours for vague field crisping. In: GISRUK 2009, pp 395–397

  • Hall S (1980) Encoding/decoding. In: Centre for Contemporary Cultural Studies (ed) Culture, media, language: working papers in cultural studies 1972–79. Hutchinson, London, pp 128–138

  • Hengl T (2007) A practical guide to geostatistical mapping of environmental variables

  • Herskovits A (1986) Language and spatial cognition: an interdisciplinary study of prepositions in English. Cambridge University Press, Cambridge

  • Horvath P, Jermyn I, Kato Z, Zerubia J (2009) A higher-order active contour model of a ’gas of circles’ and its application to tree crown extraction. Pattern Recogn 42(5):699–709

  • Hwang S, Thill JC (2005) Modeling localities with fuzzy sets and gis. Fuzzy modeling with spatial information for geographic problems, pp 71–104

  • Johnson M (1987) The body in the mind. University of Chicago Press, Chicago

  • Kass M, Witkin A, Terzopoulos D (1988) Snakes: active contour models. Int J Comput Vis 1(4):321–331

  • Kemmerer D (2006) The semantics of space: integrating linguistic typology and cognitive neuroscience. Neuropsychologia 44(9):1607–1621

  • Kemmerer D, Tranel D (2000) A double dissociation between linguistic and perceptual representations of spatial relationships. Cogn Neuropsychol 17(5):393–414

  • Klippel A, Montello D (2007) Linguistic and nonlinguistic turn direction concepts. In: Spatial information theory, 8th international conference, COSIT 2007, Melbourne, Australia, September 19–23, 2007, Proceedings, pp 354–372

  • Klir G, Yuan B (1995) Fuzzy sets and fuzzy logic: theory and applications. Prentice-Hall, Englewood Cliffs

  • Krige D (1951) A statistical approach to some basic mine valuation problems on the Witwatersrand. J Chem Metallur Min Soc 52:119–139

  • Kuhn W (2002) Modeling the semantics of geographic categories through conceptual integration. In: Geographic information science: second international conference, GIScience, pp 108–118

  • Kulik L (2001) A geometric theory of vague boundaries based on supervaluation. In: Spatial information theory. Foundations of geographic information science : international conference, COSIT 2001. Springer, Berlin/Heidelberg, pp 44–59

  • Lakoff G, Johnson M (1980) Metaphors we live by. The University of Chicago Press, Chicago

  • Lam KM, Yan H (1994) Fast greedy algorithm for active contours. Electron Lett 30(1):21–23

  • Landau B, Jackendoff R (1993) “What” and “where” in spatial language and spatial cognition. Behav Brain Sci 16(2):217–238

  • Laurini R, Pariente D (1996) Towards a field-oriented language: First specifications. In: Geographic objects with indeterminate boundaries. Taylor and Francis, London, pp 225–236

  • Leidner J (2007) Toponym resolution in text: annotation, evaluation and applications of spatial grounding of place names. PhD thesis, School of Informatics, Edinburgh, UK

  • Levinson S (2003) Space in language and cognition: explorations in cognitive diversity. CUP, Cambridge

  • Levinson S, Kita S, Haun D, Rasch B (2002) Returning the tables: language affects spatial reasoning. Cognition 84(2):155–188

  • Li P, Gleitman L (2002) Turning the tables: language and spatial reasoning. Cognition 83(3):265–294

  • Liu Y, Goodchild M, Guo Q, Tian Y, Wu L (2008) Towards a general field model and its order in gis. Int J Geogr Inf Sci 22(6):623–643

  • Liu Y, Guo Q, Wieczorek J, Goodchild M (2009) Positioning localities based on spatial assertions. Int J Geogr Inf Sci 23(11):1471–1501

  • Liu Y, Yuan Y, Xiao D, Zhang Y, Hu J (2010) A point-set-based approximation for areal objects: a case study of representing localities. Comput Environ Urban Syst 34(1):28–39

  • Lodge D (1984) Small world. Penguin Books, New York

  • Mark D (1989) Cognitive image-schemata for geographic information: relations to user views and gis interfaces. In: Proceedings GIS/LIS’89, pp 551–560

  • Mark D, Frank A (1995) Experiential and formal models of geographic space. Environ Plan 23:3–24

  • Mark D, Turk A, Stea D (2007) Progress on Yindjibarndi ethnophysiography. In: Spatial information theory, 8th international conference, COSIT 2007, Melbourne, Australia, September 19–23, 2007, Proceedings, pp 1–19

  • Matheron G (1962) Traité de géostatistique appliquée. Mémoires du Bureau de Recherches Géologiques et Minières 14

  • Miller G, Johnson-Laird P (1976) Language and Perception. Cambridge University Press, Cambridge

  • Morrow D, Clark H (1988) Interpreting words in spatial descriptions. Lang Cogn Process 3:275–291

  • Mukerjee A, Gupta K, Nautiyal S, Singh M, Mishra N (2000) Conceptual description of visual scenes from linguistic models. Image Vis Comput 18(2):173–187

  • Parsons S (1996) Current approaches to handling imperfect information in data and knowledge bases. IEEE Trans Knowl Data Eng 8(3):353–372

  • Pfoser D, Tryfona N, Jensen C (2005) Indeterminacy and spatiotemporal data: basic definitions and case study. GeoInformatica 9(3):211–236

  • Power C, Simms A, White R (2001) Hierarchical fuzzy pattern matching for the regional comparison of land use maps. Int J Geogr Inf Sci 15:77–100

  • Purves R, Clough P, Joho H (2005) Identifying imprecise regions for geographic information retrieval using the web. In: GISRUK’05, pp 313–318

  • Randell D, Cui Z, Cohn A (1992) A spatial logic based on regions and connection. In: KR’92. Principles of knowledge representation and reasoning: proceedings of the third international conference. Morgan Kaufmann, pp 165–176

  • Raubal M, Worboys M (1999) A formal model of the process of wayfinding in built environments. In: Spatial information theory. cognitive and computational foundations of geographic information science: international conference COSIT’99, Stade, Germany, August 1999. Proceedings, Springer, Berlin/Heidelberg. Lect Notes Comput Sci 1661:748

  • Ravin Y, Wacholder N (1996) Extracting names from natural-language text. Tech. Rep. Report 20338, IBM Research

  • Robinson V (2000) Individual and multipersonal fuzzy spatial relations acquired using human–machine interaction. Fuzzy Sets Syst 113(1):133–145

  • Robinson V (2003) A perspective on the fundamentals of fuzzy sets and their use in geographic information systems. Trans GIS 7(1):3–30

  • Sapir E (1929) The status of linguistics as a science. Language 5

  • Schneider M (1996) Modelling spatial objects with undetermined boundaries using the realm/rose approach. In: Geographic objects with indeterminate boundaries, vol 2. Taylor and Francis, London, pp 141–152

  • Schneider M (2000) Finite resolution crisp and fuzzy spatial objects. In: International symposium on spatial data handling, pp 3–17

  • Schneider M (2001) A design of topological predicates for complex crisp and fuzzy regions. In: Conceptual modeling—ER 2001, Springer, Berlin/Heidelberg. Lect Notes Comput Sci 2224:103

  • Schockaert S, de Cock M, Kerre E (2008) Location approximation for local search services using natural language hints. Int J Geogr Inf Sci 22(3):315–336

  • Smart P, Jones C, Twaroch F (2010) Multi-source toponym data integration and mediation for a meta-gazetteer service. In: GISCIENCE 2010 (forthcoming)

  • Smith B, Varzi A (1997) Fiat and bona fide boundaries: towards an ontology of spatially extended objects. In: Spatial Information theory: a theoretical basis for GIS (COSIT’97). Lect Notes Comput Sci 1329:103–119

  • Smith D, Crane G (2001) Disambiguating geographic names in a historical digital library. In: Research and advanced technology for digital libraries: fifth European conference (ECDL 2001), pp 127–136

  • Srihari R, Rapaport W (1990) Combining linguistic and pictorial information: using captions to interpret newspaper photographs. In: Current trends in SNePS—semantic network processing system, no. 437/1990 in Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, pp 85–96

  • Steiniger S, Meier S (2004) Snakes: a technique for line smoothing and displacement in map generalisation. In: ICA workshop on generalisation and multiple representation

  • Talmy L (1983) How language structures space. In: Spatial orientation. Plenum, New York, pp 225–282

  • Tang X (2004) Spatial object model[l]ing in fuzzy topological spaces : with applications to land cover change. PhD thesis, University of Twente, Enschede

  • Terzopoulos D (1986) Regularization of inverse visual problems involving discontinuities. IEEE Transactions PAMI-8, p 413

  • Tversky B, Lee P (1998) How space structures language. In: Spatial cognition: an interdisciplinary approach to representing and processing spatial knowledge, Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, p 157

  • Vorwerg C, Rickheit G (1998) Typicality effects in the categorization of spatial relations. In: Spatial cognition: an interdisciplinary approach to representing and processing spatial knowledge, Lecture Notes in Computer Science, vol 1404, Springer Berlin/Heidelberg, pp 203–222

  • Wang F, Hall G (1996) Fuzzy representation of geographical boundaries in gis. Int J Geogr Inf Sci 10(5):573–590

  • Whorf B, Carroll J, Chase S (1956) Language, thought, and reality: selected writings of Benjamin Lee Whorf. MIT, Cambridge

  • Wieczorek J, Guo Q, Hijmans R (2004) The point-radius method for georeferencing locality descriptions and calculating associated uncertainty. Int J Geogr Inf Sci 18(8):745–767

  • Winter S (2000) Uncertain topological relations between imprecise regions. Int J Geogr Inf Sci 14(5):411–430

  • Worboys M (2001) Nearness relations in environmental space. Int J Geogr Inf Sci 15(7):633–651

  • Worboys M, Duckham M, Kulik L (2004) Commonsense notions of proximity and direction in environmental space. Spatial Cogn Comput 4(4):285–312

  • Xie X, Mirmehdi M (2006) Magnetostatic field for the active contour model: a study in convergence. In: Proceedings of the 17th British machine vision conference, pp 127–136

  • Yamada A, Yamamoto T, Ikeda H, Nishida T, Doshita S (1992) Reconstructing spatial image from natural language texts. In: Proceedings COLING-92, vol 4, pp 1279–1283

  • Zadeh L (1965) Fuzzy sets. Inf Control 8:338–353

Acknowledgments

We would like to gratefully acknowledge contributors to Geograph British Isles (see http://www.geograph.org.uk/credits/2007-02-24), whose work is made available under the Creative Commons Attribution-ShareAlike 2.5 Licence (http://creativecommons.org/licenses/by-sa/2.5/). This material is based upon work supported by the European Community in the TRIPOD (FP6 cr n° 045335) project. We would also like to thank the two reviewers, whose comments and suggestions helped focus the ideas presented in this paper.

Author information

Corresponding author

Correspondence to Mark M. Hall.

Appendix: Technical details of the vague field

Definition

Conceptually, the vague field is a two-dimensional, unbounded, continuous scalar field defined on an external coordinate system. Computationally, it is impossible to store an unbounded, continuous field; therefore, in its internal representation, the vague field is a bounded, discretised field that can easily be stored in a two-dimensional, floating-point matrix. This matrix, which forms the foundation of the vague field, is augmented with further attributes that are required for the instantiation and processing of the vague field.

To enable the translation between the external coordinate system and the internal matrix representation, the field stores an external and an internal anchor location. The external anchor represents the location that the vague field is defined as being relative to in the external coordinate system. For the spatial preposition fields, this is the location of the ground toponym, such as the location of “Cardiff” in the case of the vague field for “near Cardiff”. The internal anchor represents the point where the field is attached to the external anchor location. The values in the internal matrix are always to be interpreted as specifying the vague phenomenon’s applicability relative to this internal anchor location. The internal and external anchors are used in the read function to translate between the external and internal coordinate systems (“Appendix”—“Accessing the field values”).

Instantiation

The experiment described in the previous section resulted in a sparse set of measurement points and an applicability value for each measurement point. An interpolation using ordinary kriging is used to transform these point measurements into the continuous field representation. Ordinary kriging was developed in the geostatistics field to estimate the distribution of natural resources based on a set of point measurements (Krige 1951; Matheron 1962; Hengl 2007). To calculate the vague field, a grid is placed over the area defined by the measurement points, with the extent of each grid cell defined by the field’s desired scale. The interpolated value for each cell is then calculated using a weighted average as shown in Eq. 9. The advantage of ordinary kriging over other distance-based interpolations is that the weighting values \(\lambda_i\) are automatically derived from the values and spatial distribution of the measurement points \(p_i\) (Fig. 23). The interpolated results are in the range [1, 9] and in a final step are normalised to the [0, 1] range (Eq. 10), where kriging[x, y] is the result matrix produced by the kriging algorithm.

$$ kriging[x,y] = \sum_{i=0}^{n} \lambda_i \cdot \hbox{value}\left(p_i\right) $$
(9)
$$ values[x, y] = \frac{kriging[x, y] - 1}{\max(kriging - 1)} $$
(10)
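
As an illustration of Eqs. 9 and 10, the following Python sketch computes the weighted average for a single cell and normalises a small result matrix. The function names and the handling of a matrix in which every cell sits at the scale minimum are assumptions made for this example; the weights are taken as given rather than derived from a fitted variogram.

```python
def weighted_average(weights, values):
    """Eq. 9: estimate for one cell from weights lambda_i and values of p_i."""
    return sum(w * v for w, v in zip(weights, values))


def normalise(kriging):
    """Eq. 10: map a matrix of interpolated ratings in [1, 9] onto [0, 1]."""
    shifted = [[value - 1.0 for value in row] for row in kriging]
    peak = max(max(row) for row in shifted)
    if peak <= 0.0:
        # Assumption: if every cell sits at the scale minimum, return zeros.
        return [[0.0 for _ in row] for row in shifted]
    return [[value / peak for value in row] for row in shifted]


# Illustrative numbers only; they are not taken from the experiment.
cell = weighted_average([0.5, 0.25, 0.25], [9.0, 5.0, 1.0])
field = normalise([[1.0, 3.0], [5.0, 9.0]])
```

Note that Eq. 10 divides by the largest shifted value, so the strongest cell of the field always maps to 1 even if no interpolated value reaches 9.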

Fig. 23. The source point measurement data and the field calculated using ordinary kriging. The darker the field, the higher the applicability value at that point

The quality of the interpolation depends on the number of measurement points, and even for kriging, the number of measurement points derived from the human-subject experiment is low. As a result, the fitted variogram model is less stable. To increase the number of measurement points and thus the quality of the resulting field, additional measurement points were created based on the properties revealed by the analysis described in “Background”. For the cardinal directions, where direction plays the primary role, the measurement locations were mirrored across the cardinal direction’s primary axis (see note 17; Fig. 24). For “near”, the analysis showed that angle played no significant role; it was thus possible to mirror the measurement locations across both axes, effectively quadrupling the number of measurement locations (Fig. 25) and making the resulting field more stable.
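
The mirroring step can be sketched in a few lines of Python. The representation of a measurement as an `(x, y, value)` tuple with offsets relative to the ground toponym, the function names, and the removal of duplicates for points lying exactly on a mirror axis are assumptions made for this example.

```python
def mirror_vertical(points):
    """Mirror across the vertical (north-south) axis: (x, y) -> (-x, y)."""
    mirrored = []
    for x, y, value in points:
        mirrored.extend([(x, y, value), (-x, y, value)])
    # Points on the axis mirror onto themselves; keep only one copy.
    return list(dict.fromkeys(mirrored))


def mirror_both(points):
    """Mirror across both axes, as described for "near"."""
    mirrored = []
    for x, y, value in points:
        mirrored.extend(
            [(x, y, value), (-x, y, value), (x, -y, value), (-x, -y, value)]
        )
    return list(dict.fromkeys(mirrored))


# Illustrative offsets and ratings only.
north_points = mirror_vertical([(2.0, 5.0, 8.0), (0.0, 3.0, 9.0)])
near_points = mirror_both([(1.0, 2.0, 7.0)])
```

Removing coincident points is a design choice of this sketch: duplicated locations add no information, and interpolation routines commonly have trouble with them.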

Fig. 24. The original measurement points for “north” on the left and the mirrored, duplicated measurement points on the right

Fig. 25. The original measurement points for “near” on the left and the quadrupled set of measurement points that is used in the final implementation on the right

Accessing the field values

The read operation provides access to the vague field’s applicability values. It translates between the external, unbounded, continuous representation and the internal, discrete, bounded field-value matrix. The translation from the external to the internal representation is performed using the internal and external anchor locations. The offsets of the external x and y coordinates relative to the external anchor location are calculated and then transformed into internal offset coordinates using the field’s scale value. These internal offset coordinates are then added to the internal anchor’s coordinates to determine the internal coordinates. The internal coordinates are then used to read a value from the field matrix, which is returned.
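
A minimal Python sketch of the read operation follows. The class and attribute names are hypothetical, and three details are assumptions that the text does not specify: offsets are rounded down to the containing cell, the row index grows with the external y coordinate, and locations outside the stored matrix return 0.

```python
import math


class VagueField:
    """Bounded, discretised field addressed through external coordinates."""

    def __init__(self, values, internal_anchor, external_anchor, scale):
        self.values = values                    # values[row][col], each in [0, 1]
        self.internal_anchor = internal_anchor  # (col, row) of the anchor cell
        self.external_anchor = external_anchor  # (x, y) in external units
        self.scale = scale                      # external units per cell

    def read(self, x, y):
        # Offsets relative to the external anchor, converted to cell units.
        dx = (x - self.external_anchor[0]) / self.scale
        dy = (y - self.external_anchor[1]) / self.scale
        # Internal coordinates: anchor cell plus the internal offsets.
        col = self.internal_anchor[0] + math.floor(dx)
        row = self.internal_anchor[1] + math.floor(dy)
        inside = 0 <= row < len(self.values) and 0 <= col < len(self.values[0])
        # Assumption: locations outside the stored matrix have applicability 0.
        return self.values[row][col] if inside else 0.0


near = VagueField(
    values=[[0.1, 0.2, 0.1], [0.2, 1.0, 0.2], [0.1, 0.2, 0.1]],
    internal_anchor=(1, 1),
    external_anchor=(318000.0, 176000.0),  # made-up coordinates
    scale=1000.0,
)
```

Reading `near.read(318000.0, 176000.0)` returns the value stored at the anchor cell, while a location far outside the matrix falls back to 0 under the stated assumption.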

Combining fields

The field combination calculation (Eq. 1) is performed every time the combined field is accessed. While this is a computationally expensive approach, it has the advantage that fields of any scale can be combined without having to align their internal matrix representations, as the fields are accessed through the read function and can thus be treated as continuous and scale-free.

The normalisation factor nf is defined as the maximum combined field value and is calculated by placing a virtual grid over all fields, calculating the combined value at each grid point and taking the maximum of these values. One problem with this approach is that if the source fields’ cells overlap as shown in Fig. 26, then none of the grid points used to calculate the maximum may coincide with the location of the actual combined maximum. If the combined field is then read at a location that produces the actual maximum, the calculated value is larger than the normalisation factor and the normalised value would be larger than 1, which is not allowed. To avoid this, if the combined value is larger than the normalisation factor, a value of 1 is returned, regardless of the actual value. While this may seem to skew the data, if all the source fields are continuous, as is usually the case, then the difference between the calculated and the actual maximum is very small and can be disregarded.
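
The sampling of the normalisation factor and the clamping rule can be illustrated as follows. Because Eq. 1 is not reproduced in this appendix, the combination function is passed in as a parameter, and a plain sum is used purely as a stand-in; it is not claimed to be the combination used by the reasoner. The function names, the representation of fields as callables and the two toy fields are likewise assumptions.

```python
def normalisation_factor(fields, combine, xs, ys):
    """Maximum combined value sampled on a virtual grid of external coordinates."""
    return max(combine([f(x, y) for f in fields]) for x in xs for y in ys)


def read_combined(fields, combine, nf, x, y):
    """Combined value at (x, y), normalised by nf and clamped to at most 1."""
    value = combine([f(x, y) for f in fields])
    if nf <= 0.0:
        return 0.0
    return min(value / nf, 1.0)


# Two toy fields whose combined maximum lies between the grid points.
def field_a(x, y):
    return max(0.0, 1.0 - x ** 2)


def field_b(x, y):
    return max(0.0, 1.0 - (x - 1.0) ** 2)


fields = [field_a, field_b]
nf = normalisation_factor(fields, sum, xs=[0.0, 1.0], ys=[0.0])
at_true_maximum = read_combined(fields, sum, nf, 0.5, 0.0)
elsewhere = read_combined(fields, sum, nf, -0.5, 0.0)
```

In this toy case the grid only samples x = 0 and x = 1, where the sum is 1, while the true maximum of 1.5 lies at x = 0.5; the clamp therefore returns 1 at that location.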

Fig. 26. Three fields that overlap at their boundaries and where none of the measurement points used to calculate the field maximum (small white circles) measure the actual maximum where the three fields overlap

Crisping the vague field

The crisp operation is used to transform the continuous vague field into a crisp polygon for integration with existing GI systems and algorithms and is based on active contours.

Active contours

Active contours were introduced by Kass et al. (1988) as a method of finding boundaries in image data, but have also been used in GIS for various purposes (Burghardt 2005; Steiniger and Meier 2004; Horvath et al. 2009). They are defined as controlled continuity splines (Terzopoulos 1986) upon which image and external forces act to move them into the desired shape. In the original method, the energies acting upon the active contour are defined as in Eq. 11, consisting of an internal energy, the image energy and an external constraint energy, the sum of which the active contour then tries to minimise.

$$ \begin{aligned} E_{snake}^* &= \int\limits_0^1 E_{snake}(v(s))\, ds \\ &=\int\limits_0^1 \left[ E_{int}(v(s)) + E_{image}(v(s)) + E_{con}(v(s)) \right] ds \end{aligned} $$
(11)

The internal energy acts to maintain the active contour’s shape, image energy can be defined via the image intensity (Kass et al. 1988), image gradient (Lam and Yan 1994), or via more complex methods (Xie and Mirmehdi 2006), and the external energy defines constraints that the active contour needs to observe that are not directly defined by the active contour itself or the image data. An iterative method on a grid is used to move the active contour’s control points to their final solution. For each control point, the minimum energy neighbour is calculated, and the control point moved there immediately. The energy calculation for the next control point will thus take into account the updated position of the previous control point. This is repeated until the active contour’s final shape is found and means that the active contour will achieve a locally minimal solution, but not necessarily a globally minimal solution. Due to this iterative way of moving, active contours are often also referred to as snakes, as they seem to slither across their processing space (Fig. 30).
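
The iterative scheme described above can be sketched as a greedy loop in Python. This is an illustration of the general idea under stated assumptions, not the implementation used in the paper: the `energy` callback, the 8-cell neighbourhood, the iteration cap and the rule that a point only moves to a strictly better cell are all choices made for this example.

```python
def greedy_snake(points, energy, max_iterations=100):
    """Move each control point to its minimum-energy neighbouring grid cell.

    `energy(points, i, candidate)` is a hypothetical callback returning the
    energy of placing control point i at `candidate`, given the current
    (already partly updated) positions of all control points.
    """
    points = list(points)
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    for _ in range(max_iterations):
        moved = False
        for i, (x, y) in enumerate(points):
            candidates = [(x + dx, y + dy) for dx, dy in offsets]
            best = min(candidates, key=lambda c: energy(points, i, c))
            # Move immediately, so later points see the updated position.
            if energy(points, i, best) < energy(points, i, (x, y)):
                points[i] = best
                moved = True
        if not moved:
            break  # every control point sits in a local minimum
    return points


# Toy energy: squared distance from the origin, ignoring the other points.
def toy_energy(points, i, candidate):
    return candidate[0] ** 2 + candidate[1] ** 2


result = greedy_snake([(3, 0), (0, -2)], toy_energy)
```

Because each point only ever looks at its immediate neighbours, the loop stops at a local minimum, which matches the caveat in the text that the solution is not necessarily globally minimal.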

Crisping fields with active contours

To enable the use of active contours in creating a crisp representation of the vague field, a slightly modified energy function is used (Eq. 12). The first two energies, internal and field, are similar to the internal and image energies as defined earlier. The contract energy is an external energy that pulls the active contour towards the centre of the field.

$$ E_{snake} = \alpha \cdot E_{int} + \beta \cdot E_{field} + \gamma \cdot E_{contract} $$
(12)

Each energy is defined as a vector field, with the direction of each cell’s vector defining the direction in which an active contour control point at that location would be pushed (Fig. 27). The length of the cell’s vector defines how strongly the control point is being pushed in the specified direction; thus, the energy function (Eq. 12) can be implemented as a simple vector addition, with the final vector defining the direction the control point will move.
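
The vector addition of Eq. 12 can be written out directly. In the sketch below the default weights are the values reported later in this appendix; the conversion of the summed vector into a one-cell move, including the threshold below which the control point stays put, is an assumption made for illustration.

```python
import math

ALPHA, BETA, GAMMA = 0.2745, 1.0, 0.4314  # weights reported in the text


def total_force(e_int, e_field, e_contract, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Eq. 12 as a weighted sum of three 2D vectors given as (dx, dy)."""
    return (
        alpha * e_int[0] + beta * e_field[0] + gamma * e_contract[0],
        alpha * e_int[1] + beta * e_field[1] + gamma * e_contract[1],
    )


def grid_step(force, threshold=0.5):
    """Turn a force vector into a move of at most one grid cell per axis.

    Assumption: forces shorter than `threshold` leave the point in place.
    """
    length = math.hypot(force[0], force[1])
    if length < threshold:
        return (0, 0)
    return (round(force[0] / length), round(force[1] / length))


# Field force pushing right, no other forces: the point moves one cell right.
step_right = grid_step(total_force((0.0, 0.0), (1.0, 0.0), (0.0, 0.0)))
# Field and contraction forces cancel exactly: the point does not move.
step_none = grid_step(total_force((0.0, 0.0), (0.4314, 0.0), (-1.0, 0.0)))
```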

Fig. 27. The three vectors that define the direction the control point will move. The dashed line represents the field’s vector, the dotted line is the contraction field vector and the small final arrow is the internal energy vector. In the left-hand case, the control point will be moved one grid cell in the direction indicated, while in the right-hand case, the control point will not move, as a local minimum has been found for that control point

In this framework, the internal energy (Eq. 13) is defined as the vector from the control point (\(p_i\)) to the centre point between the preceding (\(p_{i-1}\)) and following (\(p_{i+1}\)) control points (Fig. 28). In Eq. 13, \(v_{prev}\), \(v_{current}\) and \(v_{next}\) denote the position vectors of \(p_{i-1}\), \(p_i\) and \(p_{i+1}\). This definition ensures that the control points always move so as to create a snake where the control points are evenly spread, since the further a control point moves towards its predecessor and away from its successor, the stronger it will be pulled towards the successor.

$$ E_{int} = \left(v_{prev} + \frac{v_{next} - v_{prev}}{2}\right) - v_{current} $$
(13)
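
Eq. 13 translates directly into code. The following Python sketch uses plain `(x, y)` tuples for the control points; the function name is chosen for this example.

```python
def internal_energy(prev, current, nxt):
    """Eq. 13: vector from the current point to the midpoint of its neighbours."""
    mid_x = prev[0] + (nxt[0] - prev[0]) / 2
    mid_y = prev[1] + (nxt[1] - prev[1]) / 2
    return (mid_x - current[0], mid_y - current[1])


# A point that sits off the midpoint is pulled back towards it.
pull = internal_energy((0.0, 0.0), (1.0, 1.0), (4.0, 0.0))
# A point already at the midpoint experiences no internal force.
rest = internal_energy((0.0, 0.0), (2.0, 0.0), (4.0, 0.0))
```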

Fig. 28. The internal energy is defined as the vector from the current point (\(p_i\)) to the half-way point between the previous (\(p_{i-1}\)) and the next control point (\(p_{i+1}\))

The scalar vague field is transformed into the required vector representation by applying the gradient operator. The gradient operator defines each cell’s vector so that it points in the direction of the neighbouring cell with the lowest scalar value. The length of the vector is determined by the value of the current cell, unless the minimum is equal to the cell’s value in which case the vector’s length is set to 0 as the cell is a local minimum (Fig. 29).
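
The gradient operator as described can be sketched as follows. The use of the 8-cell neighbourhood, the `(dx, dy)` vector layout with dx along columns and dy along rows, and the tie-breaking rule (the first lowest neighbour found wins) are assumptions; a cell whose lowest neighbour is not lower than the cell itself is treated as a local minimum and receives the zero vector.

```python
import math


def gradient_field(values):
    """Per-cell vectors pointing at the lowest-valued neighbouring cell."""
    rows, cols = len(values), len(values[0])
    vectors = [[(0.0, 0.0)] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best = None  # (value, dx, dy) of the lowest neighbour so far
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                        if best is None or values[rr][cc] < best[0]:
                            best = (values[rr][cc], dc, dr)
            if best is not None and best[0] < values[r][c]:
                # Vector length equals the value of the current cell.
                norm = math.hypot(best[1], best[2])
                length = values[r][c]
                vectors[r][c] = (length * best[1] / norm, length * best[2] / norm)
    return vectors


# One-row toy field rising from left to right.
toy_gradient = gradient_field([[0.0, 0.5, 1.0]])
```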

Fig. 29. The vague field for “near” and a simplified representation of its gradient field

Fig. 30. The active contour moving from the initial location to its final solution on a field for “near”. To illustrate the principle, the snake was initialised at α = 0.4

The contraction energy is used to define how far the active contour will contract. It is defined as a constant vector field of the same extent as the vague field that is being crisped, with each cell’s vector pointing towards the centre of the vague field and of length 1. The centre of the vague field is defined as the centroid of all cells with the maximum value. This guarantees that the active contour will contract towards the strongest part of the field.
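
A Python sketch of the contraction field follows. Cell coordinates are given as `(col, row)`, and the zero vector assigned to a cell that coincides with the centre is an assumption, since a direction is undefined there.

```python
import math


def contraction_field(values):
    """Unit vectors pointing at the centroid of all maximum-valued cells."""
    rows, cols = len(values), len(values[0])
    peak = max(max(row) for row in values)
    peaks = [(c, r) for r in range(rows) for c in range(cols) if values[r][c] == peak]
    centre_x = sum(c for c, _ in peaks) / len(peaks)
    centre_y = sum(r for _, r in peaks) / len(peaks)
    vectors = []
    for r in range(rows):
        row = []
        for c in range(cols):
            dx, dy = centre_x - c, centre_y - r
            dist = math.hypot(dx, dy)
            # Assumption: the centre cell itself gets the zero vector.
            row.append((dx / dist, dy / dist) if dist > 0 else (0.0, 0.0))
        vectors.append(row)
    return (centre_x, centre_y), vectors


centre, contract = contraction_field(
    [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
)
```

If several cells share the maximum value, the centroid may fall between cells, which is why the centre is kept as a pair of floats rather than a cell index.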

In the active contour energy function (Eq. 12), each component energy has a weight attached to it, to define their relative influences on the total energy. The weights have been tuned experimentally and in the results shown in this section are \(\alpha=0.2745,\,\beta=1,\,\gamma=0.4314\). The weights were chosen so as to create crisp polygons that had extents that roughly matched the angles and distances observed in the initial Geograph experiment (Hall and Jones 2008). The snake is initialised and terminated so as to minimise the number of iterations it has to run through, while guaranteeing a valid result. Details of the methods used to enable this can be found in Hall and Jones (2009).

Cite this article

Hall, M.M., Smart, P.D. & Jones, C.B. Interpreting spatial language in image captions. Cogn Process 12, 67–94 (2011). https://doi.org/10.1007/s10339-010-0385-5
