Interpreting spatial language in image captions

Hall, Mark M.; Smart, Philip D.; Jones, Christopher B.

doi:10.1007/s10339-010-0385-5

Research Report
Published: 14 December 2010

Volume 12, pages 67–94, (2011)

Cognitive Processing

Mark M. Hall¹,
Philip D. Smart¹ &
Christopher B. Jones¹

Abstract

The map as a tool for accessing data has become very popular in recent years, but a lot of data do not have the spatial meta-data needed to allow for that. Some data, such as photographs, do however carry spatial information in their captions; if this could be extracted, they could be made available via map-based interfaces. Towards this goal, we introduce a model and spatio-linguistic reasoner for interpreting the spatial information in image captions that is based upon quantitative data about spatial language use acquired directly from people. Spatial language is inherently vague, and both the model and reasoner have been designed to incorporate this vagueness at the quantitative level and not only qualitatively.

Notes

http://www.geograph.org.uk.
http://www.locr.com.
http://www.flickr.com.
A toponym is a place with a recognisable name used in communication, where a “place” is often defined simply as a meaningful geographic location (Goodchild and Hill 2008).
There is also the possibility of using an approach based on supervaluation (see Bennet 2001 or Kulik 2001), which allows for the creation of a set of boundaries that describe the gradual transition from the definite to the definitely not area, similar to iso-lines used to represent height on conventional maps. The fuzzy methodology seems to be more frequently used and thus is given prominence.
http://www.geograph.org.uk.
To avoid a bias being introduced by a small group of frequent contributors producing most of the captions, only one caption per contributor was considered. This reduces the number of captions from around 350,000 to 580.
See “Appendix” for the technical aspects of the vague field.
See “Appendix” for details on how it is calculated.
As Lodge (1984) phrases it “It’s not so easy, every decoding is a new encoding”.
This can be seen in the evaluation results in “Evaluation experiment” where the low inter-participant agreement indicates the large number of different possible encodings and decodings.
Named Entity Recognition is the task of identifying noun phrases that refer to specific individuals, whether these be people, companies, dates, and so on.
http://developer.yahoo.com/geo/geoplanet/.
An example of how this additional information might help is in the caption “Flowers near Stackpole Head”, where Stackpole Head is a coastal headland, and the inclusion of the knowledge that the photograph is of “Flowers” could restrict the probable area where the photograph was taken to the land-side, whereas if the subject were “Sailing boat” then the sea-facing “near” area would be more likely. This kind of knowledge processing was judged to be beyond the scope of this paper and thus the information is discarded.
The results for “east of” are not reported as they are analogous to the “north of” results, and the “between” results are not reported due to space constraints.
0 and 1: high agreement; 2: medium agreement; 3 or higher: low agreement.
For north and south this is the vertical axis, for east and west the horizontal axis.

References

Ahlqvist O, Keukelaar J, Oukbir K (1998) Using rough classification to represent uncertainty in spatial data. In: Proceedings of the SIRC Colloquium, pp 1–9
Altman D (1994) Fuzzy set theoretic approaches for handling imprecision in spatial analysis. Int J Geogr Inf Sci 8(3):271–289
Andogah G, Bouma G, Nerbonne J, Koster E (2008) Placename ambiguity resolution. In: LREC workshop on methodologies and resources for processing spatial language
Bennet B (2001) Application of supervaluation semantics to vaguely defined spatial concepts. In: Spatial information theory. foundations of geographic information science : international conference, COSIT 2001 Morro Bay, CA, USA, September 19–23, 2001. Proceedings, pp 108–123
Bennett B, Agarwal P (2007) Semantic categories underlying the meaning of ’place’. In: Spatial information theory, 8th international conference, COSIT 2007, Melbourne, Australia, September 19–23, 2007, Proceedings, pp 78–95
Bittner T, Stell J (2003) Stratified rough sets and vagueness. In: Spatial information theory, Springer, Berlin/Heidelberg, pp 270–286
Bowerman M, Choi S (2003) Space under construction: language-specific spatial categorization in first language acquisition. In: Gentner D, Goldin-Meadow S (eds) Language in mind. MIT, Cambridge, pp 387–428
Brown P (1994) The ins and ons of tzeltal locative expressions. Linguistics 32:743–790
Burghardt D (2005) Controlled line smoothing by snakes. GeoInformatica 9(3):237–252
Buscaldi D, Rosso P (2008) Map-based vs. knowledge-based toponym disambiguation. In: Proceeding of the 2nd international workshop on geographic information retrieval, GIR’08, pp 19–22
Chomsky N (1965) Aspects of the theory of syntax. MIT, Cambridge
Clementini E, Felice PD (1996) An algebraic model for spatial objects with indeterminate boundaries. In: Geographic objects with indeterminate boundaries. Taylor and Francis, London, pp 155–169
Clementini E, Felice PD (1997) Approximate topological relations. Int J Approx Reason 16(2):173–204
Cohn A, Gotts N (1996a) The ’egg-yolk’ representation of regions with indeterminate boundaries. In: Proceedings, GISDATA specialist meeting on geographical objects with undetermined boundaries, Taylor and Francis, pp 171–187
Cohn A, Gotts N (1996b) Representing spatial vagueness: a mereological approach. In: KR’96: principles of knowledge representation and reasoning. Morgan Kaufmann, San Mateo, pp 230–241
Couclelis H (1992) People manipulate objects (but cultivate fields): Beyond the raster-vector debate in gis. In: Theories and methods of spatio-temporal reasoning in geographic space, vol 639/1992, Springer, Berlin/Heidelberg, pp 65–77
Couclelis H, Gottsegen J (1997) What maps mean to people: Denotation, connotation, and geographic visualization in land-use debates. In: Spatial information theory: a theoretical basis for GIS (COSIT’97), vol 1329/1997, Springer, Berlin/Heidelberg, pp 151–162
Coventry K, Prat-Sala M, Richards L (2001) The interplay between geometry and function in the comprehension of over, under, above and below. J Memory Lang 44(3):376–398
Cunningham H, Maynard D, Bontcheva K, Tablan V (2002) Gate: a framework and graphical development environment for robust nlp tools and applications. In: Proceedings of the 40th anniversary meeting of the association for computational linguistics, pp 168–175
Edwardes A, Purves R (2007) A theoretical grounding for semantic descriptions of place. Lect Notes Comput Sci 4857:106
Egenhofer M (1991) Reasoning about binary topological relations. In: Second symposium on large spatial databases, lecture notes in computer science, vol 525. Springer, pp 143–160
Erwig M, Schneider M (1997) Partition and conquer. In: COSIT ’97: Proceedings of the international conference on spatial information theory. Springer, London, pp 389–407
Fabrikant S, Buttenfield B (2001) Formalizing semantic spaces for information access. Ann Assoc Am Geogr 91(2):263–280
Fisher P (2000) Sorites paradox and vague geographies. Fuzzy Sets Syst 113(1):7–18
Fisher P, Wood J, Cheng T (2004) Where is helvellyn? Fuzziness of multi-scale landscape morphometry. Trans Inst Br Geogr 29(1):106–128
Fisher PF, Orf TM (1991) An investigation of the meaning of near and close on a university campus. Comput Environ Urban Syst 15(1-2):23–35. doi:10.1016/0198-9715(91)90043-D
Frank AU, Raubal M (1999) Formal specification of image schemata—a step towards interoperability in geographic information systems. Spatial Cogn Comput 1(1):67–101
Friedman C, Kra P, Yu H, Krauthammer M, Rzhetsky A (2001) Genies: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics 17(1):74–82
Fuhr T, Socher G, Scheering C, Sagerer G (1995) A three-dimensional spatial model for the interpretation of image data. In: IJCAI-95 Workshop on the representation and processing of spatial expressions, pp 93–102
Gahegan M (1995) Proximity operators for qualitative spatial reasoning. In: Spatial information theory: a theoretical basis for GIS. Springer, Berlin/Heidelberg, pp 31–44
Gapp K (1994) Basic meanings of spatial relations: Computation and evaluation in 3d space. In: National conference on artificial intelligence, pp 1393–1398
Gärdenfors P (2000) Conceptual spaces: the geometry of thought. MIT Press, Cambridge
Garrod S, Ferrier G, Campbell S (1999) In and on: investigating the functional geometry of spatial prepositions. Cognition 72(2):167–189
Goodchild M (1992) Geographical data modeling. Comput Geosci 18(4):401–408
Goodchild M, Hill L (2008) Introduction to digital gazetteer research. Int J Geogr Inf Sci 22(10):1039–1044
Guo Q, Liu Y, Wieczorek J (2008) Georeferencing locality descriptions and computing associated uncertainty using a probabilistic approach. Int J Geogr Inf Sci 22(10):1067–1090
Güting R, Schneider M (1993) Realms: A foundation for spatial data types in database systems. In: Proceedings of the 3rd international symposium on large databases, pp 33–44
Hall M, Jones C (2008) Quantifying spatial prepositions: an experimental study. In: Proceedings of the ACM GIS’08, pp 451–454
Hall MM, Jones CB (2009) Initialising and terminating active contours for vague field crisping. In: GISRUK 2009, pp 395–397
Hall S (1980) Encoding/decoding. In: Centre for Contemporary Cultural Studies (ed) Culture, media, language: working papers in cultural studies 1972–79. Hutchinson, London, pp 128–138
Hengl T (2007) A practical guide to geostatistical mapping of environmental variables
Herskovits A (1986) Language and spatial cognition: an interdisciplinary study of prepositions in English. Cambridge University Press, Cambridge
Horvath P, Jermyn I, Kato Z, Zerubia J (2009) A higher-order active contour model of a ’gas of circles’ and its application to tree crown extraction. Pattern Recogn Lett 42(5):699–709
Hwang S, Thill JC (2005) Modeling localities with fuzzy sets and gis. Fuzzy modeling with spatial information for geographic problems, pp 71–104
Johnson M (1987) The body in the mind. University of Chicago Press, Chicago
Kass M, Witkin A, Terzopoulos D (1988) Snakes: active contour models. Int J Comput Vis 1(4):321–331
Kemmerer D (2006) The semantics of space: integrating linguistic typology and cognitive neuroscience. Neuropsychologia 44(9):1607–1621
Kemmerer D, Tranel D (2000) A double dissociation between linguistic and perceptual representations of spatial relationships. Cogn Neuropsychol 17(5):393–414
Klippel A, Montello D (2007) Linguistic and nonlinguistic turn direction concepts. In: Spatial information theory, 8th international conference, COSIT 2007, Melbourne, Australia, September 19–23, 2007, Proceedings, pp 354–372
Klir G, Yuan B (1995) Fuzzy sets and fuzzy logic: theory and applications. Prentice-Hall, Englewood Cliffs
Krige D (1951) A statistical approach to some basic mine valuation problems on the Witwatersrand. J Chem Metallur Min Soc 52:119–139
Kuhn W (2002) Modeling the semantics of geographic categories through conceptual integration. In: Geographic information science: second international conference, GIScience, pp 108–118
Kulik L (2001) A geometric theory of vague boundaries based on supervaluation. In: Spatial information theory. Foundations of geographic information science : international conference, COSIT 2001. Springer, Berlin/Heidelberg, pp 44–59
Lakoff G, Johnson M (1980) Metaphors we live by. The University of Chicago Press, Chicago
Lam KM, Yan H (1994) Fast greedy algorithm for active contours. Electron Lett 30(1):21–23
Landau B, Jackendoff R (1993) “What” and “where” in spatial language and spatial cognition. Behav Brain Sci 16(2):217–238
Laurini R, Pariente D (1996) Towards a field-oriented language: First specifications. In: Geographic objects with indeterminate boundaries. Taylor and Francis, London, pp 225–236
Leidner J (2007) Toponym resolution in text: annotation, evaluation and applications of spatial grounding of place names. PhD thesis, School of Informatics, Edinburgh, UK
Levinson S (2003) Space in language and cognition: explorations in cognitive diversity. CUP, Cambridge
Levinson S, Kita S, Haun D, Rasch B (2002) Returning the tables: language affects spatial reasoning. Cognition 84(2):155–188
Li P, Gleitman L (2002) Turning the tables: language and spatial reasoning. Cognition 83(3):265–294
Liu Y, Goodchild M, Guo Q, Tian Y, Wu L (2008) Towards a general field model and its order in gis. Int J Geogr Inf Sci 22(6):623–643
Liu Y, Guo Q, Wieczorek J, Goodchild M (2009) Positioning localities based on spatial assertions. Int J Geogr Inf Sci 23(11):1471–1501
Liu Y, Yuan Y, Xiao D, Zhang Y, Hu J (2010) A point-set-based approximation for areal objects: a case study of representing localities. Comput Environ Urban Syst 34(1):28–39
Lodge D (1984) Small world. Penguin Books, New York
Mark D (1989) Cognitive image-schemata for geographic information: relations to user views and gis interfaces. In: Proceedings GIS/LIS’89, pp 551–560
Mark D, Frank A (1995) Experiential and formal models of geographic space. Environ Plan 23:3–24
Mark D, Turk A, Stea D (2007) Progress on yindjibarndi ethnophysiography. In: Spatial information theory, 8th International Conference, COSIT 2007, Melbourne, Australia, September 19-23, 2007, Proceedings, pp 1–19
Matheron G (1962) Traité de géostatistique appliquée. Mémoires du Bureau de Recherches Géologiques et Minières 14
Miller G, Johnson-Laird P (1976) Language and Perception. Cambridge University Press, Cambridge
Morrow D, Clark H (1988) Interpreting words in spatial descriptions. Lang Cogn Process 3:275–291
Mukerjee A, Gupta K, Nautiyal S, Singh M, Mishra N (2000) Conceptual description of visual scenes from linguistic models. Image Vis Comput 18(2):173–187
Parsons S (1996) Current approaches to handling imperfect information in data and knowledge bases. Knowl Data Eng 3(8):353–372
Pfoser D, Tryfona N, Jensen C (2005) Indeterminacy and spatiotemporal data: basic definitions and case study. GeoInformatica 9(3):211–236
Power C, Simms A, White R (2001) Hierarchical fuzzy pattern matching for the regional comparison of land use maps. Int J Geogr Inf Sci 15:77–100
Purves R, Clough P, Joho H (2005) Identifying imprecise regions for geographic information retrieval using the web. In: GISRUK’05, pp 313–318
Randell D, Cui Z, Cohn A (1992) A spatial logic based on regions and connection. In: KR’92. Principles of knowledge representation and reasoning: proceedings of the third international conference. Morgan Kaufmann, pp 165–176
Raubal M, Worboys M (1999) A formal model of the process of wayfinding in built environments. In: Spatial information theory. cognitive and computational foundations of geographic information science: international conference COSIT’99, Stade, Germany, August 1999. Proceedings, Springer, Berlin/Heidelberg. Lect Notes Comput Sci 1661:748
Ravin Y, Wacholder N (1996) Extracting names from natural-language text. Tech. Rep. Report 20338, IBM Research
Robinson V (2000) Individual and multipersonal fuzzy spatial relations acquired using human–machine interaction. Fuzzy Sets Syst 113(1):133–145
Robinson V (2003) A perspective on the fundamentals of fuzzy sets and their use in geographic information systems. Trans GIS 7(1):3–30
Sapir E (1929) The status of linguistics as a science. Language 5
Schneider M (1996) Modelling spatial objects with undetermined boundaries using the realm/rose approach. In: Geographic objects with indeterminate boundaries, vol 2. Taylor and Francis, London, pp 141–152
Schneider M (2000) Finite resolution crisp and fuzzy spatial objects. In: International symposium on spatial data handling, pp 3–17
Schneider M (2001) A design of topological predicates for complex crisp and fuzzy regions. In: Conceptual modeling—ER 2001, Springer, Berlin/Heidelberg. Lect Notes Comput Sci 2224:103
Schockaert S, de Cock M, Kerre E (2008) Location approximation for local search services using natural language hints. Int J Geogr Inf Sci 22(3):315–336
Smart P, Jones C, Twaroch F (2010) Multi-source toponym data integration and mediation for a meta-gazetteer service. In: GISCIENCE 2010 (forthcoming)
Smith B, Varzi A (1997) Fiat and bona fide boundaries: towards an ontology of spatially extended objects. In: Spatial Information theory: a theoretical basis for GIS (COSIT’97). Lect Notes Comput Sci 1329:103–119
Smith D, Crane G (2001) Disambiguating geographic names in a historical digital library. In: Research and advanced technology for digital libraries: fifth European conference (ECDL 2001), pp 127–136
Srihari R, Rapaport W (1990) Combining linguistic and pictorial information: using captions to interpret newspaper photographs. In: Current trends in SNePS—semantic network processing system, no. 437/1990 in Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, pp 85–96
Steiniger S, Meier S (2004) Snakes: a technique for line smoothing and displacement in map generalisation. In: ICA workshop on generalisation and multiple representation
Talmy L (1983) How language structures space. In: Spatial orientation. Plenum, New York, pp 225–282
Tang X (2004) Spatial object model[l]ing in fuzzy topological spaces : with applications to land cover change. PhD thesis, University of Twente, Enschede
Terzopoulos D (1986) Regularization of inverse visual problems involving discontinuities. IEEE Transactions PAMI-8, p 413
Tversky B, Lee P (1998) How space structures language. In: Spatial cognition: an interdisciplinary approach to representing and processing spatial knowledge, Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, p 157
Vorwerg C, Rickheit G (1998) Typicality effects in the categorization of spatial relations. In: Spatial cognition: an interdisciplinary approach to representing and processing spatial knowledge, Lecture Notes in Computer Science, vol 1404, Springer Berlin/Heidelberg, pp 203–222
Wang F, Hall G (1996) Fuzzy representation of geographical boundaries in gis. Int J Geogr Inf Sci 10(5):573–590
Whorf B, Carroll J, Chase S (1956) Language, thought, and reality: selected writings of Benjamin Lee Whorf. MIT, Cambridge
Wieczorek J, Guo Q, Hijmans R (2004) The point-radius method for georeferencing locality descriptions and calculating associated uncertainty. Int J Geogr Inf Sci 18(8):745–767
Winter S (2000) Uncertain topological relations between imprecise regions. Int J Geogr Inf Sci 14(5):411–430
Worboys M (2001) Nearness relations in environmental space. Int J Geogr Inf Sci 15(7):633–651
Worboys M, Duckham M, Kulik L (2004) Commonsense notions of proximity and direction in environmental space. Spatial Cogn Comput 4(4):285–312
Xie X, Mirmehdi M (2006) Magnetostatic field for the active contour model: a study in convergence. In: Proceedings of the 17th British machine vision conference, pp 127–136
Yamada A, Yamamoto T, Ikeda H, Nishida T, Doshita S (1992) Reconstructing spatial image from natural language texts. In: Proceedings COLING-92, vol 4, pp 1279–1283
Zadeh L (1965) Fuzzy sets. Inf Control 8:338–353

Acknowledgments

We would like to gratefully acknowledge contributors to Geograph British Isles (see http://www.geograph.org.uk/credits/2007-02-24), whose work is made available under the Creative Commons Attribution-ShareAlike 2.5 Licence (http://creativecommons.org/licenses/by-sa/2.5/). This material is based upon work supported by the European Community in the TRIPOD (FP6 cr n° 045335) project. We would also like to thank the two reviewers, whose comments and suggestions helped focus the ideas presented in this paper.

Author information

Authors and Affiliations

Cardiff School of Computer Science & Informatics, Cardiff University, Queen’s Buildings 5, The Parade, Roath, Cardiff, CF24 3AA, UK
Mark M. Hall, Philip D. Smart & Christopher B. Jones

Corresponding author

Correspondence to Mark M. Hall.

Appendix: Technical details of the vague field

Definition

Conceptually, the vague field is a two-dimensional, unbounded, continuous scalar field defined on an external coordinate system. Computationally, it is impossible to store an unbounded, continuous field; therefore, in its internal representation, the vague field is a bounded, discretised, floating-point field that can be stored in a two-dimensional floating-point matrix. This matrix, which forms the foundation of the vague field, is augmented with further attributes that are required for instantiating and processing the field.

To enable the translation between the external coordinate system and the internal matrix representation, the field stores an external and an internal anchor location. The external anchor represents the location that the vague field is defined as being relative to in the external coordinate system. For the spatial preposition fields, this is the location of the ground toponym, such as the location of “Cardiff” in the case of the vague field for “near Cardiff”. The internal anchor represents the point where the field is attached to the external anchor location. The values in the internal matrix are always to be interpreted as specifying the vague phenomenon’s applicability relative to this internal anchor location. The internal and external anchors are used in the read function to translate between the external and internal coordinate systems (see “Accessing the field values” below).

Instantiation

The experiment described in the previous section resulted in a sparse set of measurement points and an applicability value for each measurement point. An interpolation using ordinary kriging is used to transform these point measurements into the continuous field representation. Ordinary kriging was developed in the geostatistics field to estimate the distribution of natural resources based on a set of point measurements (Krige 1951; Matheron 1962; Hengl 2007). To calculate the vague field, a grid is placed over the area defined by the measurement points, with the extent of each grid cell defined by the field’s desired scale. The interpolated value for each cell is then calculated as a weighted average, as shown in Eq. 9. The advantage of ordinary kriging over other distance-based interpolations is that the weighting values λ are automatically derived from the values and spatial distribution of the measurement points p (Fig. 23). The interpolated results are in the range [1, 9] and in a final step are normalised to the [0, 1] range (Eq. 10), where kriging[x, y] is the result matrix produced by the kriging algorithm.

$$ kriging[x,y] = \sum_{i=0}^{n} \lambda_i \cdot \hbox{value}\left(p_i\right) $$

(9)

$$ values[x, y] = \frac{kriging[x, y] - 1}{\max(kriging - 1)} $$

(10)
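As a rough illustration of Eqs. 9 and 10, the following Python sketch computes a weighted average for one cell and then normalises a small result matrix. The kriging weights are taken as given rather than derived from a variogram, and the function names and sample numbers are hypothetical, not taken from the paper.

```python
def weighted_average(weights, point_values):
    """Eq. 9 for a single cell: sum of weight * measured value.

    The weights are assumed to have been produced by an ordinary kriging
    solver; deriving them is outside the scope of this sketch.
    """
    return sum(w * v for w, v in zip(weights, point_values))


def normalise(kriging):
    """Eq. 10: shift every cell by 1, then divide by the largest shifted value."""
    shifted = [[value - 1.0 for value in row] for row in kriging]
    largest = max(max(row) for row in shifted)
    if largest <= 0.0:
        # Degenerate case (every cell at the scale minimum); this handling
        # is an assumption, the source text does not discuss it.
        return [[0.0 for _ in row] for row in shifted]
    return [[value / largest for value in row] for row in shifted]


# Hypothetical 2x3 kriging result on the [1, 9] rating scale.
kriging = [[1.0, 5.0, 9.0],
           [3.0, 7.0, 2.0]]
values = normalise(kriging)
cell = weighted_average([0.5, 0.25, 0.25], [9.0, 5.0, 1.0])
```

Because the divisor is the maximum of the shifted matrix, the strongest cell always maps to 1 even when the interpolation never reaches the top of the rating scale.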

The quality of the interpolation depends on the number of measurement points, and even for kriging, the number of measurement points derived from the human-subject experiment is low. As a result, the fitted variogram model is less stable. To increase the number of measurement points and thus the quality of the resulting field, additional measurement points were created based on the properties that the analysis described in “Background” revealed. For the cardinal directions, where direction plays the primary role, the measurement locations were mirrored across the cardinal direction’s primary axis (the vertical axis for north and south, the horizontal axis for east and west; Fig. 24). For “near”, the analysis showed that angle played no significant role. It was thus possible to mirror the measurement locations across both axes, effectively quadrupling the number of measurement locations (Fig. 25) and making the resulting field more stable.
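The mirroring step can be sketched as follows. The sketch assumes that each measurement is an (x, y, value) tuple whose coordinates are offsets from the ground toponym at (0, 0); the function names and the de-duplication of points that lie on a mirror axis are illustrative choices, not details given in the paper.

```python
def _dedupe(points):
    """Drop exact duplicates (points lying on a mirror axis), keeping order."""
    return list(dict.fromkeys(points))


def mirror_for_cardinal(points, axis="vertical"):
    """Mirror measurements across one axis through the anchor.

    axis="vertical" suits north/south (left and right halves are swapped),
    axis="horizontal" suits east/west (upper and lower halves are swapped).
    """
    mirrored = []
    for x, y, value in points:
        if axis == "vertical":
            mirrored.append((-x, y, value))
        else:
            mirrored.append((x, -y, value))
    return _dedupe(points + mirrored)


def mirror_for_near(points):
    """Mirror measurements across both axes, giving up to four points each."""
    out = []
    for x, y, value in points:
        out += [(x, y, value), (-x, y, value), (x, -y, value), (-x, -y, value)]
    return _dedupe(out)


# Hypothetical measurements: offsets in metres and a rating on the 1-9 scale.
north_points = mirror_for_cardinal([(1.0, 2.0, 7.0), (0.0, 3.0, 8.0)])
near_points = mirror_for_near([(1.0, 2.0, 7.0)])
```

A point that sits exactly on the mirror axis maps onto itself, which is why the first call yields three points rather than four.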

Accessing the field values

The read operation provides access to the vague field’s applicability values. It translates between the external, unbounded, continuous representation and the internal, discrete, bounded field-value matrix. The translation from the external to the internal representation is performed using the internal and external anchor locations. The offsets of the external x and y coordinates relative to the external anchor location are calculated and then transformed into internal offset coordinates using the field’s scale value. These internal offset coordinates are then added to the internal anchor’s coordinates to determine the internal coordinates. The internal coordinates are then used to read a value from the field matrix, which is returned.
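A minimal Python sketch of such a read operation is given below. The class and attribute names are hypothetical. Three further assumptions are made that the text does not state: the scale is expressed as external units per cell, lookups snap to the nearest cell, and locations outside the stored matrix return 0.

```python
import math


class VagueField:
    """Bounded matrix of applicability values plus the two anchor locations."""

    def __init__(self, values, scale, internal_anchor, external_anchor):
        self.values = values                    # values[row][col]
        self.scale = scale                      # external units per cell (assumed)
        self.internal_anchor = internal_anchor  # (col, row) inside the matrix
        self.external_anchor = external_anchor  # (x, y) of the ground toponym

    def read(self, x, y):
        # Offset of the requested location from the external anchor.
        dx = x - self.external_anchor[0]
        dy = y - self.external_anchor[1]
        # Scale to cell units, snap to the nearest cell, add the internal anchor.
        col = self.internal_anchor[0] + math.floor(dx / self.scale + 0.5)
        row = self.internal_anchor[1] + math.floor(dy / self.scale + 0.5)
        if 0 <= row < len(self.values) and 0 <= col < len(self.values[0]):
            return self.values[row][col]
        # Assumption: the phenomenon does not apply outside the stored extent.
        return 0.0


# Hypothetical 3x3 field with 100 m cells, anchored at its centre cell.
field = VagueField(
    values=[[0.1, 0.2, 0.3],
            [0.4, 1.0, 0.6],
            [0.7, 0.8, 0.9]],
    scale=100.0,
    internal_anchor=(1, 1),
    external_anchor=(5000.0, 2000.0),
)
```

Callers only ever pass external coordinates, so two fields stored at different scales can be queried at the same location without aligning their matrices.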

Combining fields

The field combination calculation (Eq. 1) is performed every time the combined field is accessed. While this is a computationally expensive approach, it has the advantage that fields of any scale can be combined without having to align their internal matrix representations, as the fields are accessed through the read function and can thus be treated as continuous and scale-free.

The normalisation factor nf is defined as the maximum combined field value and is calculated by placing a virtual grid over all fields, calculating the value at each grid point and taking the maximum of these values. One problem with this approach is that if the source fields’ cells overlap as shown in Fig. 26, then none of the maximum measurement points actually measure the combined maximum. This means that if the combined field is read at a location that would produce the actual maximum, then the calculated value would be larger than the normalisation factor and the resulting value would be larger than 1, which is not allowed. To avoid this, if the combined value is larger than the normalisation factor, then a value of 1 is returned, regardless of what the actual measurement value is. While this may seem to skew the data, if all the source fields are continuous, as is usually the case, then the difference between the calculated and the actual maximum is very small and can be disregarded.
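The grid-sampled normalisation factor and the clamping rule can be sketched as below. Eq. 1 is not reproduced in this appendix, so the combination rule is passed in as a parameter; the product used in the example is only a placeholder and should not be read as the paper's formula. The cone-shaped fields, grid bounds and step size are likewise made up for illustration.

```python
import math


def normalisation_factor(reads, combine, bounds, step):
    """Largest combined value found on a virtual grid laid over all fields."""
    min_x, min_y, max_x, max_y = bounds
    columns = int((max_x - min_x) // step) + 1
    rows = int((max_y - min_y) // step) + 1
    return max(
        combine([read(min_x + i * step, min_y + j * step) for read in reads])
        for j in range(rows)
        for i in range(columns)
    )


def read_combined(reads, combine, nf, x, y):
    """Combine on access, normalise by nf and clamp the result to 1."""
    raw = combine([read(x, y) for read in reads])
    if nf <= 0.0:
        return 0.0
    return min(raw / nf, 1.0)


def cone(cx, cy, radius):
    """Hypothetical continuous field: 1 at the centre, falling linearly to 0."""
    def read(x, y):
        return max(0.0, 1.0 - math.hypot(x - cx, y - cy) / radius)
    return read


fields = [cone(0.0, 0.0, 10.0), cone(6.0, 0.0, 10.0)]
nf = normalisation_factor(fields, math.prod, (-10.0, -10.0, 16.0, 10.0), 4.0)
at_peak = read_combined(fields, math.prod, nf, 3.0, 0.0)
far_away = read_combined(fields, math.prod, nf, 30.0, 0.0)
```

In this example the true maximum of the product lies at (3, 0), between grid points, so the sampled factor underestimates it and the value read there is clamped to 1, which is the situation the paragraph above describes.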

Crisping the vague field

The crisp operation is used to transform the continuous vague field into a crisp polygon for integration with existing GI systems and algorithms and is based on active contours.

Active contours

Active contours were introduced by Kass et al. (1988) as a method of finding boundaries in image data, and have also been used in GIS for various purposes (Burghardt 2005; Steiniger and Meier 2004; Horvath et al. 2009). They are defined as controlled continuity splines (Terzopoulos 1986) upon which image and external forces act to move them into the desired shape. In the original method, the energies acting upon the active contour are defined as in Eq. 11, consisting of an internal energy, the image energy and an external constraint energy, which the active contour then tries to minimise.

$$ \begin{aligned} E_{snake}^* &= \int\limits_0^1 E_{snake}(v(s)) ds \\ &=\int\limits_0^1 E_{int}(v(s)) + E_{image}(v(s)) + E_{con}(v(s)) ds \end{aligned} $$

(11)

The internal energy acts to maintain the active contour’s shape, image energy can be defined via the image intensity (Kass et al. 1988), image gradient (Lam and Yan 1994), or via more complex methods (Xie and Mirmehdi 2006), and the external energy defines constraints that the active contour needs to observe that are not directly defined by the active contour itself or the image data. An iterative method on a grid is used to move the active contour’s control points to their final solution. For each control point, the minimum energy neighbour is calculated, and the control point moved there immediately. The energy calculation for the next control point will thus take into account the updated position of the previous control point. This is repeated until the active contour’s final shape is found and means that the active contour will achieve a locally minimal solution, but not necessarily a globally minimal solution. Due to this iterative way of moving, active contours are often also referred to as snakes, as they seem to slither across their processing space (Fig. 30).
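The iterative scheme described above, in which each control point moves at once to its lowest-energy neighbour, might look roughly like this in Python. The energy function is supplied by the caller; the toy energy in the example, which attracts points to a square ring, is purely hypothetical and ignores the other control points.

```python
# The 3x3 neighbourhood of a grid cell, including the cell itself.
NEIGHBOURS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


def greedy_pass(points, energy):
    """Move each control point to its lowest-energy neighbouring cell.

    Points are updated in place, so the energy of the next control point is
    evaluated against the already-updated position of the previous one.
    Returns the number of points that moved.
    """
    moved = 0
    for i, (x, y) in enumerate(points):
        candidates = [(x + dx, y + dy) for dx, dy in NEIGHBOURS]
        best = min(candidates, key=lambda c: energy(points, i, c))
        if energy(points, i, best) < energy(points, i, (x, y)):
            points[i] = best
            moved += 1
    return moved


def run_snake(points, energy, max_passes=100):
    """Repeat passes until no point moves (a local minimum) or the cap is hit."""
    for _ in range(max_passes):
        if greedy_pass(points, energy) == 0:
            break
    return points


def toy_energy(points, i, candidate):
    """Hypothetical energy: zero on the square ring max(|x|, |y|) == 2."""
    cx, cy = candidate
    return (max(abs(cx), abs(cy)) - 2) ** 2


snake = run_snake([(5, 5), (-5, 5), (5, 0), (0, 0)], toy_energy)
```

The loop stops as soon as a full pass moves nothing, which is a local minimum of the supplied energy and not necessarily the global one.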

Crisping fields with active contours

To enable the use of active contours in creating a crisp representation of the vague field, a slightly modified energy function is used (Eq. 12). The first two energies, internal and field, are similar to the internal and image energies as defined earlier. The contraction energy is an external energy that pulls the active contour towards the centre of the field.

$$ E_{snake} = \alpha \cdot E_{int} + \beta \cdot E_{field} + \gamma \cdot E_{contract} $$

(12)

Each energy is defined as a vector field, with the direction of each cell’s vector defining the direction in which an active contour control point at that location would be pushed (Fig. 27). The length of the cell’s vector defines how strongly the control point is being pushed in the specified direction; thus, the energy function (Eq. 12) can be implemented as a simple vector addition, with the final vector defining the direction the control point will move.

In this framework, the internal energy (Eq. 13) is defined as the vector from the control point $p_i$ (written $v_{current}$ in Eq. 13) to the centre point between the preceding control point $p_{i-1}$ ($v_{prev}$) and the following control point $p_{i+1}$ ($v_{next}$) (Fig. 28). This definition ensures that the control points always move so as to create a snake where the control points are evenly spread, since the further a control point moves towards its predecessor and away from its successor, the more strongly it will be pulled towards the successor.

$$ E_{int} = \left(v_{prev} + \frac{v_{next} - v_{prev}}{2}\right) - v_{current} $$

(13)
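Eq. 13 translates almost directly into code. In the sketch below, control points are (x, y) tuples and the function name is an illustrative choice.

```python
def internal_energy(prev, current, nxt):
    """Eq. 13: vector from the current point to the midpoint of its neighbours."""
    mid_x = prev[0] + (nxt[0] - prev[0]) / 2.0
    mid_y = prev[1] + (nxt[1] - prev[1]) / 2.0
    return (mid_x - current[0], mid_y - current[1])


# A point that has drifted off the line between its neighbours is pulled back.
pull = internal_energy((0.0, 0.0), (1.0, 3.0), (4.0, 0.0))
# An evenly spaced point on the line experiences no internal force.
rest = internal_energy((0.0, 0.0), (2.0, 0.0), (4.0, 0.0))
```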

The scalar vague field is transformed into the required vector representation by applying the gradient operator. The gradient operator defines each cell’s vector so that it points in the direction of the neighbouring cell with the lowest scalar value. The length of the vector is determined by the value of the current cell, unless the minimum is equal to the cell’s value in which case the vector’s length is set to 0 as the cell is a local minimum (Fig. 29).
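One possible reading of this gradient operator is sketched below. The text does not say which neighbourhood is used or exactly how the vector length follows from the cell value, so the sketch assumes an 8-cell neighbourhood and a length equal to the cell's own value. Vectors are returned as (x, y) pairs, with x along columns and y along rows.

```python
import math


def field_energy(values):
    """Turn a scalar matrix into a matrix of (x, y) vectors.

    Each vector points at the neighbouring cell with the lowest value and has
    the current cell's value as its length; local minima get a zero vector.
    """
    rows, cols = len(values), len(values[0])
    vectors = [[(0.0, 0.0)] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best, best_value = None, values[r][c]
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    if (dr or dc) and inside and values[rr][cc] < best_value:
                        best, best_value = (dc, dr), values[rr][cc]
            if best is not None:
                norm = math.hypot(best[0], best[1])
                length = values[r][c]
                vectors[r][c] = (best[0] / norm * length, best[1] / norm * length)
    return vectors


# Hypothetical one-row field rising from left to right.
energy = field_energy([[0.0, 0.5, 1.0]])
```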

The contraction energy is used to define how far the active contour will contract. It is defined as a constant vector field of the same extent as the vague field that is being crisped, with each cell’s vector pointing towards the centre of the vague field and of length 1. The centre of the vague field is defined as the centroid of all cells with the maximum value. This guarantees that the active contour will contract towards the strongest part of the field.
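The contraction field can be sketched in the same style. Cell coordinates are (column, row) indices, and the zero vector assigned to a cell that coincides with the centroid is an assumption made to avoid a division by zero; the paper does not discuss that case.

```python
import math


def contraction_energy(values):
    """Unit vectors pointing at the centroid of all maximum-valued cells."""
    peak = max(max(row) for row in values)
    cells = [(c, r) for r, row in enumerate(values)
             for c, value in enumerate(row) if value == peak]
    centre_x = sum(c for c, _ in cells) / len(cells)
    centre_y = sum(r for _, r in cells) / len(cells)
    vectors = []
    for r, row in enumerate(values):
        out = []
        for c in range(len(row)):
            dx, dy = centre_x - c, centre_y - r
            distance = math.hypot(dx, dy)
            # Assumption: a cell exactly at the centroid gets no pull.
            out.append((dx / distance, dy / distance) if distance > 0 else (0.0, 0.0))
        vectors.append(out)
    return (centre_x, centre_y), vectors


# Hypothetical field with two equally strong cells side by side.
centre, contraction = contraction_energy([[0.0, 0.0, 0.0],
                                          [0.0, 1.0, 1.0],
                                          [0.0, 0.0, 0.0]])
```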

In the active contour energy function (Eq. 12), each component energy carries a weight that defines its relative influence on the total energy. The weights have been tuned experimentally and in the results shown in this section are $\alpha=0.2745,\,\beta=1,\,\gamma=0.4314$. The weights were chosen so as to create crisp polygons with extents that roughly match the angles and distances observed in the initial Geograph experiment (Hall and Jones 2008). The snake is initialised and terminated so as to minimise the number of iterations it has to run through, while guaranteeing a valid result. Details of the methods used to enable this can be found in Hall and Jones (2009).
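With every energy expressed as a vector, Eq. 12 reduces to a weighted vector sum. The sketch below uses the weights reported above; the function name and the sample vectors are hypothetical.

```python
ALPHA, BETA, GAMMA = 0.2745, 1.0, 0.4314  # weights reported in the text


def snake_vector(e_int, e_field, e_contract,
                 alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Eq. 12 as a weighted sum of three (x, y) vectors."""
    return (
        alpha * e_int[0] + beta * e_field[0] + gamma * e_contract[0],
        alpha * e_int[1] + beta * e_field[1] + gamma * e_contract[1],
    )


# Hypothetical control point: the field pushes it in the negative y direction
# with strength 0.5 while the contraction energy pulls the opposite way.
step = snake_vector(e_int=(1.0, 0.0), e_field=(0.0, -0.5), e_contract=(0.0, 1.0))
```

In this example the field term (0.5) outweighs the contraction term (0.4314), so the net y component is negative. If the field vector length equals the cell value, which is an assumption of this sketch and not a statement from the paper, the two terms would balance roughly where the field value equals γ/β.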

About this article

Cite this article

Hall, M.M., Smart, P.D. & Jones, C.B. Interpreting spatial language in image captions. Cogn Process 12, 67–94 (2011). https://doi.org/10.1007/s10339-010-0385-5

Received: 30 April 2010
Accepted: 25 November 2010
Published: 14 December 2010
Issue Date: February 2011
DOI: https://doi.org/10.1007/s10339-010-0385-5
