Contextual mapping: Visualization of high-dimensional spatial patterns in a single geo-map

https://doi.org/10.1016/j.compenvurbsys.2016.08.005

Highlights

  • A generic method based on self-organizing maps encodes high-dimensional spatial patterns into a single numerical vector.

  • The numerical vector makes it possible to visualize high-dimensional patterns in a single geo-map, called a contextual map.

  • Instead of rigid spatial clusters, it produces a color-coded spectrum of changing high-dimensional emergent patterns.

  • It can be used hierarchically to combine several contextual maps, each representing a unique high-dimensional context.

Abstract

In this study, we proposed a generic methodology for combining high-dimensional spatial data to identify and visualize the hidden spatial patterns in a single-layer geo-map. By using the less explored one-dimensional self-organizing maps, we showed how the high-dimensional data can be transformed into a spectrum of one-dimensional ordered numbers. These numbers (codes) can index a high-dimensional space with the important property that similar indices refer to similar high-dimensional contexts. Thus, the high-dimensional vectors will be attributed to single numbers, and this one-dimensional output can be easily rendered as a new single data layer in the original geographic map. As a result, it simultaneously identifies the main spatial clusters and visualizes the high-dimensional correlations (if any) in a single geographic map. Further, because the output of the proposed method is a set of ordered indices, there is no need to define a fixed number of clusters in advance.

Because these composite spatial layers are identified on the basis of the selected context (i.e., the selected features or aspects of the spatial phenomena), they are called contextual maps.

Finally, we showed the results of applying the proposed methodology to several synthetic and real-world data sets.

Introduction

With the current rapid growth in the amount of digital data, we must address the challenge of finding appropriate techniques to harness the power of these data streams. For example, in many cities across the world, access to digital spatial maps is no longer the problem; instead, given the amount and diversity of digital data on different aspects of these cities, the challenge is how people can build their own map of the space as a combination of several factors of interest.

In this direction, there have been several interesting cases such as peoplemaps or the Livehoods project (Cranshaw, Schwartz, Hong, & Sadeh, 2012), which explore and map activities within cities based on data available from online social networks. One of the cases most similar to our work is a project called Whereabout, in which the K-means data-clustering algorithm was applied to a collection of spatial data covering more than 200 different aspects of each ward in the city of London, producing a fixed number of groups based on informational similarities (not physical locations). Then, on top of the classical map of London, people get an impression of different regions on the basis of their similarities in all of these categories. In a similar manner, but based only on demographic information, a new coding system of London called LOAC was developed (Longley & Singleton, 2014).

The classical clustering algorithms divide the high-dimensional data space into a predetermined number of groups, each of which is given a label (usually an arbitrary number). Then, the cluster labels attributed to each spatial data point can be visualized on the geographic map with a specified color code. However, although standard clustering methods such as K-means are easy to use, they have some limitations in the domain of spatial pattern recognition. One of the main problems is that they divide the space into a small number of categories. Instead, it would be preferable to have a continuous, smoothly changing pattern on top of the high-dimensional data. Further, one needs to select the number of clusters in advance, which is a critical decision (Tibshirani, Walther, & Hastie, 2001). In addition, in the context of spatial clustering, because the cluster labels are not ordered according to their high-dimensional similarities, the colored visualization of clusters in the geographic map is not directly helpful: similar colors in a clustered geo-map do not necessarily refer to similar high-dimensional patterns. As a result, increasing the number of clusters with different colors may result in final spatial visualizations that are not helpful, whereas having too few clusters produces results that are too aggregated. One current solution to this problem is to create an RGB (red, green, blue) pattern after data clustering by reducing the high-dimensional vectors of the cluster centers to their first three principal components (Mahinthakumar, Hoffman, Hargrove, & Karonis, 1999). However, in addition to losing some information (by selecting only three principal components), the color interpretations will need an additional step.
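The RGB approach mentioned above can be illustrated with a short sketch. It is not the implementation of the cited work; the function name `centers_to_rgb`, the synthetic cluster centers, and the per-channel rescaling to [0, 1] are assumptions made only for illustration.

```python
import numpy as np

def centers_to_rgb(centers):
    """Map cluster centers (k x n) to RGB triples via their first three principal components.

    Hypothetical helper: assumes k >= 3 and n >= 3.
    """
    centers = np.asarray(centers, dtype=float)
    centered = centers - centers.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:3].T              # shape (k, 3)
    # Rescale each component independently to [0, 1] so it can serve as a color channel.
    lo = scores.min(axis=0)
    span = scores.max(axis=0) - lo
    span[span == 0] = 1.0                     # guard against a flat component
    return (scores - lo) / span

# Synthetic example (assumed data): 6 cluster centers with 10 features each.
rng = np.random.default_rng(0)
centers = rng.normal(size=(6, 10))
rgb = centers_to_rgb(centers)
print(rgb.shape)
```

Each row of `rgb` is a color for one cluster, so similar centers receive similar colors, but everything beyond the third principal component is discarded, which is the information loss noted above.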

The main hypothesis of this study is that if we find a method to sort the clusters in a way such that similar cluster indices refer to similar contexts (i.e., similar high-dimensional patterns), we can make a direct projection from high-dimensional spatial data to a one-dimensional vector and visualize the high-dimensional patterns in the geographical maps using a simple color spectrum. In this manner, by having many indices instead of dividing the high-dimensional data into a few distinct groups, one can create a spectrum of high-dimensional patterns that are visualized with a colored spectrum on spatial maps. Because the high-dimensional patterns would change gradually, this would also solve the problem of distinct cluster borders and the fixed number of clusters. As we show in Section 2, our proposed approach can be discussed from the viewpoint of dimensionality reduction and manifold learning (Bengio, Courville, & Vincent, 2013), where one of the best methods that satisfies these requirements is self-organizing maps (SOMs) (Kohonen, 2013).
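To make the idea of ordered indices concrete, the following is a minimal sketch of an online one-dimensional SOM with a Gaussian neighborhood on a chain of nodes. It is not the training procedure used in this study; the function name `train_som_1d`, the linear decay schedules, and the synthetic data are assumptions chosen for brevity.

```python
import numpy as np

def train_som_1d(data, n_nodes=20, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a one-dimensional SOM (a chain of nodes) with online updates.

    Hypothetical sketch: linear decay of learning rate and neighborhood width.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    sigma0 = n_nodes / 2.0 if sigma0 is None else sigma0
    # Initialize node weights from randomly chosen data points.
    weights = data[rng.integers(0, len(data), size=n_nodes)].copy()
    positions = np.arange(n_nodes)            # node coordinates along the chain
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.5)
        x = data[rng.integers(0, len(data))]
        # Best matching unit: the node whose weight vector is closest to x.
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Gaussian neighborhood measured along the chain, not in data space.
        h = np.exp(-((positions - bmu) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights

# Synthetic data (assumed): noisy points along a curve embedded in three dimensions.
rng = np.random.default_rng(1)
t = rng.uniform(0.0, 1.0, size=500)
data = np.column_stack([t, t ** 2, np.sin(3.0 * t)]) + rng.normal(scale=0.01, size=(500, 3))

weights = train_som_1d(data)
# Compare the mean distance between chain neighbors with the mean over all node pairs.
adjacent = np.linalg.norm(np.diff(weights, axis=0), axis=1).mean()
pairwise = np.linalg.norm(weights[:, None] - weights[None, :], axis=2)
overall = pairwise[np.triu_indices(len(weights), k=1)].mean()
print(adjacent, overall)
```

If the chain has ordered itself, nodes with neighboring indices carry similar weight vectors, so the first printed value should be clearly smaller than the second.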

Section snippets

SOMs in the domain of spatial analysis

SOM is a general-purpose machine-learning method that offers interesting solutions to different data-driven modeling tasks (Kohonen, 2013).

SOM is a nonlinear space transformation method that tries to preserve the topology of high-dimensional data, while transforming them into a low-dimensional space. This means that SOM projects the high-dimensional data points to a lower-dimensional space (normally a two-dimensional grid) in a manner such that neighboring objects in high-dimensional space

One-dimensional SOMs and spatial clustering

In this section, we assume that the reader is familiar with the original SOM algorithm. Therefore, we skip its re-explanation here and refer the reader to Kohonen (2001) for details regarding the training process.

We instead present how one can project high-dimensional spatial data onto geographical maps while preserving the high-dimensional correlations by using the less explored one-dimensional SOM.
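As a rough illustration of this projection step, the sketch below assigns each data point the index of its closest node in an already ordered one-dimensional codebook and normalizes that index to [0, 1]. The function name `contextual_codes`, the hand-made codebook, and the sample points are assumptions for illustration, not values from the study.

```python
import numpy as np

def contextual_codes(data, codebook):
    """Return the best-matching node index and a normalized code in [0, 1] per data point.

    Hypothetical helper: `codebook` is a (k x n) array of ordered node weights, k >= 2.
    """
    data = np.asarray(data, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Distance from every data point to every node, shape (M, k).
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    index = dists.argmin(axis=1)
    return index, index / (len(codebook) - 1)

# Assumed ordered codebook: 5 nodes evenly spaced along a line in three dimensions.
codebook = np.linspace([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], num=5)
data = np.array([[0.05, 0.0, 0.1],
                 [0.5, 0.55, 0.45],
                 [0.9, 1.0, 1.0]])
index, code = contextual_codes(data, codebook)
print(index, code)
```

The normalized code is a single number per spatial unit, so it can be attached to the corresponding map feature and rendered with any sequential color ramp.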

We consider the training data set $X = \{x_1, \ldots, x_M\}$ as a set of $M$ points in an $n$-dimensional space, $x_i \in \mathbb{R}^n$

Experiments with real-world spatial data

In this section, we show the results of the proposed method using two real-world spatial data sets. The first is a collection of 235 attributes of the so-called wards in London (Fig. 2). This data set is provided by Future Cities Catapult from the abovementioned project Whereabout. The second data set is obtained from the US censuses of 2000 and 2010 and includes the distribution of different race groups at the census block level for the five boroughs of New York City.

In the data set from London, there

Discussions and future research

In this section, we will discuss two main technical issues related to the proposed methodology plus one potential application in the field of urban planning and zoning.

The first point is about the chosen one-dimensional topology of SOM. As we briefly mentioned before, it is known that having higher grid dimensions or a more-connected neighborhood topology in the SOM network can improve the performance and quality of the trained SOM in terms of quantization error and topology preservation.
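The two quality measures named here can be sketched for a one-dimensional SOM as follows. The sketch uses the common textbook definitions (mean distance to the best-matching node, and the share of points whose two closest nodes are not chain neighbors), which may differ in detail from the measures used in this study; the function name `som_quality_1d` and the toy codebooks are assumptions.

```python
import numpy as np

def som_quality_1d(data, codebook):
    """Return (quantization error, topographic error) for a chain-shaped codebook.

    Hypothetical helper using common definitions of both measures.
    """
    data = np.asarray(data, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    best, second = order[:, 0], order[:, 1]
    # Quantization error: mean distance between each point and its closest node.
    qe = dists[np.arange(len(data)), best].mean()
    # Topographic error: share of points whose two closest nodes are not adjacent on the chain.
    te = np.mean(np.abs(best - second) != 1)
    return qe, te

# Assumed toy example: an ordered chain and the same nodes with shuffled indices.
ordered = np.linspace([0.0, 0.0], [1.0, 1.0], num=5)
shuffled = ordered[[0, 3, 1, 4, 2]]
data = ordered.copy()                          # data points that coincide with the nodes

qe_ordered, te_ordered = som_quality_1d(data, ordered)
qe_shuffled, te_shuffled = som_quality_1d(data, shuffled)
print(qe_ordered, te_ordered, qe_shuffled, te_shuffled)
```

Both codebooks quantize the data equally well, but only the ordered one preserves the topology, which is why the two measures are usually reported together.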

Conclusions

With the ever-growing availability of digital data in many spatial domains, we need to develop appropriate methods to explore high-dimensional and complex spatial patterns. Compared to classical data clustering problems, one of the main issues of spatial pattern recognition and spatial clustering is that in spatial clustering, in addition to finding high-dimensional patterns, one needs to keep the spatial coordinates in parallel to other features. Finally, it is always desired to project the

Acknowledgments

This research was supported by the National Research Foundation Singapore (NRFS) through the Singapore-ETH Centre for Global Environmental Sustainability (SEC) and the Chair for Computer Aided Architectural Design (CAAD) at ETH Zurich. Further, the author would like to thank the reviewers of the paper as their comments on the initial submission significantly improved the quality of the final paper.

References (32)

  • Y. Bengio et al. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence (2013)

  • Y. Cheng. Convergence and ordering of Kohonen's batch map. Neural Computation (1997)

  • J. Cranshaw et al. The livehoods project: Utilizing social media to understand the dynamics of a city

  • E. Delmelle et al. Trajectories of multidimensional neighbourhood quality of life change. Urban Studies (2013)

  • E. Erwin et al. Self-organizing maps: Ordering, convergence properties and energy functions. Biological Cybernetics (1992)