Networks and geography: Modelling community network structures as the outcome of both spatial and network processes
Introduction
It is not a new idea that physical distance limits people's capacity to form and maintain relationships. Empirical research on the associations between social and geographic space has been conducted in disconnected scientific communities, including human geography, tourism and regional science, since at least the 1930s (see summary in Butts, 2011, Carrothers, 1956, Mok et al., 2007). Physical propinquity effects have been demonstrated for different types of relationship and at multiple levels of analysis (Festinger et al., 1950, Merton, 1948, Caplow and Forman, 1950, Blake et al., 1956, Whyte, 1957, Sommer, 1969, Wellman, 1996, Mok et al., 2007, Faust et al., 1999, Axhausen, 2006, Larsen et al., 2006). Furthermore, this relationship appears remarkably robust to advances in technology (internet, phones), transportation (highways, freeways, etc.), and cultural differences (Latane et al., 1995, Carley and Wendt, 1991). Over the last 50 years, researchers have consistently argued that human relationships are predominantly “local” and that the probability of a tie diminishes as the distance between actors increases, often following either a power law or an exponential decay function (Brown and Moore, 1970, Freeman and Sunshine, 1976, Irwin and Hughes, 1992, Morrill, 1963, Kleinberg, 2000, Wong et al., 2005, Butts, 2002, Butts and Carley, 2000, Butts et al., 2007). The most recent study (Preciado et al., in press) provides empirical evidence that the log odds of a friendship tie between adolescents decrease smoothly as the logarithm of their distance increases. The same study also shows that the strength of distance dependence is negatively related to age and to shared meeting places.
A number of empirical studies of the features of large-scale, spatially embedded human networks have also been conducted (Bernard et al., 1988, Butts, 2002, Butts and Carley, 2000, Dodds et al., 2003, Killworth and Bernard, 1978, Korte and Milgram, 1970, Liben-Nowell et al., 2005, Milgram, 1967, Travers and Milgram, 1969).
While past research has borne out that the geographical arrangement of individuals has powerful structuring effects on social relationships and social interactions, there have been relatively few attempts to build explicitly spatial models of social networks and to use these models to understand the way in which social networks are embedded geographically. Models proposed by Butts (2002) and Wong et al. (2005) are notable exceptions. They aim to understand how different geographical arrangements of connected actors give rise to a particular social network structure and so, for example, whether geographical proximity between individuals can explain some of the commonly observed properties of social networks, such as clustering, skewed degree distributions, and short average geodesic distances.
Butts (2002) utilised a family of spatial non-directed inhomogeneous Bernoulli graphs to study the empirical relationship between geographical distance and network tie probability. He proposed a model that assumes that ties between individuals are independent of one another conditional on an observed distance structure (Butts, 2002):

\[ \Pr(X = x \mid D = d) = \prod_{\{i,j\}} B\big(x_{ij} \mid \phi(d_{ij})\big) \]

where X is an adjacency matrix of network tie variables with xij = 1 if there is a tie between i and j, and xij = 0 otherwise; D is a distance matrix with dij the geographical distance between i and j; φd = φ(d) is a distance interaction function, which maps distances defined on (0,∞) to tie probabilities in [0,1]; x and d refer to realisations of X and D, respectively; and B is the Bernoulli probability mass function given by

\[ B\big(x_{ij} \mid \phi(d_{ij})\big) = \phi(d_{ij})^{x_{ij}} \big(1 - \phi(d_{ij})\big)^{1 - x_{ij}} \]

Here the substantive steps are to identify a choice of distance measure, D, as a function of locations in a physical space, and to specify the distance interaction function, φd, that relates distance to tie probability. Given a distance matrix D and a distance interaction function φd, the spatial Bernoulli graph can be constructed as the outcome of a series of independent Bernoulli trials in which the probability of an edge between actors i and j is determined by the distance between them through the distance interaction function. Butts (2002) argued that this model captures some of the commonly observed structural characteristics of networks, such as a high degree of transitivity and the formation of locally dense clusters.
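The construction just described, a series of independent Bernoulli trials with edge probabilities set by a distance interaction function, can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the power-law form of the function, its parameter values (p0, scale, gamma), the uniform random coordinates, and all function names are assumptions introduced here for the example.

```python
import numpy as np


def power_law_phi(d, p0=0.9, scale=1.0, gamma=2.0):
    """Hypothetical distance interaction function (an assumption for this
    sketch): tie probability decays as a power law of distance."""
    return p0 / (1.0 + d / scale) ** gamma


def simulate_spatial_bernoulli(coords, phi, rng):
    """Draw one non-directed graph with independent ties given distance."""
    n = coords.shape[0]
    # Pairwise Euclidean distances between all actors.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    probs = phi(dist)
    # One Bernoulli trial per unordered pair {i, j} with i < j.
    iu = np.triu_indices(n, k=1)
    draws = rng.random(len(iu[0])) < probs[iu]
    x = np.zeros((n, n), dtype=int)
    x[iu] = draws
    x = x + x.T  # symmetrise; the diagonal stays zero (no self-ties)
    return x, dist


rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(200, 2))  # made-up locations
x, dist = simulate_spatial_bernoulli(coords, power_law_phi, rng)

iu = np.triu_indices(coords.shape[0], k=1)
tied = x[iu] == 1
print("observed density:", tied.mean())
print("mean distance, tied pairs:", dist[iu][tied].mean())
print("mean distance, all pairs:", dist[iu].mean())
```

Swapping power_law_phi for another decreasing function (for example an exponential decay) changes only the argument passed to simulate_spatial_bernoulli, which mirrors the separation between the distance measure and the distance interaction function in the text.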
Butts emphasised that spatial locations can be considered not only in terms of the physical locations of actors (for example, residential location) but also in terms of social positions, for example, positions in Blau space, where the social locations of individuals are represented in a socio-demographic coordinate system. The focus of this paper, however, is specifically on the physical locations of individuals, represented by “physical distance” or “geographical proximity”.
Wong et al. (2005) also developed a model for networks in which the edge probability between any two nodes is considered to be dependent on the spatial distance between those nodes, and demonstrated that this model is also able to capture many commonly observed properties of social networks. They represented social networks through an extension of non-directed Erdős-Rényi random graphs by proposing a step-function relationship between edge probability and spatial distance, i.e.:

\[ \Pr(x_{ij} = 1) = \begin{cases} p + p_b, & \text{if } d_{ij} \le R \\ p - \alpha, & \text{if } d_{ij} > R \end{cases} \]

where xij is a network tie; dij is the Euclidean distance between actors i and j; p is the density (i.e. average probability) of the network; pb is the proximity bias, which specifies the sensitivity to geographical proximity; R is the neighbourhood radius within which the proximity bias applies; and α is a correction term that is required to ensure that the average density remains the same, given a distance matrix, d, for all possible R and pb. It is assumed that the Xij are independent and identically distributed Bernoulli random variables conditional on distance being ≤R or >R, and hence that the proximity bias, pb, is the same for all actors.
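To make the role of the correction term α concrete, the following Python sketch computes step-function tie probabilities for a given distance matrix. It assumes the step function takes the form p + pb within radius R and p − α beyond it, and solves for the α that keeps the average tie probability equal to p. The closed form for α, the function name, and the example values are assumptions made for illustration, not taken from Wong et al. (2005).

```python
import numpy as np


def step_tie_probabilities(dist, p, pb, R):
    """Tie probabilities under an assumed step function:
    p + pb if d_ij <= R, and p - alpha otherwise."""
    n = dist.shape[0]
    iu = np.triu_indices(n, k=1)  # unordered pairs {i, j}, i < j
    d = dist[iu]
    near = d <= R
    n_near = int(near.sum())
    n_far = int((~near).sum())
    if n_far == 0:
        raise ValueError("R covers every pair; no correction is possible")
    # Setting the mean probability to p gives
    #   n_near * (p + pb) + n_far * (p - alpha) = (n_near + n_far) * p
    # and therefore alpha = pb * n_near / n_far.
    alpha = pb * n_near / n_far
    if p + pb > 1.0 or p - alpha < 0.0:
        raise ValueError("parameters give probabilities outside [0, 1]")
    probs = np.where(near, p + pb, p - alpha)
    return probs, alpha, iu


rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 10.0, size=(100, 2))  # made-up locations
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

p, pb, R = 0.05, 0.30, 1.0  # illustrative values only
probs, alpha, iu = step_tie_probabilities(dist, p, pb, R)
print("alpha:", alpha)
print("average tie probability:", probs.mean())
```

Because α depends on how many pairs fall within R, it has to be recomputed whenever the distance matrix, R, or pb changes, which is why the text describes it as conditional on the distance matrix.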
These models demonstrate how the potential importance of geographical proximity to tie formation processes can be parameterised in models for social networks. The main advantage of these models is that they enable us to explore the parametric function relating social interactions to physical distance and can demonstrate simple regularities in a social network that may be associated with the spatial proximity of actors. The models propose different functions. In Butts’ model, tie probabilities vary according to distance via a continuous spatial interaction function, whereas in Wong et al.’s model there is a simple threshold, or step function, relating tie probability and distance, with the proximity bias in operation only when distance is below the threshold. In spite of this clear difference, the two models are similar in one important respect: neither incorporates complex dependences between edges, and both explain the emergence of social network structure solely in terms of spatial proximities among actors. Yet it may be unrealistic to assume that network ties are conditionally independent given spatial proximity. Whether there is a network tie between two actors, say Ben and Sarah, may depend not only on the geographical proximity of Ben and Sarah but also on whether they have network partners in common. While Butts (2002) and Wong et al. (2005) primarily focus on cases in which edges are independent given distance, they both recognise that endogenous social processes may account for some aspects of emergent social structure, so that spatial proximity may be a by-product as well as a determinant of social structure.
They also indicate the necessity of constructing a nested family of exponential random graph models that can be used to evaluate the empirical evidence for spatial and network dependences and demonstrate that both models can be easily translated into the more general exponential family framework with some minor modification. It is worth noting that (to our knowledge) there have been no applications to date of this promising approach.
The first objective of this paper is therefore to formulate models that allow simultaneous estimation of potential spatial and network effects involved in tie formation. We do so by building empirically testable models for network structure that accommodate geographical proximity as well as endogenous network processes in explanations of network structure. In the process of model construction, we utilise exponential random graph models as well as Butts’ (2011) framework for specifying the distance interaction function, a function that describes the relationship between tie probability and the distance between actors. Our second objective is to apply these models to network data gathered using a snowball sampling methodology in a suburban community within Melbourne, Australia, and hence to assess, at the level of a large community, the potentially distinctive roles that spatial proximity and network processes may play in shaping social network structure.
Exponential random graph models
As noted earlier, a random graph or network is represented by a binary matrix X = [Xij] of network tie variables on a node set N. Each possible edge or tie in the network is regarded as a random variable, with xij = 1 if there is an edge between node i and node j, and xij = 0 otherwise. Here, we regard the node set as fixed and possible ties as nondirected (Xij = Xji), and we disallow self-ties of the form Xii. The matrix of all network variables is denoted by X, while x = [xij] refers to a realisation of X.
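For orientation, the general form in which exponential random graph models are commonly written in the literature is given below. The notation is an assumption made here for illustration, since this section does not state the formula: A indexes network configurations (for example edges, stars or triangles), z_A(x) is the count of configuration A in the graph x, θ_A is the corresponding parameter, and κ(θ) is a normalising constant.

```latex
\[
  \Pr(X = x) = \frac{1}{\kappa(\theta)}
  \exp\Big( \sum_{A} \theta_A \, z_A(x) \Big)
\]
% Equivalent conditional form: the log-odds of a tie between i and j,
% given the rest of the graph x_{-ij}, depend on the change in each
% statistic when x_{ij} is switched from 0 to 1.
\[
  \log \frac{\Pr(X_{ij} = 1 \mid x_{-ij})}{\Pr(X_{ij} = 0 \mid x_{-ij})}
  = \sum_{A} \theta_A \,\big( z_A(x^{+}_{ij}) - z_A(x^{-}_{ij}) \big)
\]
```

Here x^{+}_{ij} and x^{-}_{ij} denote the graph x with the tie between i and j present and absent, respectively. If the only statistic were the edge count, the conditional log-odds would be the same constant for every pair, which corresponds to a homogeneous Bernoulli graph with independent ties.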
Data
The data used in this example are derived from a survey designed to assess the role of networks and geographical proximity in explaining the distribution of unemployment in a suburban Australian region (see Daraganova, 2008, for details). The survey was a quantitative, interviewer-administered study in which participants were recruited via a 2-wave snowball sampling scheme (e.g., Frank, 2005, Frank and Snijders, 1994, Goodman, 1961, Handcock and Gile, 2010). Specifically, a
Analysis and results
The results are presented in two parts. The first part describes the analysis of different forms of the distance interaction function. The second part presents the results of fitting the exponential random graph models with geographical proximity to the empirical data.
Conclusions
The main goal of this study was to formulate models that allow simultaneous estimation of spatial and network effects from observed network data, and, then, to assess these effects simultaneously. The assessment was made using a study designed to assess community network structure as a function of spatial locations of individuals in a suburban Australian region. As a method we utilised the exponential random graph model approach as it allows the relaxation of the assumption of independence
Acknowledgements
We thank Carter Butts for his helpful suggestions and discussion. We would also like to thank Peng Wang for the programming of the simulation and estimation programs.
References (59)
- et al. Reversal small-world experiment. Social Networks (1978)
- et al. Analysing exponential random graph (p-star) models with missing data using Bayesian data augmentation. Statistical Methodology (2010)
- et al. Did distance matter before the Internet? Interpersonal contact and support in the 1970s. Social Networks (2007)
- Are personal communities local? A Dumptarian reconsideration. Social Networks (1996)
- Spatial interaction and the statistical analysis of lattice systems (with discussion). Journal of the Royal Statistical Society, Series B: Methodological (1974)
- et al. Housing Architecture and Social Interaction. Sociometry (1956)
- et al. Studying social relations cross-culturally. Ethnology (1988)
- et al. Urban acquaintance fields: an evaluation of a spatial model. Environment and Planning (1970)