Consistency of survey opinions and external data

  • Original paper
Computational Statistics

Abstract

The Soul of the Community Survey was conducted in twenty-six communities in the United States in the years 2008, 2009, and 2010. Respondents were asked to rate their community in terms of quality of life, social offerings, and other aspects, in order to identify the qualities that make people most attached to their community. This paper focuses on describing the geographic distribution of responses to several of the questions within one of the communities, Long Beach, CA. We first provide a general description of the city and compare the geographic distribution of population, income, and race of survey respondents with external data. With this demographic profile in mind, we analyze respondents’ ratings of local safety, availability of green spaces, and quality of local public schools to see whether they are consistent with external data sources. For public school quality, where the ratings appear inconsistent with the external data, we propose an explanation for the discrepancy.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

References

Author information

Correspondence to Samuel Ackerman.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The author wishes to thank Dr. Richard M. Heiberger, Professor Emeritus at the Department of Statistics, Temple University, for his guidance in preparing this paper and the associated poster (Ackerman 2013b), as well as in designing the R package mapStats, which was used to produce the maps in this paper.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 29697 KB)

Appendix

1.1 mapStats demonstration examples

The installed version of the mapStats package includes a demo that illustrates the full range of visual and statistical options, beyond what can be shown in this work. To run the demo in R, once mapStats is installed and loaded, submit the command demo("map_examples", package="mapStats").

1.2 Reporting district (RD) aggregation

The RDs are small areas whose borders do not follow zip code boundaries; an RD may overlap with multiple zip codes, so there is no clean mapping of RDs to zip codes. This matters because crime data were available at the RD level, whereas Census population estimates and survey responses were available only for zip codes. To overcome this, we construct approximations using shapefiles of the RD and zip code boundaries (City of Long Beach 2013). First, the spatial area shared by each RD and zip code was calculated by intersecting the two shapefiles. We make the simplifying assumption that the population of each zip code (using the Census totals of persons 16 years and older) is evenly distributed over the zip code's area. Thus, within a zip code, the population of the part lying in a given RD was estimated as the fraction of the zip code's area that falls in that RD, multiplied by the Census population total for the zip code. Each RD's population estimate was then obtained by summing these pieces over all the zip codes it overlaps. Finally, population-adjusted crime rates at the RD level were calculated by dividing RD crime totals by these RD population estimates.
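The area-weighted estimate described above can be sketched in a few lines of Python. This is an illustration only, not the code used for the paper (which relied on R). Writing A(r,z) for the area shared by RD r and zip code z, A(z) for the area of zip code z and P(z) for its Census population, the estimate is P(r) = sum over z of A(r,z)/A(z) * P(z). The dictionaries zip_area, zip_pop and overlap_area and the functions rd_population and rd_crime_rate are hypothetical names filled with made-up numbers; in practice the shared areas would come from intersecting the two shapefiles. If part of a zip code lies outside every RD, the population in that part is simply not assigned to any RD in this sketch.

```python
from collections import defaultdict

# Hypothetical inputs (made-up numbers, not Long Beach data).
# Area of each zip code, e.g. in square km.
zip_area = {"Z1": 10.0, "Z2": 6.0}
# Census population (persons 16+) of each zip code.
zip_pop = {"Z1": 20000, "Z2": 9000}
# Area shared by each (RD, zip code) pair, as obtained by intersecting shapefiles.
overlap_area = {
    ("RD_A", "Z1"): 4.0,
    ("RD_B", "Z1"): 6.0,
    ("RD_B", "Z2"): 2.0,
    ("RD_C", "Z2"): 4.0,
}


def rd_population(overlap_area, zip_area, zip_pop):
    """Estimate RD populations, assuming each zip code's population is
    spread evenly over its area."""
    pop = defaultdict(float)
    for (rd, z), shared in overlap_area.items():
        # Fraction of the zip code's area inside this RD, times the zip total,
        # accumulated over every zip code the RD overlaps.
        pop[rd] += shared / zip_area[z] * zip_pop[z]
    return dict(pop)


def rd_crime_rate(rd_crimes, rd_pop, per=1000):
    """Population-adjusted crime rate: crimes per `per` residents in each RD."""
    return {rd: rd_crimes[rd] / rd_pop[rd] * per for rd in rd_crimes}


rd_pop = rd_population(overlap_area, zip_area, zip_pop)
print({rd: round(p) for rd, p in rd_pop.items()})
# {'RD_A': 8000, 'RD_B': 15000, 'RD_C': 6000}
```

Because the RD pieces in this toy example cover each zip code completely, the RD estimates add back up to the combined zip code population of 29,000.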

Crime totals were aggregated from the RD level to the zip code level in a similar way, assuming that crimes are evenly distributed over the area of each RD. Each zip code's crime total was then estimated by summing, over the RDs that intersect it, each RD's crime total multiplied by the fraction of that RD's area lying within the zip code. These estimated crime totals were then divided by the known zip code population totals to obtain population-adjusted crime rates.
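The reverse direction uses each RD's own area as the denominator: with C(r) the crime total of RD r and A(r) its area, the zip code estimate is C(z) = sum over r of A(r,z)/A(r) * C(r). The Python sketch below is a hypothetical illustration under the even-spread assumption stated above; the names rd_area, rd_crimes, overlap_area, zip_pop, zip_crime_totals and zip_crime_rate, and all numbers, are made up.

```python
from collections import defaultdict

# Hypothetical inputs (made-up numbers). Area of each RD:
rd_area = {"RD_A": 4.0, "RD_B": 8.0, "RD_C": 4.0}
# Crime totals recorded by RD:
rd_crimes = {"RD_A": 40, "RD_B": 30, "RD_C": 12}
# Area shared by each (RD, zip code) pair:
overlap_area = {
    ("RD_A", "Z1"): 4.0,
    ("RD_B", "Z1"): 6.0,
    ("RD_B", "Z2"): 2.0,
    ("RD_C", "Z2"): 4.0,
}
# Known Census population of each zip code:
zip_pop = {"Z1": 20000, "Z2": 9000}


def zip_crime_totals(overlap_area, rd_area, rd_crimes):
    """Apportion each RD's crimes to zip codes by the share of the RD's area
    lying in each zip code (crimes assumed evenly spread within an RD)."""
    totals = defaultdict(float)
    for (rd, z), shared in overlap_area.items():
        totals[z] += shared / rd_area[rd] * rd_crimes[rd]
    return dict(totals)


def zip_crime_rate(zip_crimes, zip_pop, per=1000):
    """Crimes per `per` residents, using the known zip code populations."""
    return {z: zip_crimes[z] / zip_pop[z] * per for z in zip_crimes}


zip_crimes = zip_crime_totals(overlap_area, rd_area, rd_crimes)
print(zip_crimes)  # {'Z1': 62.5, 'Z2': 19.5}
print(zip_crime_rate(zip_crimes, zip_pop))
```

Here RD_B straddles both zip codes, so 6/8 of its 30 crimes go to Z1 and 2/8 to Z2; the zip code totals still sum to the 82 crimes recorded across the three RDs.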

1.3 Using mapStats to analyze higher-level geographic units

Figure 5 shows crime rates at the district level on a shapefile of reporting districts (RDs), which are a lower-level geographic unit. This plot demonstrates a useful feature of the mapStats function. A shapefile represents geography at a single level. For instance, in a map of the US, each shape might be a county, and counties in turn aggregate into states. However, as far as we know, it is not trivial to convert such a shapefile from counties to states, eliminating the county boundaries within each state. In this example, our shapefile boundaries are at the RD level, but we wish to display crime rates at the higher level of district. There is a many-to-one mapping from RDs to districts, since districts are made up of mutually exclusive sets of RDs. However, we do not have a separate shapefile that shows only the district boundaries without the RDs.

We can overcome this if we are willing to tolerate displaying the RD boundaries. We simply take the original dataset of crime totals by RD and add a district variable both to this dataset and to the dataset associated with the shapefile. Next, we use the calcStats function, setting var="crime_by_RD", stat="total", and d.geo.var="district", to sum crime totals from the RD to the district level, generating a new output dataset, say crime.district. We then use this as the data input d to mapStats, but with the geography variables now at the district level. An example of this is shown in the package demo, where a shapefile of US states is used and the states are aggregated into four regions. The function colors geographic units at the RD level, since this is the level that each sub-shape represents; but because a district consists of contiguous RDs, which in this case all receive the same color, each district appears as a single color. The only drawback is that the RD boundary lines still show, and if we want geographic units to be labeled, we can only label each RD as belonging to a district; we cannot give the district a single label unless it is done manually, such as by creating an object with sp_layout.pars. We consider this a minor inconvenience given the ease with which the package enables visualizing data at different geographic levels when there is a many-to-one mapping between them.
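The RD-to-district step amounts to a group-by sum followed by a lookup that hands each RD shape its district's value. The Python sketch below illustrates only that logic; it is not the mapStats or calcStats implementation, and crime_by_RD, rd_to_district, district_totals and value_per_shape are hypothetical names with made-up values. It shows why every RD in a district ends up with the same value, and hence the same fill color, when the aggregated data are drawn on the RD-level shapes.

```python
from collections import defaultdict

# Hypothetical RD-level crime totals and a many-to-one RD -> district lookup
# (each RD belongs to exactly one district).
crime_by_RD = {"RD_A": 40, "RD_B": 30, "RD_C": 12, "RD_D": 18}
rd_to_district = {"RD_A": "North", "RD_B": "North", "RD_C": "South", "RD_D": "South"}


def district_totals(crime_by_RD, rd_to_district):
    """Sum RD totals within each district (the 'total' statistic)."""
    totals = defaultdict(int)
    for rd, crimes in crime_by_RD.items():
        # A KeyError here would flag an RD with no district assigned.
        totals[rd_to_district[rd]] += crimes
    return dict(totals)


def value_per_shape(rd_to_district, district_values):
    """Give each RD shape its district's value, so that all RDs in a district
    are drawn with the same color."""
    return {rd: district_values[d] for rd, d in rd_to_district.items()}


crime_district = district_totals(crime_by_RD, rd_to_district)
print(crime_district)  # {'North': 70, 'South': 30}
print(value_per_shape(rd_to_district, crime_district))
# {'RD_A': 70, 'RD_B': 70, 'RD_C': 30, 'RD_D': 30}
```

Since the RD boundaries are still part of the shapes being drawn, they remain visible, which matches the drawback noted above.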

About this article

Cite this article

Ackerman, S. Consistency of survey opinions and external data. Comput Stat 34, 1489–1509 (2019). https://doi.org/10.1007/s00180-019-00882-2
