Abstract
The Soul of the Community Survey was conducted in twenty-six communities in the United States in the years 2008, 2009, and 2010. Respondents were asked to rate their community in terms of quality of life, social offerings, and other aspects to determine the qualities that cause people to be most attached to their community. This paper focuses on describing the geographic distribution of responses to several of the questions within one of the communities, Long Beach, CA. We first provide a general description of the city and compare the geographic distribution of population, income, and race of survey respondents with external data. With this demographic profile in mind, we analyze respondents’ ratings of local safety, availability of green spaces, and quality of local public schools to see if they are consistent with external data sources. In the case of public school quality, where the ratings appear inconsistent with external data, we propose an explanation for the discrepancy.
References
Ackerman S (2013a) mapStats: Geographic display of survey data statistics. R software package. https://cran.r-project.org/web/packages/mapStats/mapStats.pdf. Accessed 1 Sept 2013
Ackerman S (2013b) Safety, scenery, and schools in Long Beach, CA: consistency of survey and external data. Poster presented at the 2013 Joint Statistical Meetings Data Exposition, Montreal, Canada
Baldassare M, Bonner D, Petek S, Shrestha J (2013) Californians and education. Public Policy Institute of California. http://www.ppic.org/content/pubs/survey/S_413MBS.pdf. Accessed 1 Sept 2013
Bivand R, Ono H, Dunlap R, Stigler M (2018) classInt: Choose univariate class intervals. R software package. https://cran.r-project.org/web/packages/classInt/classInt.pdf. Accessed 15 Dec 2018
California Department of Education (2012) Executive summary explaining the academic performance index (API). http://www.cde.ca.gov/ta/ac/ap/documents/apiexecsummary.pdf. Accessed 1 Sept 2013
California Department of Finance (2011) 2011 City population rankings. http://www.cacities.org/UploadedFiles/LeagueInternet/59/59c32753-9e10-4662-8b7b-d1fe6bcea4ba.pdf. Accessed 1 Sept 2013
City of Long Beach (2013) GIS data catalog. http://www.longbeach.gov/ti/gis-maps---data/data-catalog/. Accessed 1 Sept 2013
City of Long Beach Police Department (2013) Crime statistics. http://www.longbeach.gov/police/crime-info/crime-statistics/. Accessed 1 Sept 2013
Eggleston JS, Gideon M (2017) Evaluating wealth data in the redesigned 2014 survey of income and program participation. Social, economic, and housing statistics division working paper
Heiberger RM (2013) HH: statistical analysis and data display: Heiberger and Holland. R package version 2.3-42. http://CRAN.R-project.org/package=HH. Accessed 1 Sept 2013
Hofmann H, Wickham H, Cook D (2019) The 2013 Data Expo of the American Statistical Association. Comput Stat XX(YY)
Ihaka R, Murrell P, Hornik K, Fisher JC, Stauffer R, Zeileis A (2016) colorspace: Color space manipulation. R software package. https://cran.r-project.org/web/packages/colorspace/colorspace.pdf. Accessed 15 Dec 2018
Katz CM, Fox A, Nuno L, Cortez M, Choate D (2017) Compton, California, Gang Assessment. National Gang Center, Phoenix
Knight Foundation (2010a) Knight foundation soul of the community: data documentation. http://www.soulofthecommunity.org. Accessed 1 Sept 2013
Knight Foundation (2010b) Knight soul of the community 2010: why people love where they live and why it matters: a national perspective. http://www.soulofthecommunity.org/sites/default/files/SOTC_2010_Report_OVERALL_11-12-10_mh.pdf. Accessed 1 Sept 2013
Lumley T (2004) Analysis of complex survey samples. J Stat Softw 9(1):1–19
Lumley T (2014) survey: analysis of complex survey samples. R package version 3.30
Neuwirth E (2014) RColorBrewer: ColorBrewer palettes. R software package. https://cran.r-project.org/web/packages/RColorBrewer/RColorBrewer.pdf. Accessed 1 Sept 2013
Port of Long Beach (2013) Port of Long Beach. http://www.polb.com/about/default.asp. Accessed 1 Sept 2013
Quillian L, Pager D (2010) Estimating risk stereotype amplification and the perceived risk of criminal victimization. Soc Psychol Q 73(1):79–104
R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/. Accessed 1 Sept 2013
Reese P (2017) See the California cities with the highest, lowest homicide rates. Sacramento Bee. https://www.sacbee.com/news/local/crime/article181299806.html. Accessed 15 Dec 2018
Sarkar D (2018) lattice: Trellis graphics for R. R software package. https://cran.r-project.org/web/packages/lattice/lattice.pdf. Accessed 15 Dec 2018
The Broad Prize for Urban Education (2013) About the Broad Prize. http://www.broadprize.org/about/overview.html. Accessed 1 Sept 2013
US Census Bureau (2010) Census 2010. https://catalog.data.gov/dataset/2010-census-populations-by-zip-code. Accessed 1 Sept 2013
US Census Bureau (2013) State and county quick facts. http://quickfacts.census.gov/qfd/states/06/0643000.html. Accessed 1 Sept 2013
Additional information
The author wishes to thank Dr. Richard M. Heiberger, Professor Emeritus at the Department of Statistics, Temple University, for his guidance in preparing this paper and associated poster (Ackerman 2013b), as well as in designing the R package mapStats used to produce the maps in this paper.
Appendix
1.1 mapStats demonstration examples
The installed version of the mapStats package includes a demo that illustrates the full range of visual and statistical options, beyond what can be shown in this work. To run the demo in R, once mapStats is installed and loaded, submit the command demo("map_examples", package="mapStats").
1.2 Reporting district (RD) aggregation
The RDs are small areas whose borders do not follow zip code boundaries; an RD may overlap multiple zip codes, so there is no clean mapping of RDs to zip codes. This matters because crime data were available at the RD level, whereas Census population estimates and survey responses were available only for zip codes. To overcome this, we approximate using shapefiles of the RD and zip code boundaries (City of Long Beach 2013). First, the area shared by each RD and zip code pair was calculated by intersecting the shapefiles. We make the simplifying assumption that the population of each zip code (using the Census totals of persons 16 years and older) is evenly distributed in space within the zip code. Thus, within a zip code, the population belonging to each RD was estimated as the proportion of the zip code's area lying in the RD, multiplied by the Census population total for the zip code. These estimates were then summed for each RD across all the zip codes it overlaps. Finally, population-adjusted crime rates at the RD level were calculated from the RD population estimates.
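The allocation of zip code population to RDs can be sketched in a few lines of Python. This is only an illustration of the area-weighting arithmetic, not the code used for the paper: the zip codes, RD names, intersection areas, and population totals below are all hypothetical, and the sketch additionally assumes that each zip code is fully covered by the RDs listed, so that a zip code's area equals the sum of its intersection areas.

```python
# Hypothetical inputs (not from the paper): area shared by each
# (zip code, RD) pair, in any consistent unit, and zip code populations.
intersection_area = {
    ("90802", "RD1"): 2.0,
    ("90802", "RD2"): 2.0,
    ("90803", "RD2"): 1.0,
    ("90803", "RD3"): 3.0,
}
zip_population = {"90802": 4000, "90803": 2000}


def allocate_population_to_rds(intersection_area, zip_population):
    """Area-weighted allocation of zip code population to RDs.

    Assumes population is spread evenly within each zip code and that
    the listed intersections cover the whole zip code.
    """
    # Total area of each zip code, taken as the sum of its intersections.
    zip_area = {}
    for (zip_code, _rd), area in intersection_area.items():
        zip_area[zip_code] = zip_area.get(zip_code, 0.0) + area

    # Each RD receives the share of the zip code's population that
    # matches the share of the zip code's area lying inside the RD.
    rd_population = {}
    for (zip_code, rd), area in intersection_area.items():
        share_of_zip = area / zip_area[zip_code]
        rd_population[rd] = (
            rd_population.get(rd, 0.0) + share_of_zip * zip_population[zip_code]
        )
    return rd_population


rd_population = allocate_population_to_rds(intersection_area, zip_population)

# Hypothetical RD crime count, converted to a rate per 1,000 residents.
rd1_rate_per_1000 = 50 / rd_population["RD1"] * 1000
```

Because every zip code's shares sum to one, the RD estimates add up to the original zip code total, which is a useful sanity check on the intersection step.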
Crime totals were aggregated from the RD to the zip code level in a similar way, by assuming that crimes were evenly distributed in space within each RD. Each RD's crime total was allocated to the zip codes it overlaps in proportion to the share of the RD's area intersecting each zip code, and these allocations were summed within each zip code. The resulting estimates of crime totals by zip code were then used to calculate population-adjusted crime rates using the known zip code population totals.
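The reverse direction, from RD crime totals to zip code crime rates, follows the same pattern. The Python sketch below is illustrative only: all names and numbers are hypothetical, and it assumes each RD is fully covered by the zip codes listed, so that an RD's area equals the sum of its intersection areas.

```python
# Hypothetical inputs (not from the paper).
intersection_area = {
    ("90802", "RD1"): 2.0,
    ("90802", "RD2"): 2.0,
    ("90803", "RD2"): 1.0,
    ("90803", "RD3"): 3.0,
}
rd_crime_total = {"RD1": 40, "RD2": 30, "RD3": 20}
zip_population = {"90802": 4000, "90803": 2000}


def allocate_crimes_to_zips(intersection_area, rd_crime_total):
    """Area-weighted allocation of RD crime totals to zip codes.

    Assumes crimes are spread evenly within each RD and that the
    listed intersections cover the whole RD.
    """
    # Total area of each RD, taken as the sum of its intersections.
    rd_area = {}
    for (_zip_code, rd), area in intersection_area.items():
        rd_area[rd] = rd_area.get(rd, 0.0) + area

    # Each zip code receives the share of the RD's crimes that matches
    # the share of the RD's area lying inside the zip code.
    zip_crime_total = {}
    for (zip_code, rd), area in intersection_area.items():
        share_of_rd = area / rd_area[rd]
        zip_crime_total[zip_code] = (
            zip_crime_total.get(zip_code, 0.0) + share_of_rd * rd_crime_total[rd]
        )
    return zip_crime_total


zip_crime_total = allocate_crimes_to_zips(intersection_area, rd_crime_total)

# Population-adjusted rate per 1,000 residents, using known zip totals.
zip_rate_per_1000 = {
    zip_code: crimes / zip_population[zip_code] * 1000
    for zip_code, crimes in zip_crime_total.items()
}
```

Note the asymmetry with the population step: here the weights are shares of each RD's area, whereas the population allocation uses shares of each zip code's area.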
1.3 Using mapStats to analyze higher-level geographic units
Figure 5 shows crime rates at the district level on a shapefile of reporting districts (RDs), which are a lower-level geographic unit. This plot demonstrates a useful feature of the mapStats function. A shapefile can only show geography at a given level. For instance, in a map of the US, each shape might be a county, and counties aggregate into states. However, it is not trivial, as far as we know, to convert the shapefile from counties to states only, eliminating the county boundaries within each state. In this example, our shapefile boundaries are at the RD level, but we wish to display crime rates at the higher level of district. There is a many-to-one mapping from RDs to districts, since districts are made up of mutually exclusive sets of RDs. However, we do not have a separate shapefile that shows only the district boundaries without the RDs.
We can overcome this if we are willing to tolerate displaying the RD boundaries. We simply take the original dataset of crime totals by RD and add a variable for the district both to this dataset and to the dataset associated with the shapefile. We then use the calcStats function, setting var="crime_by_RD", stat="total", and d.geo.var="district" to sum crime totals from the RD to the district level, generating a new output dataset, say crime.district. This dataset is then used as the data input d to mapStats, but with the geography variables now at the district level. An example of this is shown in the package demo, where a shapefile of US states is used and states are aggregated into four regions. The function colors geographic units at the RD level, since this is the level that each sub-shape represents, but because a district consists of contiguous RDs that all receive the same color, each district appears as a single color. The only drawback is that the RD boundary lines still show, and if we want geographic units to be labeled, we can only label each RD as belonging to a district; we cannot label the district with a single label unless it is done manually, such as by creating an object with sp_layout.pars. We believe this is a minor inconvenience when we consider the ease with which the package enables visualizing data at different geography levels when there is a many-to-one mapping between them.
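The logic behind this workaround can be illustrated independently of the package. The Python sketch below is not the mapStats or calcStats implementation; it only mimics the idea with hypothetical RD names, district names, and crime counts: totals are summed over the many-to-one RD-to-district mapping, and each RD shape is then given its district's value so that all RDs in a district would be drawn in the same color.

```python
# Hypothetical inputs (not from the paper): crime totals by RD and the
# many-to-one mapping from RDs to districts.
rd_crime_total = {"RD1": 40, "RD2": 30, "RD3": 20, "RD4": 10}
rd_to_district = {"RD1": "North", "RD2": "North", "RD3": "South", "RD4": "South"}


def total_by_district(rd_crime_total, rd_to_district):
    """Sum RD-level totals up to the district level."""
    district_total = {}
    for rd, crimes in rd_crime_total.items():
        district = rd_to_district[rd]
        district_total[district] = district_total.get(district, 0) + crimes
    return district_total


district_total = total_by_district(rd_crime_total, rd_to_district)

# Each RD shape is assigned the value of the district it belongs to, so
# a map colored by this value shows one color per district even though
# the underlying shapes (and their boundary lines) are still RDs.
rd_display_value = {
    rd: district_total[district] for rd, district in rd_to_district.items()
}
```

In this toy example RD1 and RD2 share one value and RD3 and RD4 share another, which is why contiguous RDs in the same district blend into a single colored region on the map.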
Ackerman, S. Consistency of survey opinions and external data. Comput Stat 34, 1489–1509 (2019). https://doi.org/10.1007/s00180-019-00882-2
DOI: https://doi.org/10.1007/s00180-019-00882-2