Soul of the community: an attempt to assess attachment to a community

  • Original paper
Computational Statistics

Abstract

In this article, we work with data from the Soul of the Community survey project, which was conducted by the Knight Foundation from 2008 to 2010. Overall, 26 communities across the United States, with a total of more than 47,800 participants, took part in this study. Each year, around 200 different questions were posed to each participant. One key variable is attachment to one’s community. We use various machine learning algorithms to assess which factors may have an effect on attachment.

Acknowledgements

We would like to thank Dr. Adele Cutler for her input on the methodology of this manuscript and for providing access to her archetype software. In addition, we would like to thank the reviewers for their helpful comments and suggestions. This article was submitted prior to Jürgen Symanzik becoming Editor-in-Chief of Computational Statistics, and was handled by Yuichi Mori, the previous Editor-in-Chief.

Author information

Corresponding author

Correspondence to Anna Quach.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Appendix

A number of variables and cases were removed prior to our analyses. Variables were removed for the following reasons: (i) Variables with a large number of missing responses (missing in more than 45% of the cases) were excluded. (ii) When variables were provided both as 5-level variables and as aggregated 3-level variables (variable names ending with an r), the aggregated 3-level variables were removed because they provide less nuanced information. (iii) Variables not observed in all 3 years were excluded to allow comparison across years. (iv) All index variables (see Table 3) were removed, on the assumption that the variables aggregated into an index variable would show up together if that index variable were an important predictor variable. In addition, the index variables are linearly dependent on the variables they were derived from, which creates issues in some of the models we used. (v) Finally, all variables that form the basis for “Community Attachment” (one of our main response variables) were removed. PCA was conducted for (iv) and (v); in particular, we used it to determine how the index variables were derived. Ultimately, 55 variables were retained for our analysis from the original 179 (2008), 195 (2009), and 229 (2010) variables.
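The variable-removal rules (i) to (v) can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the article: the function names, the per-year application of the 45% threshold, the rule that a 3-level variable is recognized by stripping a trailing r from its name, and the representation of each case as a dictionary with None for a missing response are all assumptions made here.

```python
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[int]]  # one survey case; None marks a missing response


def missing_fraction(rows: List[Row], var: str) -> float:
    """Share of cases in which `var` is missing (None or not recorded)."""
    return sum(r.get(var) is None for r in rows) / len(rows)


def select_variables(
    years: Dict[int, List[Row]],
    max_missing: float = 0.45,
    excluded: Iterable[str] = (),
) -> List[str]:
    """Return the variables that survive rules (i) to (v).

    `excluded` is a hypothetical, hand-supplied collection holding the index
    variables (iv) and the variables underlying the response (v).
    """
    excluded = set(excluded)
    # (iii) keep only variables observed in every year
    per_year = (set().union(*(r.keys() for r in rows)) for rows in years.values())
    common = set.intersection(*per_year)

    kept = []
    for var in sorted(common):
        # (ii) drop an aggregated 3-level variable ("...r") only when its
        # 5-level counterpart (same name without the "r") is also present
        if var.endswith("r") and var[:-1] in common:
            continue
        # (iv) and (v) drop index variables and response components
        if var in excluded:
            continue
        # (i) drop variables missing in more than `max_missing` of the cases
        # (assumption: the threshold is checked separately for each year)
        if any(missing_fraction(rows, var) > max_missing for rows in years.values()):
            continue
        kept.append(var)
    return kept
```

Calling `select_variables(years, excluded={...})` on a dictionary that maps each survey year to its list of cases returns the sorted names of the retained variables. Checking for the 5-level counterpart before dropping a name that ends in r keeps unrelated variables whose names merely end in that letter.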

Table 3 Table of the formulation of the 15 index variables (variables calculated by taking the mean of several variables) found using PCA
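A short derivation shows why PCA can expose an index variable that is the mean of several variables. It is offered as a sketch of the general mechanism, not as a description of the exact procedure used for Table 3; the symbols $I$, $x_j$, $k$, $z$, $a$, and $\Sigma$ are introduced here for illustration only.

```latex
\[
\begin{aligned}
I &= \frac{1}{k}\sum_{j=1}^{k} x_j, \qquad
z = (x_1,\dots,x_k,\, I)^\top, \qquad
a = \left(\tfrac{1}{k},\dots,\tfrac{1}{k},\,-1\right)^\top \\
a^\top z &= \frac{1}{k}\sum_{j=1}^{k} x_j - I = 0 \\
\operatorname{Var}\!\left(a^\top z\right) &= a^\top \Sigma\, a = 0
\;\Longrightarrow\; \Sigma\, a = 0,
\qquad \Sigma = \operatorname{Cov}(z)
\end{aligned}
\]
```

The last implication holds because $\Sigma$ is positive semidefinite. The covariance matrix of an index together with its components therefore has a zero eigenvalue, and the loadings of the corresponding principal component are proportional to $a$, which reveals which variables were averaged. With rounding or missing values, the smallest eigenvalue would be close to zero rather than exactly zero.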

After the removal of variables, cases were removed for the following reasons: (i) Cases with at least one missing value in the remaining variables were removed. (ii) Answers such as “don’t know”, “refuse to answer”, or “did not answer the question” in the survey were recoded as missing and then handled according to (i). Figure 12 shows the effect of data cleaning on the sample sizes in each community in each year. Although steps (i) and (ii) sound strict, in most communities and years only a few cases had to be deleted. Notice that the communities with considerable decreases in sample size after data cleaning were mostly urban communities (such as Philadelphia, Pennsylvania; Miami, Florida; and San Jose, California, in 2008, and Charlotte, North Carolina; Akron, Ohio; and Detroit, Michigan, in 2009). An explanation of why we see these dramatic changes in urban communities following data cleaning would be interesting, but has not been investigated here.
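The two case-removal steps amount to recoding non-responses as missing and then keeping complete cases only. The following Python sketch illustrates this under stated assumptions: the answer labels in `NON_RESPONSES`, the `community` field, and all function names are hypothetical and are not taken from the survey files.

```python
from collections import Counter
from typing import Dict, List, Optional, Union

Value = Optional[Union[int, str]]
Row = Dict[str, Value]

# Hypothetical labels; the codes used in the actual survey data may differ.
NON_RESPONSES = {"don't know", "refused", "no answer"}


def recode_non_responses(row: Row) -> Row:
    """Step (ii): treat non-response answers as missing (None)."""
    return {k: (None if v in NON_RESPONSES else v) for k, v in row.items()}


def complete_cases(rows: List[Row], variables: List[str]) -> List[Row]:
    """Step (i): keep cases with no missing value among `variables`."""
    cleaned = [recode_non_responses(r) for r in rows]
    return [r for r in cleaned if all(r.get(v) is not None for v in variables)]


def sizes_by_community(rows: List[Row]) -> Counter:
    """Sample size per community, e.g. to compare before and after cleaning."""
    return Counter(r["community"] for r in rows)
```

Comparing `sizes_by_community(rows)` with `sizes_by_community(complete_cases(rows, variables))` gives the kind of before-and-after sample sizes per community that the dot chart summarizes.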

Figure 13 provides a graphical representation of the variables and cases that were removed from further analysis. Overall, the largest number of cases were removed from the 2010 data set, but the original sample size that year was approximately 50% larger than the sample sizes in 2008 and 2009.

Fig. 12 Dot chart of the sample size in each year for each community before and after data cleaning

Fig. 13 Heatmaps showing missing data in each year. Also shown are cases and variables removed from further analysis (further explained in the main text)

B Appendix

This appendix summarizes the predictor variables in Table 4 and lists additional variables related to the index variables in Table 5. The variables in bold in Table 4 were found to be the three most important predictor variables for predicting attachment status. Table 4 also lists some of the variables that make up the index variables in Table 3. Table 5 lists the remaining variables that make up the index variables in Table 3, together with their descriptions.

Table 4 Table of predictor variables
Table 5 Table of additional variables that make up the index variables in Table 3

Cite this article

Quach, A., Symanzik, J. & Forsgren, N. Soul of the community: an attempt to assess attachment to a community. Comput Stat 34, 1565–1589 (2019). https://doi.org/10.1007/s00180-019-00866-2

  • DOI: https://doi.org/10.1007/s00180-019-00866-2
