Does big data serve policy? Not without context. An experiment with in silico social science

  • S.I. : Ground Truth: in silico Social Science (GTIS3)
Computational and Mathematical Organization Theory

Abstract

The DARPA Ground Truth project sought to evaluate social science by constructing four varied simulated social worlds with hidden causality and unleashing teams of scientists to collect data, discover the worlds' causal structure, predict their future, and prescribe policies to create desired outcomes. This large-scale, long-term experiment in in silico social science, in which the ground truth of the simulated worlds was known, but not to us, reveals the limits of contemporary quantitative social science methodology. First, problem solving without a shared ontology, in which many world characteristics remain existentially uncertain, poses strong limits to quantitative analysis even when scientists share a common task, and suggests how those limits could become insurmountable without one. Second, data labels biased the associations our analysts made and the assumptions they employed, often away from the simulated causal processes those labels signified, suggesting limits on the degree to which analytic concepts developed in one domain may port to others. Third, the current standard for computational social science publication is a demonstration of novel causes, but this standard limits the relevance of models for solving problems and proposing policies, which benefit from the simpler and less surprising answers associated with the most important causes, or with the combination of all causes. Fourth, most singular quantitative methods applied on their own did not help to solve most analytical challenges, and we explored a range of established and emerging methods, including probabilistic programming, deep neural networks, systems of predictive probabilistic finite state machines, and more, to achieve plausible solutions. Despite these limitations, which are common to the current practice of computational social science, we find that even imperfect knowledge can be sufficient to identify robust predictions if a more pluralistic approach is applied.
Having distinct subteams apply competing approaches, including at one point the vast TopCoder.com global community of problem solvers, enabled discovery of many aspects of the relevant structure underlying the worlds that singular methods could not uncover. Together, these lessons suggest how different a policy-oriented computational social science would be from the computational social science we have inherited. Computational social science that serves policy would need to endure more failure, sustain more diversity, maintain more uncertainty, and allow for more complexity than current institutions support.

Acknowledgements

The authors gratefully acknowledge DARPA grant HR00111820006 for the Ground Truth program; Adam Russell, the architect of that program; and the other participants in the program (and authors of articles in this special issue) for their inspiration as fellow travelers and contributors to this project.

Author information

Corresponding author

Correspondence to James Evans.

About this article

Cite this article

Graziul, C., Belikov, A., Chattopadhyay, I. et al. Does big data serve policy? Not without context. An experiment with in silico social science. Comput Math Organ Theory 29, 188–219 (2023). https://doi.org/10.1007/s10588-022-09362-3

  • DOI: https://doi.org/10.1007/s10588-022-09362-3
