Does big data serve policy? Not without context. An experiment with in silico social science

Graziul, Chris; Belikov, Alexander; Chattopadyay, Ishanu; Chen, Ziwen; Fang, Hongbo; Girdhar, Anuraag; Jia, Xiaoshuang; Krafft, P. M.; Kleiman-Weiner, Max; Lewis, Candice; Liang, Chen; Muchovej, John; Vientós, Alejandro; Young, Meg; Evans, James

doi:10.1007/s10588-022-09362-3

S.I. : Ground Truth: in silico Social Science (GTIS3)
Published: 30 November 2022

Computational and Mathematical Organization Theory, Volume 29, pages 188–219 (2023)

Chris Graziul¹,
Alexander Belikov¹,
Ishanu Chattopadyay¹,
Ziwen Chen¹,
Hongbo Fang²,
Anuraag Girdhar¹,
Xiaoshuang Jia³,
P. M. Krafft⁴,
Max Kleiman-Weiner⁵,⁶,
Candice Lewis¹,
Chen Liang¹,
John Muchovej⁵,⁶,
Alejandro Vientós⁵,⁷,
Meg Young⁸ &
James Evans¹,⁹ (ORCID: orcid.org/0000-0001-9838-0707)


Abstract

The DARPA Ground Truth project sought to evaluate social science by constructing four varied simulated social worlds with hidden causality and unleashing teams of scientists to collect data, discover their causal structure, predict their future, and prescribe policies to create desired outcomes. This large-scale, long-term experiment in in silico social science, in which the ground truth of the simulated worlds was known, but not by us, reveals the limits of contemporary quantitative social science methodology. First, problem solving without a shared ontology, in which many world characteristics remain existentially uncertain, poses strong limits to quantitative analysis even when scientists share a common task, and suggests how those limits could become insurmountable without one. Second, data labels biased the associations our analysts made and the assumptions they employed, often away from the simulated causal processes those labels signified, suggesting limits on the degree to which analytic concepts developed in one domain may port to others. Third, the current standard for computational social science publication is a demonstration of novel causes, but this limits the relevance of models for solving problems and proposing policies, which benefit from the simpler and less surprising answers associated with the most important causes, or the combination of all causes. Fourth, most singular quantitative methods applied on their own did not help to solve most analytical challenges, and we explored a range of established and emerging methods, including probabilistic programming, deep neural networks, systems of predictive probabilistic finite state machines, and more, to achieve plausible solutions. However, despite these limitations common to the current practice of computational social science, we find on the positive side that even imperfect knowledge can be sufficient to identify robust predictions if a more pluralistic approach is applied. Applying competing approaches by distinct subteams, including at one point the vast TopCoder.com global community of problem solvers, enabled discovery of many aspects of the relevant structure underlying worlds that singular methods could not. Together, these lessons suggest how different a policy-oriented computational social science would be from the computational social science we have inherited. Computational social science that serves policy would need to endure more failure, sustain more diversity, maintain more uncertainty, and allow for more complexity than current institutions support.


Acknowledgements

The authors gratefully acknowledge DARPA grant HR00111820006 for the Ground Truth program; Adam Russell, the architect of that program; and the other participants in the program (and authors of articles in this special issue) for their inspiration as fellow travelers and contributors to this project.

Author information

Authors and Affiliations

University of Chicago, Chicago, USA
Chris Graziul, Alexander Belikov, Ishanu Chattopadyay, Ziwen Chen, Anuraag Girdhar, Candice Lewis, Chen Liang & James Evans
Carnegie Mellon University, Pittsburgh, USA
Hongbo Fang
Sun Yat-sen University, Guangzhou, China
Xiaoshuang Jia
University of Oxford, Oxford, England
P. M. Krafft
MIT, Cambridge, USA
Max Kleiman-Weiner, John Muchovej & Alejandro Vientós
Harvard University, Cambridge, USA
Max Kleiman-Weiner & John Muchovej
Rutgers University, New Brunswick, USA
Alejandro Vientós
Cornell University, Ithaca, USA
Meg Young
Santa Fe Institute, Santa Fe, USA
James Evans

Corresponding author

Correspondence to James Evans.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Graziul, C., Belikov, A., Chattopadyay, I. et al. Does big data serve policy? Not without context. An experiment with in silico social science. Comput Math Organ Theory 29, 188–219 (2023). https://doi.org/10.1007/s10588-022-09362-3

Accepted: 24 May 2022
Published: 30 November 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s10588-022-09362-3
