A Joint Human/Machine Process for Coding Events and Conflict Drivers

Heap, Bradford; Krzywicki, Alfred; Schmeidl, Susanne; Wobcke, Wayne; Bain, Michael

doi:10.1007/978-3-319-69179-4_45

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10604)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

Constructing datasets to analyse the progression of conflicts has been a longstanding objective of peace and conflict studies research. In essence, the problem is to reliably extract relevant text snippets and code (annotate) them using an ontology that is meaningful to social scientists. Such an ontology usually characterizes types of violent events (killing, bombing, etc.), the underlying drivers of conflict, or both. The drivers are themselves hierarchically structured, for example into security, governance and economics, each subdivided into conflict-specific indicators. Numerous coding approaches have been proposed in the social science literature, ranging from fully automated “machine” coding to human coding. Machine coding is highly error-prone, especially for labelling complex drivers, and suffers from the extraction of duplicated events; human coding, on the other hand, is expensive and suffers from inconsistency between annotators. Hybrid approaches are therefore required. In this paper, we experimentally analyse how human input can most effectively be used in a hybrid system to complement machine coding. Using two newly created real-world datasets, we show that machine learning methods improve on rule-based automated coding for filtering large volumes of input, while human verification of relevant/irrelevant text leads to improved performance of machine learning for predicting multiple labels in the ontology.
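The abstract places machine learning at two points in the hybrid process: filtering large volumes of input for relevance, and predicting multiple ontology labels for the retained text. The Python sketch below illustrates both steps under explicit assumptions and is not the authors' implementation. It assumes a multinomial Naive Bayes classifier with add-one smoothing for both steps, treats multi-label prediction as one yes/no classifier per label ("binary relevance"), and flattens the driver hierarchy to three top-level labels (security, governance, economics). The tokenizer, the class and function names, and all example sentences are hypothetical.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    """Hypothetical bag-of-words tokenizer: split on whitespace, strip punctuation, lower-case."""
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]

class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing, scored in log space.
    Assumption: the abstract does not name the classifier used by the authors."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        n = len(docs)
        # Log prior: fraction of training documents carrying each class label.
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, y in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[y].update(tokens)
            self.vocab.update(tokens)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # ignore unseen words

        def score(c):
            # log P(c) + sum over tokens of log P(token | c), add-one smoothed
            return self.log_prior[c] + sum(
                math.log((self.word_counts[c][t] + 1) / (self.totals[c] + v))
                for t in tokens)

        return max(self.classes, key=score)

def fit_binary_relevance(docs, label_sets, ontology):
    """Binary relevance: train one yes/no classifier per ontology label."""
    return {lab: MultinomialNB().fit(docs, [lab in s for s in label_sets])
            for lab in ontology}

def predict_labels(models, doc):
    """Return every label whose binary classifier answers 'yes' (possibly none)."""
    return {lab for lab, m in models.items() if m.predict(doc)}

# Step 1: relevance filter trained on toy, hypothetical sentences.
filter_docs = [
    "militants attacked a police checkpoint killing two officers",
    "a roadside bomb killed three soldiers near the border",
    "gunmen killed a tribal elder in the district",
    "the cricket team won the test match",
    "the central bank held interest rates steady",
    "a new shopping mall opened in the capital",
]
filter_labels = ["relevant"] * 3 + ["irrelevant"] * 3
relevance = MultinomialNB().fit(filter_docs, filter_labels)

# Step 2: multi-label coder trained only on text treated as already
# verified relevant (hypothetical human-verified examples).
ONTOLOGY = ["security", "governance", "economics"]
coded_docs = [
    "militants attacked the police checkpoint",
    "governor dismissed over corruption allegations",
    "fuel prices rose in the market",
    "militants extorted traders at the market",
]
coded_labels = [{"security"}, {"governance"}, {"economics"}, {"security", "economics"}]
coder = fit_binary_relevance(coded_docs, coded_labels, ONTOLOGY)

print(relevance.predict("a bomb attack killed police officers"))  # relevant
print(sorted(predict_labels(coder, "militants looted the market")))  # ['economics', 'security']
```

In a hybrid setting along the lines the abstract describes, the relevant/irrelevant judgements from the first step would be checked by human coders before the verified text is used to train the second step; the sketch simply hard-codes text that is assumed to have been verified already.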

Notes

1. PETRARCH2, the most recent release, was used in our experiments (http://github.com/openeventdata/petrarch2).
2. This appears in timings in our log files and was explicitly stated by one of our coders.
3. http://www.cs.waikato.ac.nz/ml/weka/.

Acknowledgements

This work was supported by the Data to Decisions Cooperative Research Centre. We are grateful to Josie Gardner for labelling the ICG DRC dataset, and to Michael Burnside and Kaitlyn Hedditch for coding the AfPak event data.

Author information

Authors and Affiliations

School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, 2052, Australia
Bradford Heap, Alfred Krzywicki, Wayne Wobcke & Michael Bain
School of Social Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
Susanne Schmeidl

Corresponding author

Correspondence to Bradford Heap.

Editor information

Editors and Affiliations

Nanyang Technological University, Singapore, Singapore
Gao Cong
National Chiao Tung University, Hsinchu, Taiwan
Wen-Chih Peng
Macquarie University, Sydney, New South Wales, Australia
Wei Emma Zhang
Wuhan University, Wuhan, China
Chengliang Li
Nanyang Technological University, Singapore, Singapore
Aixin Sun

About this paper

Cite this paper

Heap, B., Krzywicki, A., Schmeidl, S., Wobcke, W., Bain, M. (2017). A Joint Human/Machine Process for Coding Events and Conflict Drivers. In: Cong, G., Peng, WC., Zhang, W., Li, C., Sun, A. (eds) Advanced Data Mining and Applications. ADMA 2017. Lecture Notes in Computer Science, vol 10604. Springer, Cham. https://doi.org/10.1007/978-3-319-69179-4_45


DOI: https://doi.org/10.1007/978-3-319-69179-4_45
Published: 14 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69178-7
Online ISBN: 978-3-319-69179-4
eBook Packages: Computer Science; Computer Science (R0)
