
1 Introduction

The advance of technology, the facility of satellite communications, and the ubiquity of the Internet [1] have multiplied the extent and impact of ideologically and politically motivated acts of violence, expanded their scope beyond specific locales and regions, and made them a growing threat to humanity across the world [2]. In such violence, groups or individuals commit acts of extreme brutality against a leader, citizens, an entire city, or a nation. The motivation is usually a radicalized interpretation of defending a greater good, a political cause, or an extreme ideology [3]. Such acts of violence disturb people's minds and everyday lives and destabilize societies and peace. They carry human and economic tolls and challenge sustainable development in developed and developing countries alike [4].

In the age of information, increasingly detailed datasets describe such violent acts across the world. This information includes the human casualties and fatalities, the level of coordination and expertise, the process targeted, and how severely that process was affected. Groups and individuals committing such violent acts are usually associated with a Violent Extremist Organization (VEO) [5]. The purpose of this study is to investigate the possibility of recognizing the responsible VEO from the information available about the violent act. We explore different machine learning models appropriate for our sample size. Section 2 describes our dataset and the features used as input to the models. Section 3 explains the machine learning models selected for this study. Section 4 reports and discusses the results, and Sect. 5 provides insight into our results along with future directions.

2 Data and Features

Information about violent acts carried out by VEOs across the world was provided to this study by the Radical and Violent Extremism (RAVE) Laboratory at the University of Nebraska Omaha. They developed this dataset by first relying on an open-source database on characteristics of extreme acts of violence, the Global Terrorism Database (GTD) [6]. Violent acts are included in the GTD if they have a political, social, religious, or economic motive, are intended to coerce, intimidate, or publicize the cause, and/or violate international humanitarian law. Other sources for the dataset include historical accounts in open-source data gathered from academic and government sources, scholarly case studies, public-records databases (e.g., Lexis-Nexis), and primary documents from the VEOs themselves, such as propaganda and websites. The information was gathered by graduate students from a cross-functional research center with expertise in criminology, industrial and organizational psychology, and information science and technology. Prior to data collection, coders received 20 h of training on the nature of VEOs, extremist recruitment, and related manifestations in the context of extremism, as well as on search tactics and information filtering.

Each violent act in our dataset is described by six features: number of casualties, number of fatalities, level of coordination, level of expertise, importance of the process targeted by the violent act, and scope of the impact on that process. All features are numerical. Casualties range from 0 to 1,500 with an average of 7, fatalities range from 0 to 1,180 with an average of 5, and the other four variables range from 1 to 5.

Pattern recognition models require a large number of training samples from each class, usually growing linearly or exponentially with the number of features. Thus, VEOs with fewer than 50 samples were removed, as were records containing unknown values. The final dataset contained 5,661 violent acts by 38 different VEOs from July 21, 1972 to December 31, 2016. The histogram in Fig. 1 shows the number of violent acts carried out by each VEO, and the histogram in Fig. 2 shows the number of violent acts per year.
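As a minimal sketch of this filtering step, assuming the data sits in a pandas DataFrame (the file and column names below are hypothetical; the actual RAVE export may differ):

```python
import pandas as pd

# Hypothetical file and column names; the actual RAVE/GTD export may differ.
df = pd.read_csv("rave_veo_incidents.csv")

features = ["casualties", "fatalities", "coordination",
            "expertise", "target_importance", "impact_scope"]

# Remove records containing unknown (missing) values.
df = df.dropna(subset=features + ["veo"])

# Keep only VEOs with at least 50 recorded violent acts.
counts = df["veo"].value_counts()
df = df[df["veo"].isin(counts[counts >= 50].index)]
```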

Fig. 1. Number of violent acts carried out by each VEO from July 21, 1972 to December 31, 2016.

Fig. 2. Number of violent acts per year from July 21, 1972 to December 31, 2016.

Table 1 shows the correlation coefficients between pairs of variables. The numbers of casualties and fatalities are only weakly correlated with each other and with the other variables. The remaining four variables, however, are partly correlated with one another. The largest correlation coefficient in Table 1 (0.71) means that when the targeted process is important, the impact on that process tends to be large as well. The remaining coefficients indicate, on the one hand, that the level of expertise and the level of coordination are partly correlated, which is not surprising, and on the other hand, that when the levels of coordination and expertise are high, the violent act usually targets an important process and has a large impact on it. To be able to investigate the importance of individual features in the prediction models, we do not apply feature generation methods for now, despite the slight correlation among the last four features. However, all features are normalized to zero mean and unit variance.

Table 1. Correlation coefficient between pairs of features.
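Continuing the sketch above, the correlation matrix (cf. Table 1) and the zero-mean, unit-variance normalization could be computed as follows; scikit-learn's StandardScaler is one of several equivalent options:

```python
from sklearn.preprocessing import StandardScaler

# Pairwise Pearson correlation coefficients (cf. Table 1).
print(df[features].corr().round(2))

# Normalize every feature to zero mean and unit variance.
X = StandardScaler().fit_transform(df[features])
y = df["veo"].values
```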

Figure 3 shows boxplots of the number of casualties, one of the six features, for each of the 38 classes; each class is represented by an individual box. The boxplot is a standardized way of displaying the distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum. More overlap among boxes means that a feature is less diverse across classes and therefore less helpful in distinguishing among them. The boxplots in Fig. 3 show that the values of this feature are well diversified across classes (little overlap among boxes), so it can be effective in recognizing classes. Figures 4, 5, 6, 7 and 8 show the same type of plot for the other five features. The little overlap among boxes observed in these plots similarly indicates their effectiveness in distinguishing among classes.

Fig. 3. Boxplot of the number of casualties for different groups.

Fig. 4. Boxplot of the number of fatalities for different groups.

Fig. 5. Boxplot of the level of coordination for different groups.

Fig. 6. Boxplot of the level of expertise for different groups.

Fig. 7. Boxplot of the importance of the process targeted by different groups.

Fig. 8. Boxplot of the scope of the impact on processes for different groups.

3 Prediction Models

Decision tree, SVM, least squares, and Perceptron are the four classifiers we applied for prediction. Here we briefly explain each of them.

3.1 Least Squares

The output of the least squares (LS) predictor is $x^T w$, where $w$ is the weight vector extended to include the threshold or intercept ($w_0$) and $x$ is the feature vector extended to include a constant 1. The desired output of the i-th sample is denoted by $y_i$. The weight vector is computed so as to minimize the sum of squared errors between the desired and actual outputs [7], that is:

$$ J(w) = \sum\limits_{i = 1}^{N} {\left( {y_{i} - x_{i}^{T} w} \right)^{2} } $$
(1)

where N is the number of training samples. Minimizing the cost function in Eq. 1 with respect to w results in:

$$ w = \left( {X^{T} X} \right)^{ - 1} X^{T} y $$
(2)

where X is an N × (l + 1) matrix whose rows are the feature vectors with an additional 1, l is the number of features, and y is a vector consisting of the corresponding desired responses:

$$ X = \begin{bmatrix} x_{1}^{T} \\ x_{2}^{T} \\ \vdots \\ x_{N}^{T} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \ldots & x_{1l} & 1 \\ x_{21} & x_{22} & \ldots & x_{2l} & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{N1} & x_{N2} & \ldots & x_{Nl} & 1 \end{bmatrix} \quad \text{and} \quad y = \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \end{bmatrix} $$
(3)
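A minimal NumPy sketch of the closed-form solution in Eq. 2, using a least-squares solver rather than the explicit inverse for numerical stability (for our multi-class problem such a predictor would be applied in a one-vs-rest fashion, which the equations above do not spell out):

```python
import numpy as np

def ls_fit(X, y):
    """Solve Eq. 2, w = (X^T X)^(-1) X^T y, with the constant 1 appended."""
    Xe = np.hstack([X, np.ones((len(X), 1))])   # extended feature vectors
    w, *_ = np.linalg.lstsq(Xe, y, rcond=None)  # numerically stable solve
    return w

def ls_predict(X, w):
    Xe = np.hstack([X, np.ones((len(X), 1))])
    return Xe @ w                               # x^T w for every sample
```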

3.2 Perceptron

The perceptron cost function is defined as [8]:

$$ J(w) = \sum\limits_{i = 1}^{N} {y_{i} w^{T} x_{i} }, \quad y_{i} = \begin{cases} +1 & \text{if } w^{T} x_{i} > 0 \text{ but } x_{i} \in \omega_{2} \\ -1 & \text{if } w^{T} x_{i} < 0 \text{ but } x_{i} \in \omega_{1} \\ \;\;0 & \text{if } w^{T} x_{i} > 0 \text{ and } x_{i} \in \omega_{1} \\ \;\;0 & \text{if } w^{T} x_{i} < 0 \text{ and } x_{i} \in \omega_{2} \end{cases} $$
(4)

We can iteratively find the weight vector that minimizes the perceptron cost function using the gradient descent scheme [8, 9]:

$$ w_{t + 1} = w_{t} + \Delta w_{t} = w_{t} - \alpha \left. \frac{\partial J(w)}{\partial w} \right|_{w = w_{t}} = w_{t} - \alpha \sum\limits_{i = 1}^{N} {y_{i} x_{i} } $$
(5)

where $w_t$ is the weight vector estimate at the t-th iteration and $\alpha$ is the training rate, a small positive number.
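A sketch of this training loop, assuming binary labels in {+1, −1} (class ω1 mapped to +1) and feature vectors already extended with a constant 1:

```python
import numpy as np

def perceptron_train(X, labels, alpha=0.01, epochs=100):
    """Minimize the perceptron cost (Eq. 4) by gradient descent (Eq. 5)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        # y_i = -label_i for misclassified samples, 0 otherwise (Eq. 4)
        misclassified = np.sign(X @ w) != labels
        if not misclassified.any():
            break                                  # cost has reached zero
        y = np.where(misclassified, -labels, 0.0)
        w -= alpha * (y[:, None] * X).sum(axis=0)  # update of Eq. 5
    return w
```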

3.3 SVM

SVM [10,11,12] maximizes the margin around the hyperplane separating the two classes. If the two classes are not linearly separable, it is not possible to find an empty band separating them, and each training sample satisfies one of the following constraints:

  • it falls outside the band and is correctly classified, i.e., $y_i(w^T x_i + w_0) > 1$,

  • it falls inside the band and is correctly classified, i.e., $0 \le y_i(w^T x_i + w_0) \le 1$, or

  • it is misclassified, i.e., $y_i(w^T x_i + w_0) < 0$.

We can summarize the three constraints above in one by introducing slack variables ($\xi_i$) [10]:

$$ y_{i} \left( w^{T} x_{i} + w_{0} \right) \ge 1 - \xi_{i}, \quad \begin{cases} \xi_{i} = 0 & \text{if } x_{i} \text{ is outside the band and correctly classified} \\ 0 < \xi_{i} \le 1 & \text{if } x_{i} \text{ is inside the band and correctly classified} \\ \xi_{i} > 1 & \text{if } x_{i} \text{ is misclassified} \end{cases} $$
(6)

The optimization task is now to maximize the margin (minimize the norm) while minimizing the slack variables [10]. The mathematical formulation for finding w and w0 of the hyperplane follows:

$$ \begin{aligned} & \text{minimize} && J(w, w_{0}, \xi) = \frac{1}{2}\left\| w \right\|^{2} + C\sum\limits_{i = 1}^{N} {\xi_{i}} = \frac{1}{2} w^{T} w + C\sum\limits_{i = 1}^{N} {\xi_{i}} && (7) \\ & \text{subject to} && y_{i} \left( w^{T} x_{i} + w_{0} \right) \ge 1 - \xi_{i}, \quad i = 1, 2, \ldots, N && (8) \\ & && \xi_{i} \ge 0, \quad i = 1, 2, \ldots, N && (9) \end{aligned} $$

The smoothing parameter C is a positive user-defined constant that controls the trade-off between the two competing terms in the cost function.
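As a sketch, a linear soft-margin SVM with C tuned by internal cross-validation; scikit-learn's LinearSVC handles our multi-class setting one-vs-rest, and the grid of C values below is illustrative, not the one used in the paper:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# C from Eq. 7 trades margin width against the total slack penalty.
grid = GridSearchCV(LinearSVC(), param_grid={"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
grid.fit(X, y)
svm = grid.best_estimator_
```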

3.4 Decision Tree

Ordinary binary decision trees (OBDTs) split the feature space into hyperrectangles with sides parallel to the axes [13]. Nodes in an OBDT pose binary questions whose answers are either yes or no, and the answers determine the path to a leaf, which is equivalent to a response. Questions at nodes are of the form "is $x_k \le \alpha$?", where $x_k$ is the k-th feature and $\alpha$ is a threshold. To predict the response of a new, unlabeled sample, one answers the question at each node and traverses to the left or right child based on the answer until a leaf is reached.

The best question to ask at a node is the one which maximizes the impurity decrease (ΔI) [13]:

$$ \Delta I = I - \frac{{N_{Y} }}{N}I_{Y} - \frac{{N_{N} }}{N}I_{N} $$
(10)

where $I$ is the impurity of the ancestor node, $N$ is the number of training samples in the ancestor node, $N_Y$ and $N_N$ are the numbers of training samples in the descendant nodes corresponding to the answers "yes" and "no" respectively, and $I_Y$ and $I_N$ are the impurities of those descendant nodes. The entropy of the training samples at a node, given in Eq. 11, is a common definition of node impurity in classification tasks ($I_{classification}$) [13], where $M$ is the number of classes and $N(\omega_i)$ is the number of training samples from class $\omega_i$ at the node.

$$ I_{classification} = - \sum\limits_{i = 1}^{M} {\frac{{N\left( {\omega_{i} } \right)}}{N}log_{2} \frac{{N\left( {\omega_{i} } \right)}}{N}} $$
(11)

A node is declared a leaf if the maximum impurity decrease (ΔImax) over all candidate questions at that node is less than a user-defined threshold, although alternative stopping conditions have been used in the literature [13, 14]. For classification, the majority rule is commonly used to determine the response at a leaf [13].

The relative importance of the k-th feature (R(xk)) is the sum of the impurity decrease (∆I) over all internal nodes (υi, i = 1,…, J) for which xk was chosen as the splitting variable:

$$ R\left( {x_{k} } \right) = \sum\limits_{i = 1}^{J} {\Delta I\left( {\upsilon_{i} = x_{k} } \right)} $$
(12)
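A sketch with scikit-learn's CART implementation: entropy as the impurity (Eq. 11), min_impurity_decrease playing the role of the leaf threshold on ΔImax (the 0.055 below is the optimized value reported in Sect. 4), and feature_importances_ corresponding, up to normalization, to R(x_k) in Eq. 12:

```python
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(criterion="entropy", min_impurity_decrease=0.055)
tree.fit(X, y)

# Normalized impurity-decrease importances, cf. Eq. 12 and Tables 3 and 4.
for name, score in zip(features, tree.feature_importances_):
    print(f"{name}: {score:.2f}")
```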

4 Results

A 10-fold cross-validation of the decision tree resulted in a generalization accuracy of 20%; the threshold on the maximum impurity decrease (ΔImax) was optimized to 0.055 using an internal cross-validation among the training samples. This is a major improvement over a random classifier, whose accuracy would be 2.6% (1/38). Table 2 shows the recall and precision for each class; larger values of both metrics indicate higher accuracy. A small recall means the classifier fails to identify many of the true samples of a class, while a small precision means that only a small proportion of the samples the classifier assigns to a class truly belong to it. In other words, a small recall means the classifier is unfairly stingy in assigning samples to a class, while a small precision means it is unfairly generous. A zero value for both metrics means that the classifier assigned no sample to that class. Rows shown in bold in Table 2 highlight the classes for which the classifier performs more accurately, and rows marked with a * at the beginning of the class name indicate classes for which it performs less accurately. Classes 4, 9, and 13 achieve the highest recall and precision, which means the features selected in this study are well capable of distinguishing these classes from the others. This can also be inferred from the boxplots in Figs. 3, 4, 5, 6, 7 and 8.

Table 2. Recall and precision for different classes obtained from 10-fold cross validation of decision tree.
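The per-class recall and precision of Table 2 can be reproduced, in sketch form, from out-of-fold predictions, reusing the decision tree defined in the sketch of Sect. 3.4:

```python
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_predict

# Out-of-fold predictions from 10-fold cross-validation (cf. Table 2).
y_pred = cross_val_predict(tree, X, y, cv=10)
print(classification_report(y, y_pred, zero_division=0))
```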

There is no meaningful correlation between recall and precision on the one hand and class size (see Fig. 1) on the other, which has also been observed by other researchers [15]. Inspection of the confusion matrix, however, shows that classes heavily represented among the training samples (e.g., classes 22 and 23) have a systematic tendency to absorb samples from other classes during classification; their precision nevertheless stays at a reasonable level because of their large size. For example, out of the 5,661 test samples, 826 were assigned to class 22 by the classifier while only 205 of them truly belong to this class, yet its recall and precision are 0.26 and 0.25, around the average over all classes. Similarly, 371 samples were assigned to class 23, though only 88 of them truly belong to it. The low recall for large classes such as 22 and 23 is surprising, because it means many samples that truly belong to these classes (around 75% of them) are wrongly assigned to other classes. This is mainly due to the large number of classes (38) and the overlap among classes in the feature space. While more training samples might help, finding features that are stronger in distinguishing among classes would certainly improve the accuracy. Moreover, a larger sample size would allow the application of more nonlinear classifiers, such as the multi-layer Perceptron, and kernel approaches, such as kernel SVM and non-parametric Bayesian methods.

Table 3 shows how useful each feature has been in developing the decision tree; this can help in filtering out useless features or combining less useful ones. Interestingly, the number of casualties by itself makes a 41% contribution to developing the tree. The numbers of casualties and fatalities together form 64% of the decision nodes in the tree, which means they are strong predictors of our classes, whereas the other four features together contribute only 36%.

Table 3. Relative importance of different features in developing the decision tree.

Based on the correlation among the last four variables (see Table 1) and their relatively lower importance in the decision tree classifier, we combined them into one feature using principal component analysis (PCA). Table 4 shows the relative importance of the new features in developing a decision tree. The new PCA-based feature is almost as good as the four original features combined.

Table 4. Relative importance of different features, after combining the last four features using PCA, in developing the decision tree.
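A sketch of this combination step, assuming the first two columns of the standardized feature matrix hold casualties and fatalities and the remaining four hold the correlated ordinal features:

```python
import numpy as np
from sklearn.decomposition import PCA

# Replace the four correlated features with their first principal component.
pc1 = PCA(n_components=1).fit_transform(X[:, 2:])
X_reduced = np.hstack([X[:, :2], pc1])   # casualties, fatalities, PC1
```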

We used the numbers of casualties and fatalities and the PCA-based feature to measure the accuracy of five different classifiers using 10-fold cross-validation. These accuracies are reported in Table 5. The hyper-parameters of each classifier were optimized using cross-validation among the training samples. While all classifiers outperform the random classifier, SVM is the least accurate and the decision tree is the most accurate. The fact that the only non-linear classifier, the decision tree, outperforms all the linear classifiers (SVM, least squares, and Perceptron) points to the complexity of the class distributions in the feature space, which are best separated by non-linear classifiers.

Table 5. Generalization accuracy of different classifiers obtained from 10-fold cross validation.
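A rough analogue of the Table 5 comparison, reusing X_reduced from the PCA sketch above (hyper-parameter tuning omitted; RidgeClassifier with alpha=0 reduces to the least-squares classifier):

```python
from sklearn.linear_model import Perceptron, RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# 10-fold cross-validated accuracy for each classifier (cf. Table 5).
for clf in [DecisionTreeClassifier(criterion="entropy"),
            LinearSVC(), RidgeClassifier(alpha=0.0), Perceptron()]:
    acc = cross_val_score(clf, X_reduced, y, cv=10).mean()
    print(f"{type(clf).__name__}: {acc:.3f}")
```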

5 Conclusions

This study was a first attempt to predict the VEO responsible for acts of violence based on human casualties and fatalities, level of coordination and expertise, importance of the targeted process, and the extent of the impact on that process. The first two features proved the best predictors, while the last four showed slight correlation with one another and less predictive power. While the decision tree, a non-linear classifier, outperformed the other, linear, classifiers, its accuracy did not reach above 20% in identifying the correct group among 38 groups. The inability of the classifiers to reach higher accuracies results from three shortcomings of our dataset: (a) the features are not predictive enough of the classes, (b) the training sample sizes of the different classes are imbalanced, and (c) the small number of training samples does not allow the application and proper training of non-linear classifiers. In future work, we intend to investigate additional predictors, including weapon type, economic damage, location, and time [16]; these features are mostly unknown at the time of this writing. Additionally, we plan to apply more flexible classifiers, such as deep networks [17] and kernel methods, as our dataset expands.