
Application of XGBoost algorithm in Social Adaptation of Prisoners

Authors: Yong Li, Jie Zhou

CAIBDA '24: Proceedings of the 2024 4th International Conference on Artificial Intelligence, Big Data and Algorithms

Pages 874 - 879

https://doi.org/10.1145/3690407.3690552

Published: 24 October 2024

Abstract

This study draws on social comparison theory and symbolic interaction theory to explore the factors influencing social adaptation among prisoners. Specifically, it examines the empirical relationships between discrimination perception, interpersonal trust, hope, and social adaptation among incarcerated individuals. The survey included 519 prisoners. With 90% of the data used as a training set, an XGBoost model was built to predict the remaining 10% of the data, achieving a prediction accuracy of 96.15%. Furthermore, XGBoost can rank the importance of the independent variables, allowing for more timely and effective prediction and adjustment of prisoners’ social adaptation based on the most important factors.

1 Introduction

Prisoners are marginalized because of their involvement in illegal activities, and they undergo educational reform and correction within the prison system because their behavior itself represents a failure of socialization [1]. Following this failure of socialization, however, prisoners often face increased societal discrimination. Cognitive transformation theory identifies four main components of desistance: cognitive openness to change, exposure and reaction to ‘hooks of change’, the replacement of self, and the transformation of the ex-offender's views regarding the deviant behaviour. Improving the social adaptability of prisoners, reducing the likelihood that they will again engage in antisocial behaviors such as illegal and criminal activities after release, and helping them become law-abiding citizens have always been core goals of correctional work. To find more scientific and effective methods for enhancing prisoners’ social adaptability, a large body of research and correctional practice at home and abroad has focused on the factors and mechanisms that influence it [2]. From the perspective of social comparison theory and symbolic interaction theory, hope is an important contributor to improving social adaptation [3]. Interpersonal trust is founded on the individual trust trait; the dynamic equilibrium between trust risk and trust expectation acts as an intermediary, ultimately resulting in trust behavior and improved social adaptation [4].

Conventional psychological research techniques, such as survey-based approaches, aim to uncover underlying psychological principles and analyze the interconnections among variables using structural equation modeling [5]. Nonetheless, these methods are limited in how precisely they can measure the extent of influence among different factors and in how accurately they can predict outcomes. Conversely, artificial intelligence algorithms can construct models from the actual data, improving predictive accuracy through diverse algorithm types and iterative techniques. By identifying the best-performing model, researchers can quantify the impact of each variable, understand its significance in the research context, and achieve more timely and effective forecasting and adjustment of prisoners’ social adaptation. Among artificial intelligence algorithms, XGBoost (eXtreme Gradient Boosting) has gained popularity and achieved top positions in Kaggle competitions.

Additionally, it is noteworthy that the XGBoost reference has garnered over 3500 citations (according to Google Scholar, March 2, 2024) [6]. XGBoost employs decision trees as base learners; during each iteration, the calculated error is used to correct the preceding predictor (learner), while the change in model performance is assessed using the objective function. Beyond the standard gradient boosting principle, the XGBoost objective function, defined in Eq. (1), incorporates an additional regularization term to mitigate overfitting, a prevalent issue in ensemble models [7].

Nevertheless, because Eq. (1) takes functions as parameters, traditional optimization methods cannot optimize the objective directly. Instead, the model is trained additively, one tree at a time: the objective at iteration t is rewritten as Eq. (2) and then approximated with a second-order Taylor expansion, as in Eq. (3).

\begin{equation} L^{(t)} = \sum_{i} l\left( y_i, \hat{y}_i \right) + \sum_{k} \Omega\left( f_k \right) \end{equation}

(1)

\begin{equation} L^{(t)} = \sum_{i=1}^{n} l\left( y_i, \hat{y}_i^{t-1} + f_t\left( x_i \right) \right) + \Omega\left( f_t \right) \end{equation}

(2)

\begin{equation} L^{(t)} \cong \sum_{i=1}^{n} \left[ l\left( y_i, \hat{y}_i^{t-1} \right) + g_i f_t\left( x_i \right) + \frac{1}{2} h_i f_t^2\left( x_i \right) \right] + \Omega\left( f_t \right) \end{equation}

(3)

Here g_i and h_i denote the first and second derivatives of the loss l with respect to the previous prediction \hat{y}_i^{t-1}.
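To make the second-order expansion concrete, the following minimal sketch computes the first and second derivatives of the loss (commonly written g_i and h_i) for a squared-error loss and the leaf weight that minimizes the resulting quadratic. It assumes l(y, p) = 0.5 (y - p)^2 and an L2 penalty on the leaf weight; the penalty form follows the usual XGBoost formulation and is not spelled out in this article. The function names, `reg_lambda`, and the numbers are illustrative only.

```python
# Sketch: gradient statistics and optimal leaf weight under squared-error loss.
# Assumptions (not from the article): l(y, p) = 0.5 * (y - p)**2 and an L2
# penalty 0.5 * reg_lambda * w**2 on the leaf weight w.

def grad_hess_squared_error(y_true, y_prev):
    """Return per-sample first (g_i) and second (h_i) derivatives of the loss."""
    grads = [p - y for y, p in zip(y_true, y_prev)]  # d/dp of 0.5*(y - p)^2
    hess = [1.0 for _ in y_true]                     # second derivative is constant
    return grads, hess

def optimal_leaf_weight(grads, hess, reg_lambda=1.0):
    """Minimize sum(g*w + 0.5*h*w^2) + 0.5*reg_lambda*w^2 over one leaf weight w."""
    return -sum(grads) / (sum(hess) + reg_lambda)

y_true = [3.0, 4.0, 5.0]   # illustrative targets, not study data
y_prev = [3.5, 3.5, 3.5]   # predictions after iteration t-1
g, h = grad_hess_squared_error(y_true, y_prev)
w_star = optimal_leaf_weight(g, h, reg_lambda=1.0)
print(g, h, w_star)
```

If all three samples fall in the same leaf, the new tree shifts their prediction by `w_star`; the penalty shrinks that shift toward zero, which is how the regularization term limits overfitting.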

In the XGBoost algorithm, the previously fitted classifiers are kept fixed and a new weak classifier is added at each iteration to improve the performance of the current model. This process continues, with each new classifier focusing on the areas where the previous ones performed poorly. The general flow of the XGBoost algorithm is illustrated in Figure 1.

Figure 1.

Extensive testing and validation have demonstrated XGBoost's strong performance and accuracy in various real-world applications. For example, XGBoost has been successfully employed in sentiment analysis [8], which involves analyzing and understanding the emotions, opinions, and attitudes expressed in text data. It effectively classifies and analyzes sentiment, enabling a deeper comprehension of people's reactions and opinions.

Moreover, XGBoost has proven valuable in depression prediction [9] and student performance analysis [10]. It aids in the early identification and prediction of depressive symptoms or disorders. Through its advanced algorithms and ensemble learning techniques, XGBoost can analyze multiple factors and indicators to provide accurate predictions of an individual's risk or likelihood of experiencing depression. These practical implementations highlight the robustness and versatility of XGBoost as a powerful tool in diverse domains. It demonstrates the ability to tackle complex problems and deliver reliable results.

2 Research Methods

2.1 Sample

Building on the work of researchers in Chinese prisons, we contacted relevant colleagues and conducted a nationwide sampling of inmates in various parts of China. The participants had the following characteristics: Chinese nationality, aged 18 or above, without significant mental or physical illness, and having been sentenced and completed two months of educational programs before officially beginning to serve their sentence in the prison. Considering the different characteristics of the various prison zones, five production zones were randomly selected.

The questionnaires were administered by the researcher, as primary investigator, with the assistance of on-duty police officers in the selected zones. Participants received clear instructions about the confidentiality principles and the purpose of the questionnaire, and the questionnaire was administered collectively once informed consent had been obtained.

A total of 917 questionnaires were distributed, and 913 responses were collected, resulting in a response rate of 99.6%. Questionnaires were excluded if they carried identifiable markers, showed obvious signs of copied answers, followed a clear response pattern, or left more than 15% of the items unanswered. Finally, 519 valid questionnaires were retained, an effective rate of 56.8% of the collected responses. The specific demographic data is shown in Table 1.
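The two rates can be checked with a few lines of arithmetic. The sketch below uses only the counts reported above; the variable names are illustrative. It shows that the 56.8% figure corresponds to valid questionnaires divided by collected responses (519/913), rather than by distributed questionnaires (519/917, about 56.6%).

```python
# Counts as reported in the sample description.
distributed = 917
collected = 913
valid = 519

response_rate = collected / distributed   # share of questionnaires returned
valid_rate = valid / collected            # share of returned questionnaires kept

print(f"response rate: {response_rate:.1%}")
print(f"effective rate: {valid_rate:.1%}")
```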

Table 1.

Category	Subcategory	Number	Percentage (%)
Age	<30 years old	115	22.2
	30-40 years old	262	50.5
	>40 years old	138	26.6
Marital Status	Unmarried	149	28.7
	Married	256	49.3
	Divorced	107	20.6
	Widowed	2	0.4
Child Status	No children	164	31.6
	Has children	354	68.2
Parent Status	Both parents alive	356	68.6
	One parent alive	120	23.1
	Both parents deceased	41	7.9
Education Level	College or above	84	16.2
	High school or vocational school	113	21.8
	Junior high school	230	44.3
	Primary school	80	15.4
	Other	11	2.1
Occupation before incarceration	Unemployed	39	7.5
	Farmers	58	11.2
	Migrant workers	32	6.2
	Company employees	52	10.0
	Self-employed	127	24.5
	Freelancers	123	23.7
	Institutional employees	32	6.2
	Civil servants	9	1.7
	Retirees	3	0.6
	Other	43	8.3
Monthly income before incarceration	No fixed income	149	28.7
	Below 3000	57	11.0
	3000∼8000	195	37.6
	Above 8000	115	22.2
Type of crime	Violent crime	100	19.3
	Economic crime	190	36.6
	Sexual crime	66	12.7
	Other	155	29.9
Length of sentence	Less than 5 years	263	50.7
	5-10 years	156	30.1
	10-25 years	91	17.5
Time served in prison	Less than 3 years	313	60.3
	3-7 years	159	30.6
	More than 7 years	44	5.5
Number of previous incarcerations	First time	456	87.7
	More than once	59	11.4

Table 1. Descriptive analysis of participants (n=519).

2.2 Measurements

In this study, prisoners’ social adaptation was divided into three levels: low, medium, and high. We used psychological variables such as discrimination perception, interpersonal trust, hope, coping styles, belief in a just world, social support, impulsivity, mental health, resilience, self-esteem, and self-efficacy as core independent variables. To predict prisoners’ social adaptability more accurately, demographic variables such as age, marital status, child and parent status, education level, occupation, and income were also incorporated as independent variables. The XGBoost algorithm was used to build the model. A scikit-learn random sampling function was used to randomly select 90% of the data as the training set, and the best model was then used to predict the remaining 10% of the data.
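As a rough illustration of the 90/10 split, the sketch below shuffles row indices with the Python standard library and holds out one tenth of them. It is a stand-in for a library splitter, not the authors' code: the function name `split_indices`, the seed, and the choice to round the test-set size up are all assumptions.

```python
import math
import random

def split_indices(n_samples, test_fraction=0.1, seed=0):
    """Shuffle row indices and hold out ceil(test_fraction * n_samples) for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    n_test = math.ceil(test_fraction * n_samples)  # assumed rounding rule
    return indices[n_test:], indices[:n_test]

# 519 valid questionnaires, as reported in the sample description.
train_idx, test_idx = split_indices(519, test_fraction=0.1, seed=0)
print(len(train_idx), len(test_idx))
```

Under this rounding assumption the held-out set contains 52 respondents and the training set 467.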

Discrimination Perception Scale. A 6-item scale developed by Shen Jiliang (2009) was used, consisting of two dimensions, individual discrimination and group discrimination, each with three items. Example questions include ‘I feel that I have been treated unfairly’ and ‘Overall, other people who have been convicted like me are treated unfairly’.

Social Adaptation Scale. Based on Liu Zhaoying's (2005) ‘Social Adaptation Scale for Reeducation-through-Labor Personnel’, the three dimensions of the original scale were retained: pre-social preparation, job adaptation, and rule compliance. The scale was reduced to a 9-item scale with three items for each dimension. Example questions include ‘I find it difficult to adapt to society after release’ and ‘I don't know how to start a new life after release.’

Interpersonal Trust Scale. Based on Xu Huiyan's revised version of Rotter's Interpersonal Trust Scale (2010), the scale retained two dimensions: social trust and trust in commitment or behavior, with three items for each dimension. Three items from Liu Zhaoying's (2005) ‘Social Adaptation Scale for Reeducation-through-Labor Personnel’ were added to the interpersonal trust dimension, giving a 9-item scale. Example questions include ‘I find it hard to trust others’ and ‘There is an increasing amount of hypocrisy in our society.’

Hope Scale. Based on Li Yuxuan's (2015) ‘Questionnaire of Hope for Inmates’, the original scale consisted of three dimensions: hope before release, hope before incarceration, and hope after release. The scale was reduced to a 9-item scale with three items for each dimension. Example questions include ‘I eagerly look forward to the day of release’ and ‘After release, I will work hard to improve the lives of my family.’

3 Data Analysis and Results

In this study, the demographic variables include age, marriage, children, parent status, education, occupation, work years, position level, type of crime, length of sentence, monthly income, time served in prison, and number of previous incarcerations. For the initial model, the data were converted into text features and numerical variables; the process is illustrated in Figure 2.

Figure 2.

3.1 Model Parameter Freezing

In the XGBoost model, since the dependent variable is continuous, the objective and scoring are set to reg:squarederror and neg_mean_squared_error, respectively, so that the mean squared error serves as both the objective function and the evaluation metric. The booster is the commonly used gbtree, which iteratively trains a series of decision trees for prediction. This tree-based model can handle various types of data and has strong fitting and expressive power.
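The relation between these quantities can be illustrated with a small standard-library sketch. In scikit-learn, the neg_mean_squared_error scorer reports the negated mean squared error so that larger scores are better during model selection; the RMSE is the square root of the mean squared error. The helper below is a plain re-implementation for illustration, and the numbers are made up rather than taken from the study.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 4.0, 5.0]   # illustrative adaptation scores
y_pred = [3.1, 3.9, 5.2]   # illustrative model outputs

mse = mean_squared_error(y_true, y_pred)
neg_mse = -mse             # the quantity a "higher is better" scorer would maximize
rmse = math.sqrt(mse)      # same units as the dependent variable
print(mse, neg_mse, rmse)
```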

3.2 Model Parameter Grid Search

To tune the model's hyperparameters, we used the GridSearchCV class from sklearn.model_selection to perform a grid search over the values shown in Table 2.

Table 2.

Parameter	Search area	Best value	Description
max_depth	[5,10,20,30,40]	5	The maximum depth of a tree, i.e. the maximum number of levels or splits allowed in a decision tree
learning_rate	[0.005,0.01,0.05,0.1]	0.01	The step size at each boosting iteration
n_estimators	[200,300,400,500]	400	The number of boosting iterations, i.e. the total count of decision trees constructed
subsample	[0.7,0.8,0.9]	0.7	The proportion of the training data sampled for building each individual tree
colsample_bytree	[0.8,0.85,0.9,0.95,1]	0.8	The proportion of randomly selected features used to build each tree
min_child_weight	[1,2,3,4,5,6,7,8,9,10]	3	The minimum sum of sample weights (second derivative of the loss with respect to the prediction) required in a child node
eta	[0.001,0.002,0.005,0.01]	0.001	The individual influence of each tree on the overall model's output

Table 2. Hyperparameter search space and best values.
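To give a sense of the size of this search, the sketch below writes the grid from Table 2 as a dictionary of candidate lists (the general shape that GridSearchCV accepts as `param_grid`) and enumerates every combination with the standard library. The key names follow the identifiers used in the Discussion. Note that XGBoost's documentation lists eta as an alias of learning_rate, so a real run might keep only one of the two; both are included here only because the table lists both.

```python
from itertools import product

# Search space copied from Table 2.
param_grid = {
    "max_depth": [5, 10, 20, 30, 40],
    "learning_rate": [0.005, 0.01, 0.05, 0.1],
    "n_estimators": [200, 300, 400, 500],
    "subsample": [0.7, 0.8, 0.9],
    "colsample_bytree": [0.8, 0.85, 0.9, 0.95, 1],
    "min_child_weight": list(range(1, 11)),
    "eta": [0.001, 0.002, 0.005, 0.01],
}

# Every combination a full grid search would have to fit and score.
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

# Best values reported in Table 2.
best = {
    "max_depth": 5, "learning_rate": 0.01, "n_estimators": 400,
    "subsample": 0.7, "colsample_bytree": 0.8, "min_child_weight": 3,
    "eta": 0.001,
}
print(len(candidates), best in candidates)
```

With 5 × 4 × 4 × 3 × 5 × 10 × 4 = 48,000 combinations, each multiplied by the number of cross-validation folds, an exhaustive search over this grid is expensive; a randomized or staged search is a common alternative.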

3.3 Model Results

We evaluated predictions on the training and test sets with mean_squared_error; the root mean square error (RMSE) was 0.028 for both sets. Because the interpretability of the mean squared error for a continuous variable is weak, we further categorized the original dependent variable into three groups: low social adaptation (values below 3.3), medium social adaptation (values between 3.3 and 4.1), and high social adaptation (values above 4.1). After training the model, the accuracy of the predictions on the test set was 96.15%. The confusion matrix of the predicted and actual results is shown in Table 3, and the per-class precision, recall, and F1 score are shown in Table 4. There were 2 prediction errors: 1 case where the actual value was low social adaptation but the model predicted medium, and 1 case where the actual value was medium social adaptation but the model predicted high.
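The categorization step can be sketched as follows. The cut-offs 3.3 and 4.1 come from the text; treating both boundary values as "medium" is an assumption, since the text does not say which side they fall on. The function names and the example scores are hypothetical.

```python
def adaptation_level(score, low_cut=3.3, high_cut=4.1):
    """Map a continuous adaptation score to low / medium / high."""
    if score < low_cut:
        return "low"
    if score > high_cut:
        return "high"
    return "medium"   # assumed to include both boundary values

def accuracy(true_labels, predicted_labels):
    """Share of cases where the predicted level equals the true level."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)

true_scores = [2.9, 3.3, 3.8, 4.1, 4.6]   # illustrative, not study data
pred_scores = [3.0, 3.4, 4.2, 4.0, 4.5]   # illustrative regressor outputs

true_levels = [adaptation_level(s) for s in true_scores]
pred_levels = [adaptation_level(s) for s in pred_scores]
print(true_levels, pred_levels, accuracy(true_levels, pred_levels))
```

Binning the regressor's continuous output in this way turns the regression into a three-class prediction; for reference, 2 errors in a held-out set of 52 cases would give 50/52 ≈ 96.15%.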

Table 3.

	Predicted Low	Predicted Medium	Predicted High
True Low	10	1	0
True Medium	0	12	0
True High	0	0	10

Table 3. Confusion matrix of predicted result.

Table 4.

	Precision	Recall	F1 Score
Class 1	1	0.909	0.952
Class 2	1	1	1
Class 3	1	1	1

Table 4. Precision, recall, and F1 score by class.
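Per-class precision, recall, and F1 can be derived directly from a confusion matrix. The sketch below applies the standard definitions to the counts in Table 3 exactly as printed, with rows as actual classes and columns as predicted classes; reading Class 1 to Class 3 as Low, Medium, and High is an assumption. Values computed this way need not coincide with every entry printed in Table 4.

```python
labels = ["Low", "Medium", "High"]
# Rows: actual class; columns: predicted class (counts as printed in Table 3).
confusion = [
    [10, 1, 0],
    [0, 12, 0],
    [0, 0, 10],
]

def per_class_metrics(matrix):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(matrix)
    rows = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column total
        actual_k = sum(matrix[k])                           # row total
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((precision, recall, f1))
    return rows

metrics = per_class_metrics(confusion)
for name, (p, r, f) in zip(labels, metrics):
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```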

Using the feature_importances_ attribute of XGBRegressor, we can output the ten most important independent variables of the model, as shown in Table 5 and Figure 3.

Table 5.

RANK	Item	Importance
TOP1	I believe that I can successfully integrate into society after being released from prison.	0.144
TOP2	I feel that I am being looked down upon by others.	0.129
TOP3	I find it difficult to trust others.	0.047
TOP4	Relying on others to solve problems.	0.046
TOP5	I tend to act impulsively.	0.034
TOP6	You feel discriminated against to what extent in prison life.	0.033
TOP7	I want to live a better life after being released from prison.	0.029
TOP8	I believe in the principle of “Every man for himself, and the devil take the hindmost.”	0.026
TOP9	I have a positive attitude towards myself.	0.025
TOP10	I feel that I have been treated unfairly.	0.024

Table 5. TOP 10 Importance.
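The ranking in Table 5 can be reproduced from a mapping of item to importance score. The sketch below uses the ten scores from the table with shortened, hypothetical item labels, sorts them, and accumulates their share. It assumes the scores are normalized to sum to 1 across all independent variables, which the article does not state explicitly.

```python
# Importance scores from Table 5; the short keys are hypothetical labels.
importances = {
    "integrate_after_release": 0.144,
    "looked_down_upon": 0.129,
    "hard_to_trust_others": 0.047,
    "rely_on_others": 0.046,
    "act_impulsively": 0.034,
    "discriminated_in_prison": 0.033,
    "want_better_life": 0.029,
    "every_man_for_himself": 0.026,
    "positive_self_attitude": 0.025,
    "treated_unfairly": 0.024,
}

ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)

cumulative = 0.0
for rank, (item, score) in enumerate(ranked, start=1):
    cumulative += score
    print(f"TOP{rank:<2} {item:<26} {score:.3f}  cumulative={cumulative:.3f}")

top10_share = sum(importances.values())
```

Under that normalization assumption, the ten items together account for roughly 0.537 of the total importance, with the first two items alone contributing about half of that.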

Figure 3.

4 Discussion

For the algorithm model, we employed the grid search method to identify the optimal hyperparameter values. Given the 66 independent variables, we required that the maximum depth (max_depth) not exceed the number of independent variables; the grid search selected 5, the smallest candidate value. For the remaining parameters, the search identified a learning rate of 0.01, an eta of 0.001, and an n_estimators of 400. This indicates that the model uses shallow trees and assigns a small weight to each decision tree, while fitting the data with a larger number of trees. To prevent overfitting, the model's subsample was set to 0.7, colsample_bytree to 0.8, and min_child_weight to 3, as these values were found to be optimal.

Among the questionnaire items, the item “I believe that I can successfully integrate into society after being released from prison” had a feature importance of about 14% (0.144). Hope is thought to raise the level of psychological adaptation by alleviating the negative impacts of adverse events and promoting positive psychological and behavioral responses [11]. The item “I feel that I am being looked down upon by others” had a feature importance of about 13% (0.129). Discrimination perception widens the psychological distance between individuals and others, leading to feelings of loneliness and interpersonal alienation, which then breed further problems through internalization and externalization [12].

5 Conclusion

In this study, demographic variables and questionnaire items were used as independent variables to train an XGBoost model for predicting prisoners’ social adaptation, achieving an accuracy of 96.15% in the three-class classification. This indicates the effectiveness of the XGBoost algorithm in studying this problem. Subsequent researchers can further optimize the algorithm to enable more profound conclusions regarding this issue.

References

[1] Fernando, H. (2023). The failure of prisons in fostering and re-socializing prisoners. Indonesian Journal of Criminal Law and Criminology (IJCLC), 4(3).

[2] Paris, W., & White-Williams, C. (2005). Social adaptation after cardiothoracic transplantation: a review of the literature. Journal of Cardiovascular Nursing, 20(5S), S67-S73.

[3] Truitt, M., Biesecker, B., Capone, G., Bailey, T., & Erby, L. (2012). The role of hope in adaptation to uncertainty: The experience of caregivers of children with Down syndrome. Patient Education and Counseling, 87(2), 233-238.

[4] Fu, C., Yang, S., Zhai, M., Yong, T., Zheng, C., Ma, X., ... & Su, P. (2024). The component and structure of interpersonal trust. Heliyon, 10(9).

[5] Kucharský, Š., Houtkoop, B. L., & Visser, I. (2020). Code sharing in psychological methods and statistics: An overview and associations with conventional and alternative research metrics.

[6] Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y., Cho, H., … & Zhou, T. (2015). XGBoost: Extreme gradient boosting. R package version 0.4-2, 1(4), 1-4.

[7] Trizoglou, P., Liu, X., & Lin, Z. (2021). Fault detection by an ensemble framework of Extreme Gradient Boosting (XGBoost) in the operation of offshore wind turbines. Renewable Energy, 179, 945-962.

[8] Wang, S. H., Li, H. T., Chang, E. J., & Wu, A. Y. (2018). Entropy-assisted emotion recognition of valence and arousal using XGBoost classifier. Artificial Intelligence Applications and Innovations: 14th IFIP WG 12.5 International Conference, AIAI 2018, Rhodes, Greece, May 25–27, 2018, Proceedings 14 (pp. 249-260). Springer International Publishing.

[9] Sharma, A., & Verbeke, W. J. (2020). Improving diagnosis of depression with XGBOOST machine learning model and a large biomarkers Dutch dataset (n= 11,081). Frontiers in Big Data, 3, 15.

[10] Asselman, A., Khaldi, M., & Aammou, S. (2023). Enhancing the prediction of student performance based on the machine learning XGBoost algorithm. Interactive Learning Environments, 31(6), 3360-3379.

[11] Ratinen, I. (2021). Students’ knowledge of climate change, mitigation and adaptation in the context of constructive hope. Education Sciences, 11(3), 103.

[12] Hashemi, N., Marzban, M., Sebar, B., & Harris, N. (2020). Religious identity and psychological well-being among middle-eastern migrants in Australia: The mediating role of perceived social support, social connectedness, and perceived discrimination. Psychology of Religion and Spirituality, 12(4), 475–486.


Published In

CAIBDA '24: Proceedings of the 2024 4th International Conference on Artificial Intelligence, Big Data and Algorithms

June 2024

1206 pages

ISBN:9798400710247

DOI:10.1145/3690407

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

CAIBDA 2024

CAIBDA 2024: 2024 4th International Conference on Artificial Intelligence, Big Data and Algorithms

June 21 - 23, 2024

Zhengzhou, China
