Machine Learning-XGBoost Analysis of Chinese Employees’ After-work Subjective well-being

Authors:

Wei Wang,

Jie Zhou

CAIBDA '24: Proceedings of the 2024 4th International Conference on Artificial Intelligence, Big Data and Algorithms

Pages 322 - 326

https://doi.org/10.1145/3690407.3690462

Published: 24 October 2024

Abstract

This study employs the Conservation of Resources theory in psychology to investigate the factors that affect Chinese employees’ subjective well-being after work in the digital age. A total of 326 Chinese employees participated in the survey, and the data were analyzed with the XGBoost machine learning algorithm. The results show that the model, trained on 90% of the data, predicts the remaining 10% with an accuracy of 96.96%. Furthermore, XGBoost can rank the importance of the independent variables, facilitating timely and effective prediction and adjustment of subjective well-being based on the most significant factors.

1 Introduction

With the advent of the Industry 4.0 era, technology has reshaped people's lives and led to a significant increase in work stress [1,2]. Facing this greater work pressure, people need leisure and entertainment for respite, and they have dramatically increased their online leisure activities [3]. Scholars are more concerned than ever about the relationship between recovery from work and subjective well-being among employees [4]. Internet leisure, as a common after-work lifestyle, has become a key topic for exploring subjective well-being in the new era. Previous studies of leisure activities have often reported inconsistent results [5]. One line of findings holds that subjective well-being mainly comes from passive activities such as watching TV, while activities such as exercise and socializing make no contribution to it [6]. However, Liu and Da reported that both passive and active activities promote subjective well-being. Some theories have focused on the psychological mechanisms triggered by activities [7]. The DRAMMA model integrates the four post-work recovery experiences from recovery theory and the three basic psychological needs from Self-Determination Theory (SDT) into six psychological needs that make unique contributions to subjective well-being [8,9]. However, recent studies have raised concerns about the DRAMMA model's design, because it does not facilitate the examination of internal levels when the six mechanisms are analyzed together. A deeper investigation into the relationships between these different needs and the theories behind DRAMMA, SDT, and recovery theory is therefore needed [10]. Similarly, recovery theory research has noted that observing the four psychological experiences in combination does not clarify which experiences drive recovery [11], and that the combinations of psychological needs should therefore be explored [12,13].
While the Conservation of Resources (COR) theory serves as a universal explanatory framework for various psychological processes, it still needs to be systematized [5] so that the characteristics and mechanisms of resources in leisure can be distinguished [4]. This study attempts to explore the full range of internet leisure mechanisms from a systematic COR perspective. It considers the levels of psychological detachment (representing recovery theory) and of SDT, discusses the paths to subjective well-being through these two strategies for different activities, and seeks the underlying regularities linking the nature of activities to resource levels, with direct relevance to practice.

Traditional psychological approaches start from researchers’ hypotheses: questionnaires or experimental procedures are developed, data are collected, and analysis is conducted. The main objective is to study the causal relationships of psychological processes; once causal relationships are identified, they enable better prediction of the future. However, prediction with traditional methods predominantly emphasizes qualitative explanation and cannot be used for quantitative prediction. Artificial intelligence algorithms, by contrast, can mine logical relationships from large amounts of data and further improve predictive accuracy through various algorithms and iterative techniques. Once researchers obtain the optimal model, they can use it to quantify the influence of each item, understand the importance of each item in the research scenario, and achieve more timely and effective prediction and adjustment of subjective well-being.

Furthermore, it is worth noting that the cited paper [14] has amassed over 3500 citations (according to Google Scholar, March 2, 2024). In the XGBoost algorithm, the objective function in Eq. (1) takes functions as its parameters, which makes it difficult to optimize with traditional methods. As a result, the model is trained in an additive manner using a second-order Taylor approximation. To apply this approximation, the objective function is rewritten as Eq. (2). After the constant terms are eliminated, the objective simplifies to a sum of quadratic functions of a single variable, as shown in Eq. (3). This simplified form can be conveniently minimized with a greedy algorithm.

\begin{equation}
L^{(t)} = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k) \tag{1}
\end{equation}

\begin{equation}
L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \tag{2}
\end{equation}

\begin{equation}
L^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \tag{3}
\end{equation}
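For reference, the quantities appearing in Eq. (3) can be written out as follows. This is a sketch based on the standard XGBoost formulation rather than on anything stated in this paper: the regularizer form \Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_j w_j^2, the leaf index sets I_j, and the symbols \gamma, \lambda, T, and w_j are assumptions introduced here for illustration.

\begin{equation*}
g_i = \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}}, \qquad
h_i = \frac{\partial^2 l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2}
\end{equation*}

Writing f_t(x_i) = w_j for every sample i that falls in leaf j (the set I_j) of a tree with T leaves, Eq. (3) becomes a separate quadratic in each leaf weight:

\begin{equation*}
L^{(t)} \approx \sum_{j=1}^{T} \left[ \Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^2 \right] + \gamma T
\end{equation*}

\begin{equation*}
w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}
\end{equation*}

For the squared-error loss l(y_i, \hat{y}_i) = \frac{1}{2}(y_i - \hat{y}_i)^2, these reduce to g_i = \hat{y}_i^{(t-1)} - y_i and h_i = 1, so with \lambda = 0 the optimal leaf weight is simply the mean residual of the samples in that leaf.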

In the XGBoost algorithm, the previously trained classifiers are held fixed and a new weak classifier is added at each iteration to improve the performance of the current model. This process continues, with each new classifier focusing on the regions where the previous ones performed poorly. The general flow of the XGBoost algorithm is illustrated in Figure 1.
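The additive training idea described above can be illustrated with a small sketch. It is plain gradient boosting with squared-error loss and depth-1 trees, not XGBoost's actual implementation (there is no regularization or second-order split gain), and the data, function names, and settings are hypothetical.

```python
# Minimal sketch of additive boosting with squared-error loss.
# Earlier trees stay fixed; each round fits one new depth-1 tree ("stump")
# to the residuals left by the current model. All names are hypothetical.

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimising squared error."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), xs[0], mean, mean)  # fallback: constant prediction
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Train n_rounds stumps additively; return them with the MSE per round."""
    predictions = [0.0] * len(xs)
    stumps, history = [], []
    for _ in range(n_rounds):
        # For squared error, the residual is the negative gradient.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
        mse = sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)
        history.append(mse)
    return stumps, history


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1, 6.8, 8.3]
stumps, mse_history = boost(xs, ys)
print(f"training MSE after round 1: {mse_history[0]:.3f}")
print(f"training MSE after round {len(mse_history)}: {mse_history[-1]:.3f}")
```

Because each stump predicts the mean residual in its two regions and is added with a learning rate between 0 and 1, the training error cannot increase from one round to the next.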

Figure 1.

XGBoost's performance and accuracy have been extensively tested and validated in various real-world applications. For instance, it has been successfully applied in sentiment analysis [15], which involves analyzing and understanding emotions, opinions, and attitudes expressed in text data. XGBoost has demonstrated its capability to effectively classify and analyze sentiment, allowing for a deeper understanding of people's reactions and opinions.

XGBoost has also been used for depression prediction [16] and student performance prediction [17]; in the former case it aids the early identification and prediction of depressive symptoms or disorders. By leveraging ensemble learning techniques, XGBoost can analyze various factors and indicators to predict an individual's risk or likelihood of experiencing depression. These practical applications highlight the robustness and versatility of XGBoost as an efficient tool in diverse domains, showing its ability to tackle complex problems and deliver reliable results.

2 Research Methods

2.1 Sample

The study focused on Chinese office workers. The formal survey was conducted from December 2nd to December 30th, 2022, using a professional online research platform, Jianshu, to distribute the questionnaire. The platform maintained quality control measures, such as preventing duplicate responses. The screening questions in the questionnaire included “Are you currently employed?” and “Did you answer the questions seriously?” Respondents who answered “no” to either of these questions were automatically excluded. A total of 593 questionnaires were collected during the survey period, and after screening, 149 questionnaires were automatically excluded. During the preprocessing stage, a total of 113 questionnaires were excluded from the analysis. This exclusion was based on completion times, with questionnaires that took less than 5 minutes or exceeded 1 hour being deemed as incomplete or potentially unreliable. Additionally, 1 questionnaire was removed due to the participant being located outside of China or not providing location details. Another 3 questionnaires were deleted due to incomplete responses resulting from the adjustment of the “offline leisure enjoyment” question. Ultimately, there were 326 valid questionnaires, resulting in an effective response rate of 54.97%.
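The sample flow above can be checked with a few lines of arithmetic. The sketch below only restates the counts given in the text; the dictionary keys are shorthand labels introduced here. The listed exclusions leave 327 questionnaires, one more than the reported 326, and the text does not say where the remaining one was removed. The effective response rate is computed from the reported figure.

```python
# Counts transcribed from Section 2.1; the labels are shorthand.
collected = 593
exclusions = {
    "failed screening questions": 149,
    "completion time under 5 min or over 1 h": 113,
    "outside China or no location given": 1,
    "incomplete after question adjustment": 3,
}
reported_valid = 326

remaining = collected - sum(exclusions.values())
response_rate = reported_valid / collected

print(f"remaining after listed exclusions: {remaining}")
print(f"reported valid questionnaires:     {reported_valid}")
print(f"effective response rate:           {response_rate:.2%}")
```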

2.2 Measurements

In this study, employees’ after-work online activities were divided into three types: achievement leisure activities, social leisure activities, and timeout leisure activities. To predict subjective well-being in a timely manner, demographic variables, autonomy, mastery, and relatedness were also used as independent variables.

3 Data Analysis and Results

The demographic variables include age, marital status, occupation, years of work, position level, monthly income, and average working hours per day. All data were converted into numerical variables before being passed to the XGBoost model.

3.1 Model parameter freezing

In the XGBoost model, since the dependent variable is continuous, the objective and scoring are set as reg:squarederror and neg_mean_squared_error, respectively. The booster used is the commonly used gbtree, which iteratively trains a series of decision trees for prediction. This tree-based model can handle various types of data and has strong fitting and expressive power.

3.2 Model parameter grid search

To tune the model's hyperparameters, we used the GridSearchCV method from sklearn.model_selection to perform a grid search over the values shown in Table 1.

Table 1.

Parameter	Search space	Best value	Description
max_depth	[5,10,20,30,40]	5	Maximum depth of a tree, i.e., the maximum number of levels of splits allowed in a decision tree.
learning_rate	[0.005,0.01,0.05,0.1]	0.01	Step size at each boosting iteration.
n_estimators	[200,300,400,500]	400	Number of boosting iterations, i.e., the total number of decision trees constructed.
subsample	[0.7,0.8,0.9]	0.7	Proportion of the training data sampled to build each individual tree.
colsample_bytree	[0.8,0.85,0.9,0.95,1]	0.8	Proportion of features randomly selected to build each tree.
min_child_weight	[1,2,3,4,5,6,7,8,9,10]	3	Minimum sum of sample weights (second derivative of the loss with respect to the prediction) required in a child node.
eta	[0.001,0.002,0.005,0.01]	0.001	Individual influence of each tree on the overall model's output.

Table 1. Hyperparameter search space and best values.
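The search space in Table 1 can be written as a parameter grid for GridSearchCV, as sketched below. The grid values, objective, booster, and scoring string are taken from the text; the number of cross-validation folds (cv=5) and the helper name build_search are assumptions, since the paper does not state them. Note also that XGBoost's parameter documentation lists eta as an alias of learning_rate, so searching both as independent axes, as Table 1 does, may not behave as two separate settings; the sketch simply transcribes the table.

```python
from math import prod

# Search space transcribed from Table 1.
param_grid = {
    "max_depth": [5, 10, 20, 30, 40],
    "learning_rate": [0.005, 0.01, 0.05, 0.1],
    "n_estimators": [200, 300, 400, 500],
    "subsample": [0.7, 0.8, 0.9],
    "colsample_bytree": [0.8, 0.85, 0.9, 0.95, 1],
    "min_child_weight": list(range(1, 11)),
    "eta": [0.001, 0.002, 0.005, 0.01],
}

# An exhaustive grid search evaluates every combination of values.
n_candidates = prod(len(values) for values in param_grid.values())
print(f"candidate settings: {n_candidates}")


def build_search(cv=5):
    """Assemble the grid search (requires scikit-learn and xgboost).

    cv=5 is an assumption: the paper does not report the number of folds.
    """
    from sklearn.model_selection import GridSearchCV
    from xgboost import XGBRegressor

    model = XGBRegressor(objective="reg:squarederror", booster="gbtree")
    return GridSearchCV(model, param_grid,
                        scoring="neg_mean_squared_error", cv=cv)
```

With 48,000 candidate settings, each refit once per fold, the full search is expensive; calling build_search().fit(X, y) on a feature matrix X and target y would run it.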

3.3 Model results

After training, the model's accuracy on the test set was 96.96%, with 32 of 33 predictions correct. Table 2 presents the confusion matrix comparing the predicted classes with the actual classes, and Table 3 reports the per-class precision, recall, and F1 score. Note that accuracy can vary with the dataset and the specific prediction task.

Table 2.

	Predicted Low	Predicted Medium	Predicted High
True Low	10	1	0
True Medium	0	12	0
True High	0	0	10

Table 2. Confusion matrix of predicted result.

Table 3.

	Precision	Recall	F1 Score
Class 1	1	0.909	0.952
Class 2	1	1	1
Class 3	1	1	1

Table 3. Precision, recall, and F1 score for each class.
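The per-class metrics can be recomputed directly from the confusion matrix in Table 2, as in the sketch below. It assumes that Class 1, Class 2, and Class 3 in Table 3 correspond to Low, Medium, and High. Under that assumption the Low row reproduces Table 3, while the single Low sample predicted as Medium gives a Medium precision of 12/13 (about 0.923) rather than the 1 listed in Table 3.

```python
labels = ["Low", "Medium", "High"]
# Rows are true classes, columns are predicted classes (Table 2).
confusion = [
    [10, 1, 0],
    [0, 12, 0],
    [0, 0, 10],
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(labels)))
accuracy = correct / total

metrics = {}
for i, label in enumerate(labels):
    true_positive = confusion[i][i]
    predicted_as_label = sum(row[i] for row in confusion)  # column sum
    actually_label = sum(confusion[i])                     # row sum
    precision = true_positive / predicted_as_label
    recall = true_positive / actually_label
    f1 = 2 * precision * recall / (precision + recall)
    metrics[label] = (precision, recall, f1)

print(f"accuracy = {correct}/{total} = {accuracy:.4f}")
for label, (p, r, f1) in metrics.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```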

Using model.feature_importances_, we can rank the model's independent variables and identify the ten most important ones. Figure 2 displays the results of this evaluation, showing the ten variables of greatest importance:
① The extent to which my professional growth feels impeded (importance 0.142);
② In internet leisure, I have the ability to learn new knowledge or skills (0.052);
③ In internet leisure, I can make independent decisions about how to choose and engage with activities (0.051);
④ In internet leisure, I can escape from the physical exhaustion caused by work (0.037);
⑤ The workload that needs to be managed within the prescribed time period (0.031);
⑥ The lack of clarity regarding my professional duties and expectations (0.030);
⑦ In online leisure activities, I have rid myself of the physical exertion of work (0.026);
⑧ If I were to live again, there is hardly anything that I would want to change (0.024);
⑨ Age (0.023);
⑩ Education (0.023).

Figure 2.
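Ranking variables by importance amounts to sorting the importance scores and keeping the top entries, as in the sketch below. The helper name top_k and the abbreviated item labels are hypothetical; the five values are among those reported above and are listed in arbitrary order so that the sort has something to do. With a fitted scikit-learn-style XGBoost model, the scores would instead come from its feature_importances_ attribute and the names from the columns of the training data.

```python
def top_k(names, importances, k=10):
    """Return the k (name, importance) pairs with the largest importance."""
    ranked = sorted(zip(names, importances),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Abbreviated labels and importances taken from the list above.
names = [
    "age",
    "learn new knowledge or skills",
    "escape physical exhaustion",
    "professional growth impeded",
    "independent decisions about activities",
]
importances = [0.023, 0.052, 0.037, 0.142, 0.051]

for rank, (name, score) in enumerate(top_k(names, importances, k=3), start=1):
    print(f"{rank}. {name}: {score:.3f}")
```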

4 Discussion

In building the model, grid search was used to determine the optimal hyperparameters. Because the independent variables comprise 66 items, and the depth of a decision tree generally does not exceed the number of independent variables, the search over max_depth was bounded accordingly, and the optimal value was found to be 5. After analyzing the values of each parameter in the grid search, the optimum was found at a learning_rate of 0.01, an eta of 0.001, and n_estimators of 400. This indicates that the model uses shallow trees with a small weight for each tree, and compensates by fitting a larger number of trees. To prevent overfitting, subsample was set to 0.7, colsample_bytree to 0.8, and min_child_weight to 3, which the search determined to be the optimal values.

Among the survey items, ‘The extent to which my professional growth feels impeded’ accounted for 14% of the total feature importance. Consistent with established theoretical perspectives, extreme stress can adversely affect one's sense of well-being. Chronic stress and ongoing discomfort can give rise to both psychological and physiological ailments, including anxiety, depressive disorders, and various somatic conditions, and consequently erode well-being [18]. The analysis of occupational stress factors corroborates both the Stressor Detachment Model and the Recovery Paradox [19,20], reinforcing the notion that the pursuit of strategic resources is predicated upon fulfilling basic psychological needs [21]. In addition, the item ‘In internet leisure, I have the ability to learn new knowledge or skills’ had an importance of 0.052. These observations support the pivotal importance of a positive self-concept, as posited by Self-Determination Theory, and of the proactive cultivation of resources as a means to bolster happiness.

5 Conclusion

In this study, demographic variables and questionnaire items were used as independent variables to train an XGBoost model for predicting employees’ after-work subjective well-being, achieving an accuracy of 96.96% in the three-class classification. This indicates the effectiveness of the XGBoost algorithm for studying this problem. Future researchers can further optimize the algorithm to draw more profound conclusions on this issue.

Appendices

Chinese Employee SWB Survey

Please recall your life over the past month and choose the degree that best matches your true feelings for each of the following items: [1-5 Likert rating scale]

1.1 Involvement in online achievement activities, such as study and physical exercise.

1.2 Involvement of online social activities, such as study, physical exercises.

1.3 Involvement of online timeout activities, such as watching short videos, listening to music.

2.1 During virtual engagements, I sense the autonomy to determine my own decision-making process.

2.2 I feel pressure in online activities.

2.3 In online activities, I get along well with people.

2.4 In online activities, I seldom communicate with others.

2.5 In online activities, on average, I feel I can freely express my opinions and ideas.

2.6 In online activities, I have the capability to learn new knowledge or skills.

2.7 In online activities, I can be myself and can show my true self.

3.1 In online leisure activities, I put aside all work-related thoughts and ideas.

3.2 In online leisure activities, I put aside all work-related emotions.

3.3 In online leisure activities, I have emotionally distanced myself from work.

3.4 In online leisure activities, I have rid myself of the physical exertion of work.

References

[1]

Yang, S. Y., Chen, S. C., Lee, L., & Liu, Y. S. (2021). Employee stress, job satisfaction, and job performance: a comparison between high-technology and traditional industry in Taiwan. The Journal of Asian Finance, Economics and Business, 8(3), 605-618.
