DOI: 10.1145/3690407.3690462
research-article
Open access

Machine Learning-XGBoost Analysis of Chinese Employees’ After-work Subjective well-being

Published: 24 October 2024

Abstract

This study applies the Conservation of Resources theory from psychology to investigate the factors that affect Chinese employees’ subjective well-being after work in the digital age. A total of 326 Chinese employees participated in the survey, and the data were analyzed with the XGBoost machine learning algorithm. The results show that the model, trained on 90% of the data, predicts the remaining 10% with an accuracy of 96.96%. Furthermore, XGBoost can rank the importance of the independent variables, which supports timely and effective prediction and adjustment of subjective well-being based on the most significant factors.

1 Introduction

With the advent of the Industry 4.0 era, technology has reshaped people's lives and has led to a significant increase in work stress [1,2]. Facing this greater work pressure, people need leisure and entertainment for respite, and they have dramatically increased their online leisure activities [3]. Scholars are more concerned than ever about the relationship between recovery from work and subjective well-being among employees [4]. Internet leisure, as a common after-work lifestyle, has become a key topic for exploring subjective well-being in the new era. Previous studies of leisure activities have often reported inconsistent results [5]. One study found that subjective well-being comes mainly from passive activities such as watching TV, while activities such as exercise and socializing make no contribution to it [6]. However, Liu and Da reported that both passive and active activities promote subjective well-being. Some theories have focused on the psychological mechanisms triggered by activities [7]. The DRAMMA model integrates the four post-work recovery experiences from recovery theory and the three basic psychological needs from Self-Determination Theory (SDT) into six psychological needs that make unique contributions to subjective well-being [8,9]. However, recent studies have raised concerns about the DRAMMA model's design, because analyzing the six mechanisms together does not allow their internal levels to be examined. Therefore, a deeper investigation into the relationships among these needs and among DRAMMA, SDT, and recovery theory is needed [10]. Similarly, recovery research has proposed that observing the four recovery experiences in combination does not clarify which experiences drive recovery [11], and that specific combinations of psychological needs should therefore be explored [12,13].
While the Conservation of Resources (COR) theory serves as a universal explanatory framework for various psychological processes, it still needs to be systematized [5] so that the characteristics and mechanisms of resources in leisure can be distinguished [4]. This study attempts to explore the full picture of internet leisure mechanisms from a systematic COR perspective. It considers the levels of psychological detachment, representing recovery theory, and of SDT; discusses the paths through which these two strategies lead to subjective well-being for different activities; and seeks the underlying regularities that link the nature of activities to resource levels, with direct relevance to practice.
Traditional psychological approaches start from researchers’ hypotheses: questionnaires or experimental procedures are developed, data are collected, and analysis is conducted. The main objective is to study the causal relationships of psychological processes, and once these relationships are extracted they enable better prediction of the future. Prediction with traditional methods, however, predominantly emphasizes qualitative explanation and cannot be used for quantitative prediction. Artificial intelligence algorithms, on the other hand, can mine logical relationships from large amounts of data and further improve predictive accuracy through various algorithms and iteration techniques. Once researchers obtain the optimal model, they can use it to quantify the influence of each item, understand the importance of each item in the research scenario, and achieve more timely and effective prediction and adjustment of subjective well-being.
Furthermore, it is worth noting that the XGBoost reference [14] has amassed over 3500 citations (according to Google Scholar, March 2, 2024). In the XGBoost algorithm, the objective function in Eq. (1) contains functions as parameters, which makes it difficult to optimize with traditional methods. As a result, the model is trained in an additive manner: at iteration t a new function f_t is added to the previous prediction, which gives the objective in Eq. (2). This objective is then approximated with a second-order Taylor expansion, where g_i and h_i denote the first and second derivatives of the loss l with respect to the previous prediction. After the constant terms are eliminated, the objective reduces to a sum of quadratic functions of a single variable, as shown in Eq. (3). This simplified form can be conveniently minimized with a greedy algorithm.
\begin{equation}L^{(t)} = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k) \tag{1}\end{equation}
\begin{equation}L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \tag{2}\end{equation}
\begin{equation}L^{(t)} \simeq \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \tag{3}\end{equation}
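To make Eq. (3) concrete, the following is a minimal sketch for a single leaf that outputs a constant weight w. It assumes the squared-error loss l = ½(ŷ − y)², for which g_i = ŷ_i − y_i and h_i = 1, and an L2 penalty ½λw² as the regularizer Ω; the function names, the toy data, and the value of λ are illustrative assumptions and are not taken from the study.

```python
def grad_hess_squared_error(y_true, y_prev):
    """g_i and h_i of l = 0.5 * (y_hat - y)^2, evaluated at the previous prediction."""
    g = [p - y for y, p in zip(y_true, y_prev)]
    h = [1.0 for _ in y_true]
    return g, h


def approx_objective(w, g, h, lam):
    """Eq. (3) for one leaf with constant output w, plus an assumed L2 penalty on w."""
    return sum(gi * w + 0.5 * hi * w * w for gi, hi in zip(g, h)) + 0.5 * lam * w * w


def best_leaf_weight(g, h, lam):
    """Closed-form minimizer of the quadratic above: w* = -G / (H + lambda)."""
    return -sum(g) / (sum(h) + lam)


# Toy data (assumption): three samples whose previous prediction is 3.5.
y_true = [3.0, 4.0, 5.0]
y_prev = [3.5, 3.5, 3.5]

g, h = grad_hess_squared_error(y_true, y_prev)
w_star = best_leaf_weight(g, h, lam=1.0)
print(w_star)  # 0.375
```

Because the approximated objective is quadratic in w, the best weight of each leaf has a closed form, which is what allows candidate splits to be scored greedily.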
In the XGBoost algorithm, the previously trained weak learners are held fixed, and a new weak learner is added at each iteration to improve the performance of the current model. This process continues, with each new learner concentrating on the regions where the previous ones performed poorly. The general flow of the XGBoost algorithm is illustrated in Figure 1.
Figure 1. Flow of gradient boosting algorithm.
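The additive flow described above can be sketched in a few lines. This is a simplified illustration of gradient boosting with squared-error loss and one-split trees (stumps) on a single feature, not the XGBoost implementation itself; the data, the number of rounds, and the learning rate are assumptions chosen for the example.

```python
def fit_stump(x, residuals):
    """Pick the threshold on a 1-D feature that minimizes squared error.

    Assumes x contains at least two distinct values.
    """
    best = None
    for t in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]  # (threshold, left value, right value)


def predict_stump(stump, xi):
    t, lv, rv = stump
    return lv if xi <= t else rv


def mse(y, pred):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Earlier stumps stay fixed; each new stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of 0.5*(y - pred)^2
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, xi) for xi, pi in zip(x, pred)]
    return base, stumps, pred


# Toy data (assumption): a step from about 1 to about 3.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

base, stumps, pred = boost(x, y)
initial_mse = mse(y, [base] * len(y))
final_mse = mse(y, pred)
print(initial_mse, final_mse)
```

Each round leaves the earlier stumps untouched and only adds a correction, so the training error cannot increase from one round to the next for a learning rate between 0 and 1.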
XGBoost's performance and accuracy have been extensively tested and validated in various real-world applications. For instance, it has been successfully applied in sentiment analysis [15], which involves analyzing and understanding emotions, opinions, and attitudes expressed in text data. XGBoost has demonstrated its capability to effectively classify and analyze sentiment, allowing for a deeper understanding of people's reactions and opinions.
Additionally, XGBoost has been used for depression prediction [16], aiding the early identification and prediction of depressive symptoms or disorders, and for predicting student performance [17]. As an ensemble learning method, XGBoost can analyze many factors and indicators to estimate an individual's risk or likelihood of experiencing depression. These practical applications highlight the robustness and versatility of XGBoost as an efficient tool across diverse domains.

2 Research Methods

2.1 Sample

The study focused on Chinese office workers. The formal survey was conducted from December 2nd to December 30th, 2022, using a professional online research platform, Jianshu, to distribute the questionnaire. The platform maintained quality control measures, such as preventing duplicate responses. The screening questions in the questionnaire included “Are you currently employed?” and “Did you answer the questions seriously?” Respondents who answered “no” to either of these questions were automatically excluded. A total of 593 questionnaires were collected during the survey period, and after screening, 149 questionnaires were automatically excluded. During the preprocessing stage, a total of 113 questionnaires were excluded from the analysis. This exclusion was based on completion times, with questionnaires that took less than 5 minutes or exceeded 1 hour being deemed as incomplete or potentially unreliable. Additionally, 1 questionnaire was removed due to the participant being located outside of China or not providing location details. Another 3 questionnaires were deleted due to incomplete responses resulting from the adjustment of the “offline leisure enjoyment” question. Ultimately, there were 326 valid questionnaires, resulting in an effective response rate of 54.97%.

2.2 Measurements

In this study, employees’ after-work online activities were divided into three types: achievement leisure activities, social leisure activities, and timeout leisure activities. To predict subjective well-being in a timely manner, demographic variables, autonomy, mastery, and relatedness were also used as independent variables.

3 Data Analysis and Results

The demographic variables include age, marital status, occupation, years of work, position level, monthly income, and average work hours per day. For the initial model, all data were converted into numerical variables, and the XGBoost algorithm was applied.
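The source does not say how the conversion to numerical variables was done. One common option is to assign an integer code to each category, sketched below; the field names, the categories, and the coding scheme are hypothetical and serve only as an illustration.

```python
# Hypothetical survey rows; field names and categories are illustrative, not from the study.
rows = [
    {"age": 29, "marital_status": "single", "position_level": "staff"},
    {"age": 41, "marital_status": "married", "position_level": "manager"},
    {"age": 35, "marital_status": "married", "position_level": "staff"},
]


def build_codebook(rows, column):
    """Map each distinct category in a column to an integer code (sorted for reproducibility)."""
    categories = sorted({row[column] for row in rows})
    return {category: code for code, category in enumerate(categories)}


def encode_rows(rows, categorical_columns):
    """Replace categorical values with integer codes and cast the rest to float."""
    codebooks = {col: build_codebook(rows, col) for col in categorical_columns}
    encoded = []
    for row in rows:
        encoded.append({
            key: codebooks[key][value] if key in codebooks else float(value)
            for key, value in row.items()
        })
    return encoded, codebooks


encoded, codebooks = encode_rows(rows, ["marital_status", "position_level"])
print(codebooks["marital_status"])  # {'married': 0, 'single': 1}
print(encoded[0])  # {'age': 29.0, 'marital_status': 1, 'position_level': 1}
```

Integer codes impose an arbitrary order on unordered categories. Tree-based models split on thresholds and are often tolerant of this, but one-hot encoding is an alternative when the order could mislead.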

3.1 Fixed model parameters

In the XGBoost model, since the dependent variable is continuous, the objective and scoring are set as reg:squarederror and neg_mean_squared_error, respectively. The booster used is the commonly used gbtree, which iteratively trains a series of decision trees for prediction. This tree-based model can handle various types of data and has strong fitting and expressive power.
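As a rough sketch, the fixed settings named above can be collected in a small configuration, and the scoring rule can be written out by hand to show what it measures. The variable names and the toy values are assumptions. In practice a dictionary like this would typically be passed to an XGBoost regressor and the scoring string to the grid search, although the source does not show that code, nor how continuous predictions were mapped to the three classes reported later.

```python
# Fixed (non-searched) settings named in the text.
fixed_params = {"objective": "reg:squarederror", "booster": "gbtree"}
scoring = "neg_mean_squared_error"


def neg_mean_squared_error(y_true, y_pred):
    """Hand-written equivalent of the scoring rule: minus the mean squared error.

    The sign is flipped so that a larger score always means a better model.
    """
    n = len(y_true)
    return -sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


# Toy values (assumption): one of three predictions is off by 1.
score = neg_mean_squared_error([3.0, 4.0, 5.0], [3.0, 5.0, 5.0])
print(score)  # about -0.333
```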

3.2 Model parameter grid search

To tune the model's hyperparameters, we used GridSearchCV from sklearn.model_selection to perform a grid search over the ranges shown in Table 1.
| Parameter | Search area | Best value | Description |
| --- | --- | --- | --- |
| max_depth | [5, 10, 20, 30, 40] | 5 | The maximum depth of a tree, i.e. the maximum number of levels or splits allowed in a decision tree |
| learning_rate | [0.005, 0.01, 0.05, 0.1] | 0.01 | The step size at each boosting iteration |
| n_estimators | [200, 300, 400, 500] | 400 | The number of boosting iterations, i.e. the total number of decision trees built |
| subsample | [0.7, 0.8, 0.9] | 0.7 | The proportion of the training data sampled to build each individual tree |
| colsample_bytree | [0.8, 0.85, 0.9, 0.95, 1] | 0.8 | The proportion of features randomly selected to build each tree |
| min_child_weight | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | 3 | The minimum sum of sample weights (second derivative of the loss with respect to the prediction) required in a child node |
| eta | [0.001, 0.002, 0.005, 0.01] | 0.001 | The individual influence of each tree on the overall model's output |
Table 1. Hyperparameter search space and best values.
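To give a sense of the size of this search, the sketch below writes the grid as a dictionary of the shape that GridSearchCV accepts as param_grid and counts the candidate combinations. The number of cross-validation folds is an assumption, since the source does not state it. Note also that the XGBoost documentation lists learning_rate as an alias of eta, so how the two were varied independently is not clear from the source.

```python
from itertools import product
from math import prod

# Search space as listed in Table 1.
param_grid = {
    "max_depth": [5, 10, 20, 30, 40],
    "learning_rate": [0.005, 0.01, 0.05, 0.1],
    "n_estimators": [200, 300, 400, 500],
    "subsample": [0.7, 0.8, 0.9],
    "colsample_bytree": [0.8, 0.85, 0.9, 0.95, 1],
    "min_child_weight": list(range(1, 11)),
    "eta": [0.001, 0.002, 0.005, 0.01],
}


def iter_candidates(grid):
    """Yield every combination of the grid as a parameter dictionary."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


n_candidates = prod(len(values) for values in param_grid.values())
cv_folds = 5  # assumption: the number of folds is not given in the source
n_fits = n_candidates * cv_folds

first_candidate = next(iter_candidates(param_grid))
print(n_candidates)  # 48000
print(n_fits)        # 240000 under the 5-fold assumption
```

An exhaustive search of this size is expensive, which is one reason randomized or staged searches are often used instead when the grid has many dimensions.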

3.3 Model results

After training, the model's accuracy on the test set was 96.96%, with 32 out of 33 predictions correct. Table 2 presents the confusion matrix comparing the predicted classes with the actual classes, and Table 3 reports the precision, recall, and F1 score for each class. It should be noted that the accuracy can vary with the dataset and the specific prediction task.
| | Predicted Low | Predicted Medium | Predicted High |
| --- | --- | --- | --- |
| True Low | 10 | 1 | 0 |
| True Medium | 0 | 12 | 0 |
| True High | 0 | 0 | 10 |
Table 2. Confusion matrix of predicted result.
| | Precision | Recall | F1 Score |
| --- | --- | --- | --- |
| Class 1 | 1 | 0.909 | 0.952 |
| Class 2 | 1 | 1 | 1 |
| Class 3 | 1 | 1 | 1 |
Table 3. Precision, recall, and F1 score by class.
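The relationship between the confusion matrix and the per-class metrics can be checked with a short calculation. The sketch below reads Table 2 with rows as true classes and columns as predicted classes, which is an assumption about the table layout. Under that reading the accuracy is 32/33 and the Low class has a recall of 10/11, matching the reported values, while the one Low example predicted as Medium gives a computed Medium precision of 12/13 rather than 1.

```python
labels = ["Low", "Medium", "High"]

# Rows: true class, columns: predicted class (assumed reading of Table 2).
confusion = [
    [10, 1, 0],
    [0, 12, 0],
    [0, 0, 10],
]


def per_class_metrics(cm):
    """Return (precision, recall, F1) for each class of a square confusion matrix."""
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        true_k = sum(cm[k])                            # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / true_k if true_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[k][k] for k in range(len(confusion))) / total
metrics = per_class_metrics(confusion)

print(total, accuracy)
for label, (p, r, f1) in zip(labels, metrics):
    print(label, round(p, 3), round(r, 3), round(f1, 3))
```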
Using the model's feature importance attribute (model.feature_importances_), we can identify and rank the ten most important factors among the model's independent variables. Figure 2 displays the results of this evaluation, showing the ten most important variables:
① The extent to which my professional growth feels impeded, with an importance of 0.142;
② In internet leisure, I have the ability to learn new knowledge or skills, with an importance of 0.052;
③ In internet leisure, I can make independent decisions about how to choose and engage with activities, with an importance of 0.051;
④ In internet leisure, I can escape from the physical exhaustion caused by work, with an importance of 0.037;
⑤ The workload that needs to be managed within the prescribed time period, with an importance of 0.031;
⑥ The lack of clarity regarding my professional duties and expectations, with an importance of 0.030;
⑦ In online leisure activities, I have rid myself of the physical exertion of work, with an importance of 0.026;
⑧ If I were to live again, there is hardly anything that I would want to change, with an importance of 0.024;
⑨ Age, with an importance of 0.023;
⑩ Education, with an importance of 0.023.
Figure 2. Bar chart of variable importance.
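The ranking step itself is simple once the importance scores are available. The sketch below uses the ten reported values with shortened labels; the labels are shorthand introduced here, and treating the scores as shares of a total of 1 is an assumption about how they were normalized.

```python
# Reported importances; keys are shortened labels, not the original item wording.
importances = {
    "professional growth impeded": 0.142,
    "learn new knowledge or skills": 0.052,
    "independent decisions about activities": 0.051,
    "escape physical exhaustion": 0.037,
    "workload within prescribed time": 0.031,
    "unclear duties and expectations": 0.030,
    "rid of physical exertion": 0.026,
    "would change hardly anything": 0.024,
    "age": 0.023,
    "education": 0.023,
}


def top_k(scores, k):
    """Return the k highest-scoring (name, score) pairs, largest first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


top_three = top_k(importances, 3)
top_ten_share = sum(importances.values())

print(top_three)
print(round(top_ten_share, 3))  # 0.439
```

If the scores do sum to 1 over all 66 variables, the ten listed variables would account for roughly 44% of the total importance, with the remainder spread across the other items.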

4 Discussion

In building the model, grid search was used to determine the optimal hyperparameters. Since there are 66 independent variables, and the depth of a decision tree generally does not exceed the number of independent variables, max_depth was determined to be optimal at 5. After the values of each parameter in the grid search were analyzed, the optimum was found at a learning_rate of 0.01, an eta of 0.001, and n_estimators of 400. This indicates that the model uses shallow trees and a small weight for each tree, while fitting with a larger number of trees. To prevent overfitting, subsample was set to 0.7, colsample_bytree to 0.8, and min_child_weight to 3, which were determined to be the optimal values.
Among the survey items, ‘The extent to which my professional growth feels impeded’ accounted for about 14% of the feature importance. Consistent with established theoretical perspectives, extreme stress can adversely affect one's sense of well-being. Chronic stress and ongoing discomfort can give rise to both psychological and physiological ailments, including anxiety, depressive disorders, and various somatic conditions, and consequently erode well-being [18]. The analysis of occupational stress factors corroborates both the Stressor-Detachment Model and the Recovery Paradox [19,20], reinforcing the notion that the pursuit of strategic resources is predicated upon fulfilling basic psychological needs [21]. Additionally, the item ‘In internet leisure, I have the ability to learn new knowledge or skills’ had an importance of 0.052. These observations support the importance of a positive self-concept, as posited by Self-Determination Theory, and of the proactive cultivation of resources as a means to bolster happiness.

5 Conclusion

In this study, demographic variables and questionnaire items were used as independent variables to train an XGBoost model for predicting employees’ after-work subjective well-being, achieving an accuracy of 96.96% in three-class classification. This indicates that the XGBoost algorithm is effective for studying this problem. Future research can further optimize the algorithm to support deeper conclusions on this issue.

Appendices

Chinese Employee SWB Survey
Please recall your life over the past month and, for each of the following items, choose the degree that best matches your true feelings: [1-5 Likert rating scale]
1.1 Involvement of online achievement activities, such as study, physical exercises.
1.2 Involvement of online social activities, such as study, physical exercises.
1.3 Involvement of online timeout activities, such as watching short videos, listening to music.
2.1 During virtual engagements, I sense the autonomy to determine my own decision-making process.
2.2 I feel pressure in online activities.
2.3 In online activities, I get along well with people.
2.4 In online activities, I seldom communicate with others.
2.5 In online activities, on average, I feel I can freely express my opinions and ideas.
2.6 In online activities, I have the capability to learn new knowledge or skills.
2.7 In online activities, I can be myself and can show my true inner self.
3.1 In online leisure activities, I put aside all work-related thoughts and ideas.
3.2 In online leisure activities, I put aside all work-related emotions.
3.3 In online leisure activities, I have emotionally distanced myself from work.
3.4 In online leisure activities, I have rid myself of the physical exertion of work.

References

[1] Yang, S. Y., Chen, S. C., Lee, L., & Liu, Y. S. (2021). Employee stress, job satisfaction, and job performance: a comparison between high-technology and traditional industry in Taiwan. The Journal of Asian Finance, Economics and Business, 8(3), 605-618.
[2] Nobles, C. (2022). Stress, burnout, and security fatigue in cybersecurity: A human factors problem. HOLISTICA–Journal of Business and Public Administration, 13(1), 49-72.
[3] Silk, M., Millington, B., Rich, E., & Bush, A. (2016). (Re-) thinking digital leisure. Leisure Studies, 35(6), 712-723.
[4] Sonnentag, S., Cheng, B. H., & Parker, S. L. (2022). Recovery from work: Advancing the field toward the future. Annual Review of Organizational Psychology and Organizational Behavior, 9, 33-60.
[5] Sonnentag, S., Venz, L., & Casper, A. (2017). Advances in recovery research: What have we learned? What should be done next? Journal of Occupational Health Psychology, 22(3), 365.
[6] Wei, X., Huang, S., Stodolska, M., & Yu, Y. (2015). Leisure time, leisure activities, and happiness in China: Evidence from a national survey. Journal of Leisure Research, 47(5), 556-576.
[7] Liu, H., & Da, S. (2022). The relationships between leisure and happiness-A graphic elicitation method. In Leisure and Wellbeing (pp. 111-130). Routledge.
[8] De Bloom, J., Rantanen, J., Tement, S., & Kinnunen, U. (2018). Longitudinal leisure activity profiles and their associations with recovery experiences and job performance. Leisure Sciences, 40(3), 151-173.
[9] Macchia, L., & Whillans, A. V. (2021). Leisure beliefs and the subjective well-being of nations. The Journal of Positive Psychology, 16(2), 198-206.
[10] Kujanpää, M., Syrek, C., Lehr, D., Kinnunen, U., Reins, J. A., & De Bloom, J. (2021). Need satisfaction and optimal functioning at leisure and work: A longitudinal validation study of the DRAMMA model. Journal of Happiness Studies, 22, 681-707.
[11] Chawla, N., MacGowan, R. L., Gabriel, A. S., & Podsakoff, N. P. (2020). Unplugging or staying connected? Examining the nature, antecedents, and consequences of profiles of daily recovery experiences. Journal of Applied Psychology, 105(1), 19.
[12] Asaloei, S. I., Wolomasi, A. K., & Werang, B. R. (2020). Work-Related stress and performance among primary school teachers. International Journal of Evaluation and Research in Education, 9(2), 352-358.
[13] Ouyang, K., Cheng, B. H., Lam, W., & Parker, S. K. (2019). Enjoy your evening, be proactive tomorrow: How off-job experiences shape daily proactivity. Journal of Applied Psychology, 104(8), 1003.
[14] Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y., Cho, H., … & Zhou, T. (2015). XGBoost: Extreme gradient boosting. R package version 0.4-2, 1(4), 1-4.
[15] Wang, S. H., Li, H. T., Chang, E. J., & Wu, A. Y. (2018). Entropy-assisted emotion recognition of valence and arousal using XGBoost classifier. Artificial Intelligence Applications and Innovations: 14th IFIP WG 12.5 International Conference, AIAI 2018, Rhodes, Greece, May 25-27, 2018, Proceedings 14 (pp. 249-260). Springer International Publishing.
[16] Sharma, A., & Verbeke, W. J. (2020). Improving diagnosis of depression with XGBOOST machine learning model and a large biomarkers Dutch dataset (n= 11,081). Frontiers in Big Data, 3, 15.
[17] Asselman, A., Khaldi, M., & Aammou, S. (2023). Enhancing the prediction of student performance based on the machine learning XGBoost algorithm. Interactive Learning Environments, 31(6), 3360-3379.
[18] Arslan, G., & Allen, K. A. (2022). Exploring the association between coronavirus stress, meaning in life, psychological flexibility, and subjective well-being. Psychology, Health & Medicine, 27(4), 803-814.
[19] Sonnentag, S., & Fritz, C. (2015). Recovery from job stress: The stressor‐detachment model as an integrative framework. Journal of Organizational Behavior, 36(S1), S72-S103.
[20] Sonnentag, S. (2018). The recovery paradox: Portraying the complex interplay between job stressors, lack of recovery, and poor well-being. Research in Organizational Behavior, 38, 169-185.
[21] Muhamad Nasharudin, N. A., Idris, M. A., Loh, M. Y., & Tuckey, M. (2020). The role of psychological detachment in burnout and depression: A longitudinal study of Malaysian workers. Scandinavian Journal of Psychology, 61(3), 423-435.

    Published In

    CAIBDA '24: Proceedings of the 2024 4th International Conference on Artificial Intelligence, Big Data and Algorithms
    June 2024
    1206 pages
    ISBN:9798400710247
    DOI:10.1145/3690407

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. After-work Subjective well-being
    2. Artificial intelligence algorithm
    3. XGBoost
