1 Introduction
Social media has emerged as an essential and prevalent platform [
10,
33] in contemporary society, including Bangladesh. A 2017 study found that 50.2% of Asians use social media, with Facebook being the most popular platform, and that Dhaka has the second-highest number of active Facebook users [
4,
15]. The growing number of social media platforms, including but not limited to Facebook, Instagram, WhatsApp, Twitter, Snapchat, Telegram, and YouTube, has significantly transformed the academic sphere and reshaped the way in which individuals communicate [
1,
3,
9,
21,
23,
39,
43]. The correlation between scholastic achievement and social media has consistently captivated the interest of numerous researchers.
Dependency on social media has grown to the point that, in this era of technological advancement, it has become a part of everyday life. Students’ modern lives can hardly be imagined without digital interactions. During the COVID-19 pandemic, the importance of social media became more apparent than ever before [
14]. Therefore, it is necessary to detect, recognize, and address the issues that arise when personal growth and habits conflict with social media usage. However, emerging studies suggest that utilizing social media properly can bring enormous benefits to students, offering new connections, collaborations, skill development suggestions, and knowledge sharing [
42]. In line with this, scholars have also examined the role of social media in college classroom technology [
9,
39].
Although using social media can lead students to mental health problems, academic frustration, sleep deprivation, trauma, and depression [
20,
25,
40], it can also be used to address the issues they face and to help them handle those issues more efficiently. The socio-behavioral sciences focus on finding the connection between two or more variables. Moreover, when the bond grows stronger, the number of shared interests increases [
8].
In our study, we used machine learning models to predict students’ dependency on social media after collecting the contributing factors that drive them towards it. Furthermore, we sought to identify the contributing factors using XAI
1 and propose using the survey questions to detect students’ problems and to help them balance their personal growth with their social media usage. Our work mainly focuses on these two research questions:
(1)
How can machine learning and explainable AI (XAI) be applied to survey data to detect the relationship between academic dissatisfaction and social media dependency and identify contributing factors?
(2)
What methods can be implemented by social media platforms to identify users experiencing academic frustration?
This paper contributes by applying machine learning and Explainable AI (XAI) to identifying the factors that drive social media addiction due to academic dissatisfaction among students. Using models like XGBoost and interpretability techniques like LIME, we pinpoint critical influencers including academic stress, social comparisons, and excessive usage of social media. These findings provide recommendations on how to design focused interventions related to academic frustration that may be linked to social media dependency and highlight the need for such a sensitive issue to be addressed within educational and mental health settings.
2 Literature Review
Existing literature [
32] indicated that social networking sites significantly affect university students’ academic success. Additionally, this work underlines the challenges and surprising factors associated with young people’s use of social media, along with the disparities and competing interests that are also visible in this study. Furthermore, it demonstrates that social media underwent a significant transformation in 2009 when numerous new features, including emoticons, sharing, and comments, were introduced. Another study [
31] used quantitative data to assess the correlation between social media usage and academic accomplishment among ESL students. It revealed that ESL students effectively used social media, leading to higher academic achievement. Another study [
37] examined a popular practice among students at the time: using Twitter for informal discussions and Facebook for study groups. Furthermore, contemporary perspectives suggest that there is an emerging tendency among younger generations to dedicate a significant amount of time to social media interactions, specifically consuming short videos and reels [
41].
One study [
26] found that students spend around 2–4 hours each day on social media, forming a regular habit. Despite assessing students’ social media use and investigating its underlying causes, no relationship was found between academic accomplishment and the level of social media use. This research, with a minimal sample size of just 150 participants, did not adequately explain how social media influences academic performance or suggest strategies for improving academic achievements via social media usage. Another study [
12] found that observing other students’ successes on social media might lead to career dissatisfaction. Using two separate methods, a self-reported survey and a 7-day experience sampling approach, the researchers discovered the source of students’ irritation but did not propose or recommend a remedy. Similarly, another study [
27] found an association between academic achievement and social media use, particularly the amount of time students spend on these sites, without adequately explaining how social media influences academic performance or suggesting strategies for improvement.
According to [
20], students who are sad or frustrated may turn to social media as an escape. Another study [
28] investigates the relationship between Facebook use habits and depression symptoms in young people. It also discovered that depressive individuals had smaller social networks in terms of comments and likes but a higher frequency of wall posts and midday online activity, which might be indicative of loneliness. According to a study [
1] on Pakistani students, the negative consequences of social media on their conduct exceed the beneficial elements. To assist students, a group of researchers created a tool demonstrating the preliminary medium evaluation involving 67 real users. Fourteen university students indicated that the tool promoted self-reflection on social media usage, potentially reducing time spent on these platforms and enhancing the overall user experience [
29].
Numerous studies have suggested that social media could serve as an effective platform for helping students [
15,
34]. However, the majority of these studies have not provided conclusive outcomes. An attempt to illustrate strategies for the effective use of social media in education was made in a study [
15]. It indicated that students could obtain substantial advantages by using Facebook as an educational platform, and it further proposed the endorsement of education-oriented profiles on the platform. Nevertheless, the research did not provide clear guidance for implementing these suggestions. Concurrently, an additional academic investigation [
21] contributed significant insights by advocating for a change in viewpoint concerning social media, perceiving it not solely as a highly damaging development of the time but rather as a platform with the capacity to enhance educational methodologies.
While previous research has generally focused on the detrimental influence of social media on academic performance, less attention has been paid to how social media might help students reach their goals. To the best of our knowledge, topics relevant to this subject have yet to be adequately studied in the available literature. It is of the utmost importance that E-HCI
2 researchers utilize this reinforcement to their advantage in this age of technological advancement. In addition, social media have influenced every aspect of our lives, including research and education. In contemporary HCI
3 research, there is considerable interest in the application of diverse methodologies to investigate various aspects such as teaching, news, model selection, UI-UX design, and social media [
24]. The utilization of qualitative and quantitative data analysis techniques is one of these. Although qualitative analysis continues to be preferred by HCI researchers, we must reevaluate our methods to manage the work more efficiently [
24]. For example, the authors of a prior study used a combination of quantitative and qualitative methodologies to ascertain the credibility of the data collectors. The results obtained from the qualitative data analysis in that study were consistent with the outcomes generated by the machine learning models that were used. To validate their findings, they also used conflict analysis and learning-based analysis; the results were comparable across all three methodologies [
35].
Another research investigation [
6] employed a mixed-methods design to examine the correlation between social media engagement (SME) and Fear of Missing Out (FoMo). The researchers demonstrated a moderately positive correlation between SME and FoMo through quantitative analysis, while the challenges faced by minority students were investigated through qualitative analysis. The researchers thus employed the two categories of analysis for distinct objectives. They could also have used both to verify their findings, thereby strengthening their insights from multiple vantage points. Another study [
13] also presents its results through the use of mixed methods. Seventy-two students were involved in this mixed-methods investigation into the correlation between social media usage and academic achievement. While the sample size was sufficient for the qualitative research, it could have been expanded for the quantitative studies. Here too, some cross-verification would have added clarity.
In contrast, the researchers in another study [
23] aimed to examine the impact of social media on productivity. Because most prior research papers employed quantitative analysis, the authors conducted qualitative analysis, specifically focus group interviews, to obtain a more comprehensive understanding. An investigation [
31] utilizing quantitative data and statistical analysis was conducted on ESL students to establish a correlation between social media usage and academic achievements. Scholars ascertained the emotional ramifications of depression, anxiety, and stress on the higher education sector through the implementation of quantitative analysis [
20]. Meanwhile, HCI researchers rarely rely on machine learning models, for three primary reasons outlined in one study [
24]: they consider ML unnecessary for their work, they are unaware of the potential efficacy of ML in their field, and they have insufficient exposure to success stories involving ML in comparable situations.
3 Methodology
In this section, we outline the procedures and implementation methods used in our work. First, in section
3.1, we describe the data collection process and provide an overview of the dataset. Next, in section
3.2, we explain the dataset preprocessing steps. In section
3.3, we detail our feature engineering techniques, including feature naming and binarization. Finally, in section
3.4, we describe the models employed for prediction and explanation. The overall workflow is illustrated in Figure
1. The diagram outlines the research methodology, beginning with data collection via online surveys, followed by preprocessing, feature engineering, and encoding Likert scale responses into binary values. It includes splitting the dataset into training and testing sets, applying machine learning models for classification, and leveraging Explainable AI (XAI) for model interpretability. The process concludes with interpreting results in the context of HCI theories.
3.1 Survey Data Collection and Dataset Description
For survey data collection, we mainly targeted university students, and we collected the majority of the data from Brac University, Khulna University of Engineering and Technology, and North South University. We used Google Forms to collect the survey data and circulated the form through social media and Messenger groups. We also collected data in person by visiting classrooms and lab classes. In addition, students and faculty members from these universities helped us distribute the Google Form link and collect data. Data collection took almost two months, from 15 October 2023 to 12 December 2023, and yielded 943 responses. We used all 943 responses in our analysis, as there were no missing data and all responses appeared reliable after conflict analysis [
17]. Most responses came from undergraduate students (n = 897), with a small number from graduate students (n = 46). In terms of gender, the majority of responses came from male participants (n = 593), followed by female participants (n = 341); 9 participants chose not to disclose their gender.
3.2 Data Pre-Processing and Data Set Description
Our survey was divided into three segments. The first segment contained demographic data about the participants: gender, academic year, and institution. In the second segment, we collected data about students’ academic satisfaction and social media addiction. Finally, in the third segment, we asked three open-ended questions to gather students’ suggestions. The second segment used a Likert scale format in which students chose a value between 1 and 5, where 1 stood for “Strongly Disagree” and 5 stood for “Strongly Agree”. A detailed description of our survey dataset is shown in Table
1 and Table
2. We validated the reliability of our dataset using Cronbach’s Alpha, obtaining a score of 82.07%, which supports the dataset’s dependability [
2]. The correlations among the variables are shown in the heatmap in Figure
2. The correlation matrix highlights moderate to strong positive correlations between variable pairs such as
More_SM_to_cope and
SM_more_time_than_in_wanted (0.59) and
SM_Restless_irritable with
SM_Unsuccessful_attempts (0.62). Most other variables, including
Satisfied_res_binary, show weak or negligible correlations with others.
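As a minimal sketch of the reliability check described above, Cronbach’s alpha can be computed from the variance of each item and the variance of each respondent’s total score. The function name `cronbach_alpha` and the toy responses are illustrative assumptions; they are not the study’s code or data.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """rows: one list of k Likert item scores per respondent."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Variance of each item (column) across respondents.
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    # Variance of the per-respondent total score.
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents, 3 items on a 1-5 scale.
responses = [[4, 5, 4], [2, 3, 2], [5, 4, 5], [1, 2, 1], [3, 3, 3]]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.4f}")
```

A value reported as a percentage, such as 82.07%, corresponds to a coefficient of about 0.82 on this scale. The function assumes the total scores are not all identical, since the total variance appears in a denominator.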
3.3 Feature Engineering
The “Feature Engineering” part focuses on changing raw data into a format that is more suitable for our machine learning algorithms. This involves identifying key features in our data that can help us make accurate predictions. Feature Naming is an important step in this process, since it allows us to give these features meaningful names that make them easier to identify. This stage is essential for understanding our data and ensuring that our models are interpretable.
3.3.1 Feature Naming.
When we conducted our survey, we made the questions as elaborate as possible to help participants understand their purpose. However, when this data is used for machine learning and explainable AI, such long names can cause problems and make the output harder to read. We therefore shortened the names to make them easier to understand. Table
3 shows how we shortened the feature names. From this point on, we mostly use the short feature names.
3.3.2 Binarization of Target Column.
For prediction, we converted the target column, ‘Are You Satisfied With Your Academic Result,’ from a five-point Likert scale to a binary format. We applied a median split method [
19], where the column was divided based on its median value. Instances greater than the median were assigned a value of 1, while those less than or equal to the median were assigned a value of 0. The resulting binary output is displayed in Figure
3.
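The median split described above can be sketched in a few lines. The function name `median_split` and the sample scores are hypothetical and serve only to illustrate the rule (values strictly above the median map to 1, all others to 0).

```python
from statistics import median


def median_split(scores):
    """Binarize Likert scores: 1 if strictly above the median, else 0."""
    m = median(scores)
    return [1 if s > m else 0 for s in scores]


# Hypothetical five-point Likert responses for the target question.
likert = [1, 2, 3, 3, 4, 5, 2, 3]
labels = median_split(likert)
print(labels)  # [0, 0, 0, 0, 1, 1, 0, 0]
```

Because every response equal to the median is assigned to class 0, a median split on a coarse five-point scale can produce classes of unequal size.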
3.4 Model Selection and Implementation
In this study, we utilized three machine learning models—XGBoost, K-Nearest Neighbors (KNN), and Gradient Boosting—to predict academic dissatisfaction linked to social media addiction, with a specific focus on Class 0 (students dissatisfied with academics and addicted to social media). These models were chosen for their diverse strengths:
•
XGBoost was chosen for its robustness in handling imbalanced datasets and its potential to achieve high performance through efficient boosting techniques.
•
KNN was chosen for its simplicity and its ability to capture nonlinear patterns within the data.
•
Gradient Boosting was chosen because it iteratively minimizes the prediction error and is fairly interpretable, especially when compared to other ensemble methods.
Basic preprocessing of data and feature engineering were performed to clean and structure the dataset appropriately before the application of the machine learning models. The dataset was then split into training and testing sets in a 70-30 ratio, respectively, to ensure that a substantial portion of the data was used for training the model while it was evaluated on unseen examples. Hyperparameter tuning was conducted to optimize the performance of the model.
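A 70-30 split of this kind can be sketched with the standard library alone. Stratifying by class label and fixing the random seed are assumptions made here for illustration; the text above does not state whether the split was stratified. The name `stratified_split` and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(X, y, test_ratio=0.3, seed=42):
    """Split rows into train and test sets, keeping class proportions similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_ratio)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])

    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Hypothetical toy data: 70 rows of class 0 and 30 rows of class 1.
X = [[i] for i in range(100)]
y = [0] * 70 + [1] * 30
X_train, X_test, y_train, y_test = stratified_split(X, y)
print(len(X_train), len(X_test))  # 70 30
```

In practice, a library routine such as scikit-learn’s `train_test_split` with `test_size=0.3` is the more common choice; the version above only makes the mechanics explicit.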
XGBoost turned out to be the best model for this critical task, offering the highest recall for Class 0 (95%). This class was particularly important because missing its cases would mean failing to detect students addicted to social media, which is a sensitive matter.
For interpretability, we prioritized Local Interpretable Model-agnostic Explanations (LIME) over SHAP, as LIME provides focused, instance-level insights into model predictions. This aligns with our goal of understanding individual cases in depth. LIME identified key contributing factors, such as academic stress, social comparisons, and excessive social media usage, offering actionable insights for addressing the issue of academic dissatisfaction driven by social media addiction.
4 Findings
The findings from our study emphasize the critical relationship between academic dissatisfaction and social media addiction. By leveraging machine learning models and Explainable AI, we identified significant factors contributing to this issue, including academic stress, social comparisons, and excessive time spent on social media. These insights validate the effectiveness of our predictive models, particularly XGBoost, in capturing patterns related to Class 0. The focus on this class ensures that individuals most affected by social media addiction due to academic dissatisfaction are accurately identified, providing a robust foundation for designing targeted interventions.
4.1 Machine Learning Prediction
In this research, we are focusing on class 0, which represents individuals dependent on social media due to dissatisfaction with their academic results. Identifying these individuals is crucial because it helps us understand the link between academic dissatisfaction and increased social media usage. Missing cases from this group (class 0) could lead to an underestimation of the problem, making recall for this class particularly important. High recall ensures that we capture as many individuals in this category as possible, thereby enabling us to better address the issue of academic dissatisfaction and its role in driving social media addiction. Prioritizing the accurate prediction of class 0 will help in designing interventions aimed at mitigating the effects of academic dissatisfaction on social media usage.
The performance analysis of the models (XGBoost, KNN, and Gradient Boosting) shows that XGBoost performs the best for class 0, which is our focus. XGBoost achieves the highest recall for class 0 at 95%, indicating that it captures nearly all instances of individuals dependent on social media due to academic dissatisfaction. This model also has a precision of 80%, meaning that most predictions made for class 0 are correct, and an F1 score of 87%, reflecting a strong balance between precision and recall.
KNN and Gradient Boosting also perform well for class 0, with recall values of 88% and 89%, respectively, and F1 scores of 85%. Their precision is slightly higher than XGBoost’s 80% (83% for KNN and 81% for Gradient Boosting), but their lower recall and F1 scores make XGBoost the most effective model overall for predicting class 0.
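As a quick consistency check, the F1 scores follow from the reported class 0 precision and recall values through the harmonic mean, F1 = 2PR / (P + R). The helper name `f1_from` is hypothetical; the numbers are those reported in the text.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Class 0 (precision, recall) pairs as reported in the text.
reported = {
    "XGBoost": (0.80, 0.95),
    "KNN": (0.83, 0.88),
    "Gradient Boosting": (0.81, 0.89),
}
f1_scores = {name: round(f1_from(p, r), 2) for name, (p, r) in reported.items()}
print(f1_scores)  # {'XGBoost': 0.87, 'KNN': 0.85, 'Gradient Boosting': 0.85}
```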
It’s important to note that our dataset contains more instances of class 0 than class 1, which contributes to the models’ poor performance in predicting class 1 (individuals satisfied with their academic results). This imbalance skews the predictions toward class 0, making it harder for the models to correctly classify the few instances of class 1. For example, XGBoost has a recall of only 21% for class 1. However, since our primary goal is to maximize the detection of class 0 cases, the lower performance for class 1 is less concerning in this context.
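The effect of class imbalance on per-class metrics can be illustrated with a small example. The labels below are made up for illustration and are not the study’s predictions; `precision_recall` is a hypothetical helper.

```python
def precision_recall(y_true, y_pred, target):
    """Precision and recall for one class, treating `target` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical imbalanced test set: eight class 0 rows, two class 1 rows.
y_true = [0] * 8 + [1] * 2
# A model biased toward the majority class misses one of the two class 1 rows.
y_pred = [0] * 8 + [0, 1]

p0, r0 = precision_recall(y_true, y_pred, target=0)
p1, r1 = precision_recall(y_true, y_pred, target=1)
print(f"class 0: precision={p0:.2f}, recall={r0:.2f}")
print(f"class 1: precision={p1:.2f}, recall={r1:.2f}")
```

In this toy case a single missed minority example halves the class 1 recall, while the class 0 recall stays at 1.0. This is the same kind of asymmetry the paragraph above describes.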
A full table detailing the performance metrics for each model, including precision, recall, and F1 scores for both class 0 and class 1, is provided in Table
4. Based on this analysis, XGBoost emerges as the most suitable model for our specific goal of identifying individuals dependent on social media due to academic dissatisfaction.
4.2 Explainable AI (XAI)
The application of LIME (Local Interpretable Model-agnostic Explanations) to interpret predictions from the most promising machine learning model, in our case XGBoost, provided valuable insights into the factors contributing to academic dissatisfaction among individuals addicted to social media, as shown in Figure
4.
The analysis identified academic stress (AC_Stressed ≤ 4.00) as the most significant factor influencing dissatisfaction with academic results. A strong positive correlation was observed between higher levels of stress and academic dissatisfaction, highlighting the detrimental impact that stress has on students’ academic experiences. Conversely, individuals who placed a lower perceived importance on academics (AC_Importance ≤ 4.00) were less likely to experience dissatisfaction. This suggests that those who value academic pursuits less may feel less pressure, resulting in lower stress levels and consequently, greater satisfaction.
Social comparison via social media (SM_Compare ≤ 2.00) emerged as another critical contributor to academic dissatisfaction. The findings indicate that comparing oneself to others exacerbates negative self-perceptions regarding academic performance, leading to decreased satisfaction. Furthermore, neglecting academic responsibilities due to social media use (SM_Neglect_AC_responsibility ≤ 2.00) was identified as a contributing factor, albeit with a less pronounced effect. This underscores how time spent on social media can detract from academic priorities, resulting in increased dissatisfaction.
Emotional responses such as restlessness and irritability
(SM_Restless_irritable ≤ 2.00) associated with social media use were also significant predictors of academic dissatisfaction. Excessive social media engagement can trigger negative emotions that adversely affect one’s academic experience. Additionally, the inability to limit social media usage (SM_Unsuccessful_attempts ≤ 2.00) was linked to higher levels of academic dissatisfaction, indicating a potential lack of control that may contribute to overall dissatisfaction.
Interestingly, the use of social media as a means to escape frustration (2.00 < SM_escape_frust ≤ 4.00) was associated with a slight decrease in dissatisfaction, suggesting that, for some individuals, social media may serve as a temporary coping mechanism to alleviate academic stress. In contrast, using more social media as a coping strategy (More_SM_to_cope ≤ 3.00) had minimal positive effects on dissatisfaction levels, indicating that while this behavior may provide temporary relief, it does not effectively mitigate overall academic dissatisfaction.
Lastly, spending excessive time on social media
(3.00 < SM_more_time_than_in_wanted ≤ 4.00) revealed a negligible negative effect on predicting dissatisfaction, suggesting that although excessive use is problematic, its specific impact on dissatisfaction is relatively minor compared to other factors.
In summary, the LIME explanations reveal that academic stress, social media comparison, and the inability to limit social media use are critical factors contributing to academic dissatisfaction. Additionally, variables such as the perceived importance of academics and attempts to escape frustration through social media play nuanced roles, either amplifying or slightly mitigating dissatisfaction. The XAI approach underscores the complex interplay between social media behaviors and academic outcomes, providing a clearer understanding of the factors that significantly influence academic satisfaction among social media users.
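The conditions attached to each factor above, such as AC_Stressed ≤ 4.00, have the form of discretized feature bins. LIME’s tabular explainer discretizes numeric features into quartile bins by default, and the sketch below shows how labels of this shape could arise. It is an illustration built on the Python standard library, not LIME’s own code, and it does not confirm which discretizer settings were used in this study. The function name `quartile_bin_label` and the sample values are hypothetical.

```python
from statistics import quantiles


def quartile_bin_label(name, values, x):
    """Return a label for the quartile bin of `values` that contains x."""
    # Quartile cut points; duplicates collapse when many responses tie.
    cuts = sorted(set(quantiles(values, n=4, method="inclusive")))
    if x <= cuts[0]:
        return f"{name} <= {cuts[0]:.2f}"
    for lo, hi in zip(cuts, cuts[1:]):
        if lo < x <= hi:
            return f"{lo:.2f} < {name} <= {hi:.2f}"
    return f"{name} > {cuts[-1]:.2f}"


# Hypothetical Likert responses for one feature.
values = [1, 2, 2, 3, 3, 4, 4, 5, 5]
low = quartile_bin_label("SM_Compare", values, 1)
mid = quartile_bin_label("SM_Compare", values, 3)
high = quartile_bin_label("SM_Compare", values, 5)
print(low)   # SM_Compare <= 2.00
print(mid)   # 2.00 < SM_Compare <= 3.00
print(high)  # SM_Compare > 4.00
```

Read this way, a condition such as SM_Compare ≤ 2.00 describes the bin into which the explained instance falls, and the weight attached to it applies to that instance locally.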
5 Discussion
We began our research by trying to correlate social media and academia, while providing some ideas for using the correlation positively. While looking for these connections, we found that the idea of social media as a negative influence [
1] was not entirely accurate. Social Cognitive Theory [
22] offers a comprehensive framework for understanding how psychological factors (e.g., self-efficacy, motivation), behavioral patterns (e.g., social media use), and environmental influences (e.g., peer behaviors or academic demands) interconnect to affect social media dependency and academic satisfaction. It also helps explain how individuals may become dependent on social media and how this dependency affects their educational lives.
Our analysis indicates that students use social media more if they experience academic dissatisfaction, stress, or frustration, whereas students with academic satisfaction show less social media dependency. We observed that students use social media as an escape, whereas other studies [
20,
25] found that excessive use of social media can lead to low self-control, affecting students’ anxiety levels. Our findings can help communities reconsider their perspective on social media usage. To illustrate, the results indicate that social media usage does not hamper studies; rather, academic disappointment leads students to social media.
Many studies have shown the negative aspects of social media [
11,
20,
26,
32], and one study [
41] found that increased exposure to social media, especially reels and short videos, raises growing concern about distraction among younger people, affecting their efficiency and productivity. In contrast, from some prior studies [
5,
16,
25,
37], we learned that SNS, if used properly, can make students productive. Exploring these ideas further could be valuable for today’s generation. In line with this, we present rational proposals for addressing this dilemma in countries like Bangladesh using machine learning models and analyzing survey queries.
In our study, we explored which students use social media and how they use it in terms of the contributing factors. After analyzing our data, we concluded that survey questions could be an effective way to detect students’ problems. Using them can also help students maintain a balance between their daily lives and social media. In addition, our prediction model can identify academically dissatisfied and addicted users with high recall, while XAI can identify the factors that can be used to help users address their addiction.
Our XAI (Explainable AI) methodology is based on the analysis of quantitative data and mainly focuses on the factors contributing to the correlation between academic dissatisfaction and social media. In our analysis, we found that stress is one of the main contributors to students’ dissatisfaction. We also found that those who are proactive on social media compare themselves with others more and remain dissatisfied. Previous studies showed that using social media properly can make students productive [
16,
25,
37]. In line with this, our findings show that machine learning models such as KNN, XGBoost, and Gradient Boosting, together with Explainable AI techniques such as LIME, yielded comparable insights in a shorter duration [
7,
36]. According to a prior study, data credibility can be ensured and biases reduced by using quantitative methods such as Cronbach’s alpha [
38], and quantitative analysis can provide more credibility as it is less prone to human error [
30].