Abstract
The advent of large language models (LLMs) has marked a new era in the transformation of computational social science (CSS). This paper examines the role of LLMs in CSS, particularly exploring their potential to revolutionize data analysis and content generation and contribute to a broader understanding of social phenomena. We begin by discussing the applications of LLMs to various computational problems in social science, including sentiment analysis, hate speech detection, stance and humor detection, misinformation detection, event understanding, and social network analysis, illustrating their capacity to generate nuanced insights into human behavior and societal trends. Furthermore, we explore the innovative use of LLMs in generating social media content. We also discuss the various ethical, technical, and legal issues these applications pose, and the considerations required for responsible LLM usage. We further present the challenges associated with data bias, privacy, and the integration of these models into existing research frameworks. This paper aims to provide a solid background on the potential of LLMs in CSS, their past applications, current problems, and how they can pave the way for revolutionizing CSS.
1 Introduction
Social media platforms have become an integral part of modern society (Bruning et al. 2020). They are not merely networks for connecting friends, family, and associates, but have evolved into dynamic ecosystems for the dissemination of information, cultural exchange, and mobilization of social movements (Olan et al. 2022). As individuals engage in sharing moments from their daily lives, discussing global issues, or advocating for causes, these interactions form a complex mosaic of sentiments that influence public discourse. This multifaceted dialogue has the potential to go beyond the virtual world of social media and impact real-world scenarios, affecting everything from cultural attitudes to policy-making and societal standards. Social media content is highly variable, ranging from trivial updates on daily routines to significant announcements from world leaders. Social platforms serve as beacons of public opinion, battlegrounds for ideological clashes, sources for health information during crises such as pandemics, and marketplaces for businesses and influencers (Wong et al. 2021; Kizgin et al. 2020). These disparate use cases underscore the versatile nature of social media, illustrating the integral role these platforms play in contemporary communication infrastructure.
The surge in user-generated content presents the formidable challenge of navigating vast data volumes to uncover trends, sentiments, and patterns – a task beyond manual human analysis given the sheer volume and complexity of the information. Additionally, social media platforms often become arenas for spreading hateful content, putting human moderators under significant distress due to the toxic nature of such content (Rauniyar et al. 2023). Hence, the necessity for automated data analysis cannot be overstated, given the explosion of content that exceeds human capacity for scrutiny. This is where advanced computational techniques come in, employing algorithms and analytics to process and interpret these data at an unprecedented scale. The integration of machine learning (ML), natural language processing (NLP), and big data analytics has enabled researchers to decode complex social phenomena in real time. The convergence of computational tools with social science methodologies has brought the field of Computational Social Science (CSS) to maturity.
CSS utilizes computational methods such as ML, NLP, network analysis, and data mining to study social phenomena and human behavior in online environments. Using large-scale data sets from social media platforms, CSS researchers can investigate various sociocultural phenomena, including sentiment dynamics, information diffusion, community formation, and collective decision-making processes. CSS not only enriches our understanding of social structures, but also aids in the development of more inclusive and responsible social media platforms. Through CSS, designing algorithms that promote healthy discourse, flag harmful content, and foster inclusive communities becomes possible. The advent of Large Language Models (LLMs) has shown significant promise in various textual and multimodal applications, and recent advancements have unlocked transformative possibilities in CSS by automating and enhancing social media data analysis. Models such as GPT-4, BERT, and T5 are trained on enormous volumes of data, allowing them to generate, classify, and interpret language in a way that mirrors human communication nuances. The linguistic and contextual understanding LLMs exhibit has proven invaluable for detecting patterns in sentiment, tracking ideological shifts, and even predicting social movement trajectories on platforms like Twitter, Reddit, and Facebook (Shah et al. 2024). By analyzing vast quantities of text in near-real-time, these models facilitate deeper insights into social phenomena, streamlining what would traditionally be complex and resource-intensive tasks. This ability to process and interpret social media data at scale, combined with their contextual intelligence, marks LLMs as essential tools in CSS research, enabling both granular and macro-level analyses. Unlike traditional computational models, LLMs possess the capability to understand and generate human-like text at scale, providing nuanced insights into complex social phenomena. Alongside a discussion of how CSS research has advanced over time with traditional methods, this paper explores the transformative potential of LLMs in CSS, highlighting their central role in automating and enhancing social media data analysis, content generation, and understanding of social behaviors.
Moreover, LLMs play a crucial role in mitigating issues such as misinformation and online toxicity, which are pervasive in social media environments. Given their sophisticated natural language understanding, LLMs can be fine-tuned to detect hateful speech, misinformation, and propaganda with increasing precision. Models like ChatGPT and Bard, when carefully optimized, can flag problematic content, provide context for misinformation, and even generate balanced summaries of contentious issues, aiding platforms in maintaining healthier digital ecosystems (Kreps et al. 2022). Additionally, the recent integration of multimodal capabilities in LLMs has enhanced their applicability in CSS by allowing them to analyze not only text but also images and videos in conjunction with contextual metadata (Alayrac et al. 2022). This advancement opens new frontiers in studying the multimodal nature of online discourse, enabling a more holistic understanding of the dynamics that shape public opinion and behavior on social media platforms. Figure 1 presents the popular subdomains within CSS that are impacted by the advent of LLMs.
In this paper, we examine the role of LLMs in CSS, highlighting their capabilities, challenges, and implications in various aspects of social media data analysis and content generation. We will cover methodologies, case studies, and potential future directions for utilizing LLMs in CSS research. We begin with a review of related works in the literature (Sect. 2), focusing on traditional methods used in CSS studies. We then discuss the fundamentals of LLMs (Sect. 3) and essential techniques for data collection and processing to identify the potential of LLMs in CSS. We explore LLM applications at the utterance level (Sect. 4), as well as at the discourse and network levels (Sect. 5), and in document-level analysis (Sect. 6). Furthermore, we investigate the role of LLMs in social media content generation (Sect. 7) and address considerations for LLM adoption (Sect. 8). By formalizing insights from these sections, we aim to contribute to a deeper understanding of how LLMs shape and analyze social media discourse, paving the way for future research in this rapidly evolving field of CSS.
2 Automated systems in computational social science
In recent years, CSS has greatly benefited from automated analysis techniques applied to vast digital content across various platforms (Ruths and Pfeffer 2014). While social media remains a central data source, CSS also analyzes content from e-commerce sites, news outlets, and forums to gain insights into public opinion, consumer behavior, and societal trends. These diverse sources enable CSS to examine human behavior beyond social media alone. This section discusses applications of automated analysis in CSS, highlighting its relevance across the broader digital landscape.
2.1 Sentiment analysis
Sentiment analysis, the computational process of identifying and categorizing emotional tone in text, has garnered significant attention from researchers in both computer science and social psychology. It aims to determine whether a piece of text conveys a positive, negative, or neutral sentiment, offering valuable insights into public opinion, emotional responses, and social dynamics. Early attempts at sentiment analysis were rudimentary, primarily relying on bag-of-words models that identified positive and negative words but lacked nuance and context. The algorithms and techniques soon improved, and the results began to be used in non-academic settings as well; the value of sentiment analysis is evident in its use in the UN’s World Happiness Report and in the Gallup and Healthways well-being surveys in the US. Studies have used frequency-based methods (Frank et al. 2013; Nasim et al. 2022), word-shift graphs (Dodds et al. 2015), and machine learning methods to understand human sentiments in the online sphere. Researchers have also explored sentiments in languages other than English, particularly during population-level events. Kaya et al. (2012) analyzed sentiment classification in Turkish political news, comparing Naïve Bayes (Rish et al. 2001), Maximum Entropy, Support Vector Machines (SVM) (Cortes and Vapnik 1995), and the character-based N-Gram Language Model (Pauls and Klein 2011). Their empirical findings revealed that Maximum Entropy and the N-Gram Language Model outperformed SVM and Naïve Bayes, achieving accuracies in the range of 65% to 77%. Similarly, Batra et al. (2020) used various machine learning algorithms on tweets to predict election outcomes through sentiment analysis, and were able to achieve an accuracy of 86% with the decision tree classifier. Various studies have examined the sentiment of political communication on social media platforms in depth (Haselmayer and Jenny 2017). Apart from the mainstream languages, researchers have also focused on various low-resource languages and code-switched languages to study political sentiment. For example, Rauniyar et al. (2023) introduced the Nepal Anti Establishment Discourse Tweets (NAET) dataset. Their comprehensive dataset of 4,445 annotated tweets enables evaluation of relevance, sentiment, satire, hate speech, and hope speech, while also establishing NLP baselines for these tasks, thus facilitating in-depth analysis and furthering NLP research in Nepali discourse.
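To make the classical pipeline concrete, the sketch below trains a bag-of-words sentiment classifier of the kind used in this early work; it is a minimal illustration with toy data, not a reproduction of any cited system.

```python
# A minimal sketch of the classical sentiment pipeline described above:
# TF-IDF bag-of-words features feeding a linear SVM. Texts and labels
# are illustrative toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["I love this policy", "This decision is terrible",
         "What a great initiative", "Absolutely awful handling"]
labels = ["positive", "negative", "positive", "negative"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["What a wonderful announcement"]))  # likely ['positive']
```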
2.2 Misinformation and disinformation detection
In today’s digital era, information disorder is a major problem, especially disinformation generated with synthetic media. Detection of misinformation (false information) and disinformation (information deliberately intended to mislead) has become crucial in CSS (Shah et al. 2024). Efforts have been made to address this problem through the curation of various datasets and methodologies. Researchers have used simple models based on conditional probabilities, clustering (Weber et al. 2022; Weber and Neumann 2021), and entropy (Nasim et al. 2018), as well as more sophisticated deep learning based models, to analyze misinformation. Seddari et al. (2022) introduced a hybrid fake news detection system merging linguistic and knowledge-based approaches, leveraging features such as title, sentiment, and fact-verification metrics like website reputation and fact-checking opinions. With only eight features, their method outperforms state-of-the-art methods, achieving a 94.4% accuracy on a fake news dataset, showcasing the effectiveness of combining linguistic and fact-verification features in detecting fake news. Similarly, Devarajan et al. (2023) proposed a novel framework for fake news detection leveraging deep NLP, focusing on social features rather than just linguistic content analysis. The model operates across four layers, incorporating data acquisition, information retrieval, NLP-based data processing, and deep learning (DL) classification, achieving impressive accuracy and F1 scores of 99.72% and 98.33%, respectively, across three datasets including Buzzface, FakeNewsNet, and Twitter. Additionally, Wang et al. (2020) addressed the surge in social media medical misinformation by developing an automatic anti-vaccine message detector, particularly through visual content on platforms like Instagram. Introducing a DL network integrating both visual and textual information, alongside a new attention mechanism and ensemble method, the proposed model achieves over 97% testing accuracy on a dataset of more than 30,000 Instagram posts from January 2016 to October 2019, surpassing existing detection systems and demonstrating its efficacy in combating antivaccine messaging. Similarly, Hossain et al. (2020) focused on the need for tools to combat COVID-19 misinformation on social media, particularly Twitter, given the unique language and rapidly evolving nature of pandemic-related information. They introduce COVIDLies, a dataset of 6,761 expert-annotated tweets covering 86 distinct COVID-19 misinformation topics, aimed at facilitating research on misinformation detection. By evaluating existing NLP systems on this dataset, they establish initial benchmarks and highlight key challenges for future model enhancements.
2.3 Hate speech analysis
Given the diverse nature of content in social media, hate speech can also be prevalent (Parihar et al. 2021). Therefore, understanding and addressing hate speech is crucial for maintaining a safe and inclusive online environment. Hate speech analysis primarily focuses on identifying and understanding harmful, discriminatory, or offensive language in digital content. Using NLP techniques, researchers aim to detect patterns of hate speech across online platforms, helping to mitigate its spread and impact. This area of study is crucial for developing algorithms that can flag toxic content, promote healthier online discourse, and inform policy decisions related to online behavior and moderation. Chaudhari et al. (2020) present a research tool aimed at raising awareness about hate speech prevalent in online platforms such as blogs, forums, and newspapers. Utilizing a Convolutional Neural Network (CNN) architecture combined with NLP techniques, their tool accurately identifies hate speech with an 80.15% accuracy and an F1-score of 80.35%. Similarly, Sajjad et al. (2019) proposed a system to classify tweets into categories of racism, sexism, or none. Their approach integrates deep features from a CNN trained on semantic word embeddings with syntactic and word n-gram features, achieving superior performance compared to existing methods on a standard dataset of 16k manually annotated tweets. Moreover, Zhang et al. (2018) developed a new method combining CNNs and Gated Recurrent Unit (GRU) networks for the detection of hate speech. Through extensive evaluation against several baselines and state-of-the-art approaches on a large collection of Twitter datasets, they demonstrate the method’s effectiveness in capturing word sequence and order information in short texts, setting new benchmarks by outperforming previous results on six out of seven datasets by 1 to 13% in F1 score. Additionally, they contribute to the field by expanding the dataset collection with a new dataset covering various topics related to hate speech detection. Similarly, Mozafari et al. (2020) addressed the challenge of identifying hateful content on social media by introducing a novel transfer learning approach based on the BERT language model. By fine-tuning BERT on annotated datasets for racism, sexism, hate, or offensive content on Twitter, they demonstrate considerable performance improvements in terms of precision and recall compared to existing approaches. Their method shows promise in capturing biases in data annotation and collection processes, potentially leading to more accurate hate speech detection models. Additionally, the detection of hate targeting certain groups, such as transphobia and homophobia, is also an important task in CSS. Chakravarthi (2023) presents a new dataset of 15,141 comments for detecting online homophobia and transphobia, annotated by experts for English, Tamil, and both languages. They conducted experiments with various machine learning models and DL architectures, with the best systems achieving average macro F1 scores of 0.570 for Tamil, 0.870 for English, and 0.610 for the combination of Tamil and English languages. They also launched a shared task at the LTEDI-ACL 2022 workshop (Chakravarthi et al. 2022) to further the research on homophobia and transphobia.
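As an illustration of the BERT-based transfer learning recipe above, the following sketch runs a single fine-tuning step for binary hate-speech classification; the corpus is a placeholder and the hyperparameters are typical assumptions rather than those of any cited study.

```python
# A hedged sketch of fine-tuning BERT for binary hate-speech detection.
# The one-example "corpus" and the learning rate are illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0 = not hateful, 1 = hateful

texts = ["an example annotated tweet goes here"]  # placeholder data
labels = torch.tensor([0])
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
loss = model(**enc, labels=labels).loss  # cross-entropy over the two classes
loss.backward()
optimizer.step()  # in practice, loop over mini-batches for several epochs
```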
Apart from detecting hateful content in textual data, research has been done widely to understand multimodal data like memes and text-embedded images (Thapa et al. 2023). Perifanos and Goutsos (2021) propose a novel multimodal approach to detect hateful and abusive speech on Twitter, specifically targeting racist and xenophobic content aimed at refugees and migrants in Greece. By combining computer vision with NLP models such as BERT and Residual Neural Networks, they achieve an impressive accuracy score of 0.970 and an F1-score of 0.947 in their best model. Additionally, they contribute to the field by creating a new dataset for hate speech classification and releasing a pre-trained language model trained on Greek tweets. Similarly, Lee et al. (2021) introduce DisMultiHate, a novel framework for classifying multimodal hateful content, particularly focusing on memes. Their approach disentangles target entities within memes to improve classification accuracy and explainability. Through extensive experiments on publicly available datasets, they demonstrate that DisMultiHate outperforms both unimodal and multimodal baselines in classifying hateful memes, showcasing its effectiveness in handling multimodal content and providing insights into its classification decisions. Moreover, Karim et al. (2022) address the lack of resources for hate speech detection in under-resourced languages like Bengali, particularly focusing on analyzing multimodal memes and texts from social media. They introduce the first multimodal hate speech dataset of 4,500 memes in Bengali and train state-of-the-art neural architectures, achieving the best performance with Conv-LSTM and XLM-RoBERTa (Conneau et al. 2020) models for textual analysis, and ResNet-152 (He et al. 2016) and DenseNet-161 (Huang et al. 2017) models for analyzing memes, with the multimodal fusion of XLM-RoBERTa and DenseNet-161 yielding the highest F1 score of 0.83. Bhandari et al. (2023) curated a dataset spanning 4,700 images from the Russia-Ukraine crisis. They found that multimodal methods greatly surpassed unimodal models for hate detection and the CLIP model performed the best among the multimodal models.
2.4 Humor and stance detection
Alongside sentiment and hate speech, understanding humor and stance in social media is crucial for accurately interpreting communication, identifying users’ intentions, and analyzing sentiment trends. Humor plays a significant role in shaping social interactions, fostering engagement, and reflecting cultural nuances. Additionally, analyzing stances allows researchers to gauge individuals’ attitudes and perspectives on various topics, providing valuable insights into public sentiment. It also allows decision-makers and policy-makers to gain a deeper understanding of public opinion dynamics and identify areas where policy changes may be needed to effectively address people’s needs. Augenstein et al. (2016) tackle the challenging task of stance detection by proposing a conditional LSTM encoding approach, which builds tweet representations dependent on the target, outperforming independent encoding methods. By augmenting this model with bidirectional encoding, they achieve state-of-the-art results on the SemEval 2016 Task 6 Twitter Stance Detection corpus (Mohammad et al. 2016) when weak supervision is added. Similarly, Darwish et al. (2020) proposed an unsupervised framework for detecting the stance of prolific Twitter users on controversial topics, leveraging dimensionality reduction and clustering techniques. Their approach also eliminates the need for prior user labeling and domain-specific knowledge, demonstrating robustness against data skewness and achieving high-quality user clusters with over 98% purity, making it suitable for downstream classifier training. Furthermore, Li et al. (2019) propose a two-channel CNN-GRU fusion network to address challenges in stance detection, overcoming issues of information loss and inadequate feature extraction from text of varying lengths. Their method achieves significantly improved accuracy and F1 scores compared to SVM, CNN, GRU, and hybrid models, outperforming existing approaches while maintaining runtime efficiency. Similarly, Dey et al. (2018) introduce T-PAN, a novel framework for topical stance detection, employing a two-phase solution incorporating Long Short-Term Memory (LSTM) with attention mechanisms. By classifying subjectivity and sentiment in tweets concerning a given topic, T-PAN achieves superior performance on the SemEval 2016 stance detection Twitter dataset. In the domain of stance detection on social media, where most research has been conducted for high-resource languages, Tran et al. (2021) explore stance detection for low-resource languages. They introduce the first annotated Vietnamese corpus for stance detection on social issues, comprising 11,253 pairs of claims-comments labeled with four stances. Through thorough experimentation, they demonstrate that a DL model, notably attentive BiLSTM enriched with character embeddings and stance knowledge, outperforms traditional methods, achieving a macro F1 score of 66.71% and an accuracy score of 66.32%.
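The conditional encoding idea of Augenstein et al. (2016) can be sketched as follows: one LSTM reads the target, and its final state initializes a second LSTM that reads the tweet, so the tweet representation is built conditioned on the target. This is a minimal PyTorch sketch under assumed dimensions, not the authors’ implementation.

```python
# A minimal sketch of target-conditional LSTM encoding for stance detection.
# Vocabulary size, dimensions, and the three-way label set are assumptions.
import torch
import torch.nn as nn

class ConditionalStanceEncoder(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.target_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.tweet_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 3)  # FAVOR / AGAINST / NONE

    def forward(self, target_ids, tweet_ids):
        # Encode the target; its final (h, c) state conditions the tweet LSTM.
        _, state = self.target_lstm(self.embed(target_ids))
        _, (h_n, _) = self.tweet_lstm(self.embed(tweet_ids), state)
        return self.classifier(h_n[-1])  # stance logits

model = ConditionalStanceEncoder()
logits = model(torch.randint(0, 5000, (2, 4)), torch.randint(0, 5000, (2, 20)))
print(logits.shape)  # torch.Size([2, 3])
```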
Humor detection is another important CSS task in social media. Annamoradnejad and Zoghi (2020) proposed a novel approach for automatic humor detection in short texts, leveraging BERT embeddings and a neural network architecture. Their method achieves impressive accuracy and an F1-score of 98.2% on a newly created dataset consisting of 200k formal short texts, outperforming baseline models and highlighting the significance of incorporating linguistic structure into machine learning models for humor detection. Similarly, Wu et al. (2021) introduce MUMOR, a novel dataset comprising multimodal dialogues in both English and Chinese extracted from TV sitcoms. With 29,585 annotated utterances across 1,298 dialogues, this corpus facilitates research in humor detection, humor generation, and multi-task learning on emotion and humor analysis, marking the first dataset of its kind to include Chinese conversations for humor detection. On the other hand, Bellamkonda et al. (2022) address the challenge of automatic humor detection in low-resource languages like Telugu by collecting and annotating 2,649 Telugu tweets. They experiment with transformer models including Multilingual BERT, Multilingual DistilBERT (Sanh et al. 2019), and XLM-RoBERTa, showing that XLM-RoBERTa achieves the best performance with an F1-score of 0.82 and 81.5% accuracy. Alongside the works in low-resource languages, Mao and Liu (2019) utilize BERT, a bidirectional transformer encoder, fine-tuned on a mostly Spanish corpus of crowd-annotated tweets (Chiruzzo et al. 2020) from the 2019 HAHA task (Chiruzzo et al. 2019) for humor detection. They achieve competitive results with an F-Score of 0.784 for joke detection and an RMSE of 0.910 for funniness score prediction. They also highlighted the applicability of their approach to multilingual text classification tasks. While textual modality is widely used, with the recent adoption of memes and text-embedded images, audio, and video, multimodal learning has also become popular. Pramanick et al. (2022) delve into multimodal sarcasm and humor detection, proposing MuLOT, a novel multimodal learning system that leverages self-attention for intra-modal correspondence and optimal transport for cross-modal correspondence, achieving significant accuracy improvements over the state-of-the-art on three benchmark datasets: MUStARD (Castro et al. 2019), UR-FUNNY (Hasan et al. 2019), and MST (Cai et al. 2019).
2.5 Social network understanding and analysis
The three main characteristics of social networks are structure, content, and function (Wasserman and Faust 1994). In the context of understanding social networks, CSS focuses on analyzing the structure and dynamics of social networks within online communities. This involves studying the connections between individuals, identifying influential nodes, and examining patterns of information diffusion and interaction (Romero et al. 2013; Lee et al. 2013; Nasim et al. 2016). Using computational methods, CSS researchers can gain insight into how social networks evolve, how information spreads within them, and how communities form and interact (Nasim et al. 2022), contributing to a deeper understanding of social dynamics in digital spaces. Baumann et al. (2020) propose a model that incorporates the dynamics of radicalization to explain the emergence of echo chambers and polarization of opinions on social media. Through empirical validation against polarized debates on Twitter, the model demonstrates how social influence and controversial topics contribute to the transition from global consensus to radicalized states. Similarly, Nagarajan et al. (2020) conducted a social network analysis of SARS-CoV-2 contact tracing data from a large state in India, aiming to assess individual-level variations in disease transmission. Their analysis identified influential patients and patient components that significantly contributed to transmission, suggesting that network metrics could complement existing contact-tracing indicators and improve contact-tracing activities in combating the pandemic. Within the scope of the COVID-19 pandemic, Hung et al. (2020) conducted a study analyzing COVID-19 discussions on Twitter from March 20 to April 19, 2020. They found that sentiments varied from positive to negative across different themes, with prevalent topics including healthcare environment, emotional support, business economy, social change, and psychological stress. Geographic analysis revealed regional differences in sentiment expression, providing insights into the public’s response to the pandemic.
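As a small illustration of the structural analyses described above, the sketch below builds a directed retweet network and ranks users with standard centrality measures; the edge list is toy data invented for the example.

```python
# A toy sketch of influence analysis on a directed interaction network.
# An edge u -> v means user u retweeted user v; the edge list is invented.
import networkx as nx

edges = [("alice", "bob"), ("carol", "bob"), ("erin", "bob"), ("bob", "dave")]
G = nx.DiGraph(edges)

pagerank = nx.pagerank(G)                    # influence via link structure
betweenness = nx.betweenness_centrality(G)   # brokers of information flow

print(max(pagerank, key=pagerank.get))       # node with the highest PageRank
```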
Furthermore, Walter et al. (2023) present a novel approach to studying parliamentary speechmaking by examining Members of the European Parliament (MEPs) debate interactions as a dynamic relational network phenomenon. Through analysis of over 11,000 debate interactions, they reveal patterns of inclusiveness, power dynamics, and transnational policy alliances, demonstrating that male, senior, and influential MEPs from powerful member states tend to receive more attention, with evidence of a self-reinforcing effect over time. Additionally, they find that factors such as seniority and nationality play significant roles in shaping debate coalitions, with seniority having a greater impact than leadership positions and nationality being more influential than political leaning. With an increasing concern over climate change among today’s youth and their significant presence on social media platforms, particularly TikTok, Sun et al. (2024) aim to investigate how climate-related news and disasters are portrayed on the platform. Analyzing fifty TikTok accounts focused on climate content, they utilize social network analysis to evaluate the influence of various entities, revealing that internet influencers have the most substantial impact on disseminating climate change news, while the government plays a significant role in addressing climate disasters. Their findings highlight TikTok as a valuable platform for gauging public sentiment on global warming, underscoring the importance of credible experts in delivering scientifically sound information within the platform’s constraints.
Furthermore, exposure to e-cigarette marketing on social media, particularly Instagram, has been linked to increased e-cigarette use among US adolescents. Vassey et al. (2023) conducted a social network analysis to identify the central influencers and e-cigarette brands on Instagram, revealing a highly interconnected network of influencers collaborating with over 600 e-cigarette brands in 2020. They found that a significant portion of influencers did not restrict youth access to their promotional content, emphasizing the importance of understanding these associations to inform public health communication and potential regulatory policies. Similarly, social network analysis was employed to assess the relationships among greenspace actors in Tehran, aiming to inform a management plan based on social monitoring. Jazayeri et al. (2023) found that while the social network of local stakeholders exhibited weak to moderate cohesion and social capital, there was a moderate level of stability. Additionally, the study emphasized the importance of strengthening information exchange and participation among stakeholders to enhance social capital and achieve successful sustainable management of urban greenspaces in Tehran. Thus, social network understanding and analysis play a crucial role in various domains, including public health, climate change communication, political discourse, and urban management. By leveraging computational methods and social network analysis techniques, researchers can gain valuable insights into the dynamics of social networks, informing policies and interventions to address pressing societal challenges and promote positive outcomes.
2.6 Event related information analysis
In the domain of event-related information analysis, CSS focuses on extracting valuable insights from social media data about significant events, such as natural disasters (Ramakrishnan et al. 2014), elections, civil unrest (Tuke et al. 2020), or cultural phenomena. This involves identifying relevant events, tracking their progression, and analyzing public reactions and discussions surrounding them. By employing advanced computational techniques, researchers can uncover patterns, trends, and sentiments related to these events, contributing to a deeper understanding of their impact on society. Xia et al. (2015) present a novel framework for real-time city event detection and extraction from social media platforms like Instagram and Twitter. Their approach combines bursty detection (Li et al. 2014) to identify candidate event signals, integrates post-streams to classify true events, and utilizes text, image, and geolocation information to retrieve relevant photos for detected events, demonstrating improved event detection accuracy and relevance in experiments on a large dataset. Similarly, Alsaedi and Burnap (2015) conduct an in-depth comparison of temporal and textual features for identifying disruptive events on Twitter. They find that disruptive events can be detected regardless of user influence, and that temporal features are crucial for event detection, while textual features contribute to overall performance enhancement. Furthermore, Nguyen et al. (2013) present a novel approach to studying user psychological behaviors through sentiment analysis of blog posts, with the objective of automatically inferring significant real-world events. They devise a temporal sentiment index function based on valence values of affect-bearing words, which correlates well with major world events, and propose a stochastic burst detection model to identify sentiment bursts occurring within specific moods. Their experiments on a large-scale dataset demonstrate the effectiveness of their approach in capturing both significant global events and finer-grained sentiment fluctuations. Additionally, Benson et al. (2011) propose a novel method for record extraction from social media streams like Twitter, addressing the challenges posed by short, colloquial messages and incomplete relations. Their graphical model simultaneously learns a latent set of records and a record-message alignment, leading to accurate event record extraction and significant error reduction compared to baseline methods. In a parallel direction, tools have been developed for event extraction. Ritter et al. (2012) introduce TwiCal, the first open-domain event extraction and categorization system tailored for Twitter. They demonstrate the feasibility of accurately extracting a calendar of significant events from Twitter and propose a novel approach for discovering important event categories and classifying extracted events, achieving a 14% increase in maximum F1 over a supervised baseline.
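Burst detection of the kind these systems rely on can be illustrated with a simple rolling-statistics heuristic: flag any time window whose post count exceeds the trailing mean by several standard deviations. The sketch below uses invented counts and a fixed threshold; deployed systems such as that of Xia et al. (2015) are far more elaborate.

```python
# A toy burst detector over an hourly tweet-count series (invented data):
# flag hours whose count exceeds the trailing mean by k standard deviations.
import statistics

counts = [12, 15, 11, 14, 13, 80, 95, 17, 12]  # tweets per hour
k, window = 3.0, 5

for t in range(window, len(counts)):
    history = counts[t - window:t]
    mu, sigma = statistics.mean(history), statistics.pstdev(history)
    if sigma > 0 and counts[t] > mu + k * sigma:
        print(f"burst candidate at hour {t}: {counts[t]} tweets")
```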
2.7 Additional CSS tasks in literature
Beyond the tasks highlighted earlier, there are various other important tasks in CSS, such as the detection of hope speech. Chakravarthi (2022) developed a multilingual dataset of 59,354 YouTube comments incorporating English, Tamil, and Malayalam languages, and a custom deep network architecture aimed at recognizing and fostering positivity, particularly hope speech. Their proposed model, incorporating T5-Sentence embeddings (Raffel et al. 2020) and a CNN-based approach, outperforms existing methods, achieving macro F1-scores of 0.75 for English, 0.62 for Tamil, and 0.67 for Malayalam. Similarly, Balouchzahi et al. (2023) introduce a novel hope speech dataset of 8,256 tweets for social media analysis, categorizing tweets into ‘Hope’ and ‘Not Hope’, and further into three fine-grained hope categories. Their study includes detailed annotation guidelines and explores various learning approaches, showing that contextual embedding models achieve higher performance in hope speech detection compared to simple machine learning classifiers. Islam et al. (2020), on the other hand, aim to tackle the pervasive issue of online abuse and bullying by leveraging NLP and machine learning techniques for detection tasks. Their research assesses the effectiveness of Bag-of-Words and TF-IDF features combined with four machine learning algorithms to accurately identify abusive messages on social media platforms. Additionally, in efforts to discern the multiple aspects within social media discourse, Shiwakoti et al. (2024) introduced a multi-aspect dataset of 15,309 tweets related to climate change discourse on Twitter. The aspects incorporated relevance, stance, hate speech, the direction of hate, the target of hate, and humor. Further, they employed data analysis techniques like topic modeling and sentiment analysis on their dataset and performed baseline classification with BERT-based models across all six tasks. Further, Thapa et al. (2024) launched a shared task co-located with EACL 2024 using this dataset to further the research in automated systems for social media climate change discourse analysis.
Apart from these, research has also focused on mental health analysis in social media (Naseem et al. 2023; Sawhney et al. 2023). Li et al. (2020) tackle the impact of the COVID-19 pandemic on mental health by employing NLP techniques to analyze tweets. They introduce the EmoCT dataset, consisting of 1,000 English tweets labeled with emotions, and propose an approach to identify the reasons behind sadness and fear while studying emotion trends at both keyword and topic levels. Similarly, social media platforms like Twitter serve as valuable sources of data for monitoring public health, particularly in detecting personal health mentions (PHMs). Karisani and Agichtein (2018) propose WESPAD, a method combining lexical, syntactic, and context-based features, to effectively identify PHMs in social media posts, overcoming challenges such as inventive spelling and figurative language. Their approach outperforms existing methods, especially in scenarios with limited training data, offering a promising solution for detecting health-related mentions on social media platforms. Apart from these, there are additional tasks within CSS such as the study of figurative language (Pilar Salas-Zárate et al. 2020), persuasion detection (Piskorski et al. 2023), crisis analysis (Nguyen et al. 2017; Alam et al. 2021), automated fact-checking (Guo et al. 2022), and propaganda detection (Yu et al. 2021). Table 1 summarizes the various methods used in CSS, and Table 2 lists the datasets discussed in the literature.
3 Large language models
LLMs mark a significant breakthrough in the field of NLP. They are capable of understanding and generating human-like text with remarkable accuracy and fluency. The advent of LLMs has generated waves in many subdomains of CSS, among many other disciplines. This section explores the capabilities and promises of LLMs and their potential applications and benefits.
3.1 Background
The roots of LLMs can be traced back to the dawn of artificial intelligence research in the 1950s. Early language models were statistical in nature, able to analyze large amounts of text to predict the next word in a sequence. One of the seminal examples was ELIZA (Weizenbaum 1966), developed by Joseph Weizenbaum in 1966. ELIZA mimicked conversation through pattern matching and keyword recognition, creating the illusion of language understanding. These early models, however, lacked the sophistication to handle complex language tasks. The arrival of DL in the 1990s marked a turning point. DL algorithms, inspired by the structure and function of the human brain, allowed for the creation of much more powerful language models. These models were able to learn complex relationships between words and capture intricate patterns within language data. A significant breakthrough in NLP arrived with the Transformer architecture in 2017 (Vaswani et al. 2017). This architecture allowed models to efficiently process entire sequences of words at once, leading to notable improvements in language understanding and generation. Coupled with the ever-increasing availability of massive text datasets, this paved the way for the development of truly large language models.
LLMs are essentially artificial neural networks with billions or even trillions of parameters. These parameters are trained on massive amounts of text data, allowing the model to learn the syntactic and semantic relationships between words and sentences. During training, the model is presented with sequences of text and trained to predict the next word in the sequence. The model predictions are compared with the actual text, and the errors are used to adjust the network parameters. This iterative process continues until the model learns to accurately predict the next word in a sequence based on the preceding context in the sentence (Egami et al. 2024; Min et al. 2023). There are two main training paradigms for LLMs, and a minimal code sketch of the next-word objective is given after the list:
-
Supervised learning: In this approach, the training data is labeled with specific information, such as the sentiment of a sentence or the topic of a document. This allows the model to learn specialized and specific tasks.
-
Self-supervised learning: In this approach, the model learns from unlabeled text data by identifying patterns and relationships within the data itself. This approach is particularly useful for training LLMs on massive datasets where the labeling of data points may not be feasible.
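The next-word objective described above can be made concrete in a few lines. This is a minimal sketch using a small GPT-2 checkpoint via Hugging Face transformers; the input sentence is arbitrary.

```python
# A minimal sketch of the self-supervised next-token objective: with
# labels set to the input ids, the model computes cross-entropy between
# its prediction at each position and the actual next token.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

batch = tokenizer("Social media has become an integral part", return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])
out.loss.backward()  # the errors are used to adjust the network parameters
print(float(out.loss))
```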
3.2 The abilities and promises
LLMs have emerged as powerful tools in natural language processing (NLP), demonstrating state-of-the-art capabilities in understanding and generating human-like text (Wei et al. 2022; Zhao et al. 2024). These models owe their proficiency to training on massive amounts of text data, which enables them to capture intricate linguistic patterns and semantic nuances (Thirunavukarasu et al. 2023; Kasneci et al. 2023). LLMs are typically built on transformer-based architectures, such as OpenAI’s GPT (Generative Pre-trained Transformer) series (Brown et al. 2020), which leverage the self-attention mechanism to process input sequences and generate output sequences with remarkable fluency and coherence while retaining semantic knowledge between word types (Kalyan 2023). One of the key strengths of LLMs is their ability to perform a wide range of NLP tasks. Some of the subdomains of NLP where LLMs have shown remarkable abilities are listed below; a brief zero-shot example follows the list:
-
Text generation: LLMs can generate human-quality text in various styles and formats (Yuan et al. 2022; Kaddour et al. 2023). They can create poems, code, scripts, musical pieces, emails, letters, different writing styles from news articles to blog posts, and even realistic dialogue, making them valuable tools for creative writing, content generation, and copywriting.
-
Language comprehension: LLMs go beyond simple word prediction. They can understand complex nuances of human language such as sarcasm, sentiment, and context. This allows them to perform tasks like question answering, where they can grasp the intent behind a question and provide a relevant answer, or sentiment analysis, where they can predict the emotional tonality of a sequence of text.
-
Machine translation: LLMs enable seamless communication across languages, pushing the boundaries of machine translation. They can translate text between languages with high accuracy, preserving the meaning and style of the original text. This has the potential to break down language barriers and foster global collaboration.
-
Knowledge acquisition: Through their training data, LLMs acquire a vast amount of knowledge about the world. This knowledge encompasses a wide range of topics, from history and science to contemporary events and pop culture. This makes them valuable tools for tasks such as information retrieval, where they can search through vast amounts of text data to find relevant information, and question answering, where they can harness their knowledge base to answer complex questions comprehensively and informatively.
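As a brief illustration of the language-comprehension abilities above, the sketch below performs zero-shot sentiment labeling with an off-the-shelf NLI model through the Hugging Face pipeline API; the model choice and candidate labels are illustrative assumptions.

```python
# A hedged zero-shot classification example: no task-specific training,
# just candidate labels scored by an NLI model.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")
result = classifier(
    "The new transit plan will finally fix our commute!",
    candidate_labels=["positive", "negative", "neutral"],
)
print(result["labels"][0])  # highest-scoring label
```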
In addition to their proficiency in NLP tasks, LLMs demonstrate significant promise in other areas. These models are increasingly adept at analyzing data beyond text, making them valuable tools for various multimodal applications. Multimodal LLMs go beyond text and are trained using various data modalities (a brief image-text example follows the list), including:
-
Images: Multimodal LLMs can process and understand the content of images. They can generate captions describing images, classify objects within these images, and translate visual information into text descriptions.
-
Audio: Multimodal LLMs can analyze and understand audio data. This enables them to perform tasks like speech recognition, sentiment analysis of spoken language, and even music generation.
-
Video: Multimodal LLMs can process and understand the complex information within videos. This enables applications such as video summarization, where LLMs can generate captions or summaries of video content, and video question answering, where they can answer questions based on the information presented in a video.
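To ground the image modality, the sketch below scores an image against candidate text labels with CLIP via Hugging Face transformers; the checkpoint, input file, and labels are illustrative assumptions, and CLIP is a vision-language model used here as a stand-in for the multimodal capabilities discussed above.

```python
# A hedged sketch of image-text matching with CLIP: score a (hypothetical)
# meme image against candidate textual descriptions.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("meme.png")  # placeholder input file
labels = ["a hateful meme", "a benign joke"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```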
By incorporating these different modalities, multimodal LLMs aim to achieve a more comprehensive understanding of the world and interact with it in richer ways. This advancement unlocks exciting prospects across multiple related domains such as computer vision and affective computing.
3.3 Why LLMs in CSS
LLMs offer unparalleled advantages in CSS, addressing several challenges inherent in traditional model training and decision-making processes:
-
Resource and time efficiency: Training domain-specific models with proprietary data demands significant resources and time. LLMs, pre-trained on vast datasets, alleviate this burden by providing a starting point for CSS tasks, reducing the need for extensive data collection and model training from scratch.
-
Data availability and labeling constraints: Obtaining comprehensive datasets with labeled data points for training traditional models can be impractical or infeasible, particularly in CSS domains with subjective, nuanced, or dynamic content. LLMs, trained on diverse datasets, mitigate this issue by harnessing pre-existing knowledge and patterns from extensive text corpora.
-
Opacity of model decision-making: Traditional models often operate as black boxes, making it a challenge to understand the underlying mechanisms driving their decisions. In contrast, LLMs offer transparency and interpretability, enabling researchers to probe model outputs and prompt explanations for their decision-making process. This facilitates a deeper understanding of the factors influencing model predictions and enhances the trustworthiness and reliability of LLMs in CSS applications.
Moreover, the adoption of LLMs introduces novel avenues for research and innovation in CSS. This particularly includes:
-
Semantic understanding and contextualization: LLMs excel at capturing semantic nuances and contextual information within text data, enabling more intricate analyses of social phenomena and discourse dynamics. This enhanced understanding facilitates more accurate sentiment analysis, opinion mining, and content classification in CSS tasks.
-
Multimodal integration: With the evolution of Multimodal LLMs, incorporating diverse modalities such as text, images, and audio, CSS researchers can explore complex interactions between different data types in social media environments. This integration enriches analyses and insights, paving the way for a more comprehensive understanding of social behavior and communication patterns.
-
Ethical and responsible AI: LLMs provide opportunities for addressing ethical considerations in CSS research, including bias mitigation, fairness, and responsible data usage. By examining model outputs and understanding decision-making processes, researchers can identify and mitigate potential biases, ensuring more equitable and ethical analyses of social media data.
Furthermore, reasoning abilities of LLMs such as Chain-of-Thought (CoT) can by themselves effectively address many CSS problems. CoT refers to a sequential reasoning paradigm in which LLMs are guided through logical reasoning step by step. In CoT, the model follows a structured chain of reasoning, building upon previous steps to arrive at a logical conclusion or response. This approach is effective for tasks that involve following a logical sequence of steps, such as solving puzzles or answering questions based on given information. Similarly, Leap-of-Thought (LoT) has also become increasingly popular in CSS tasks. It refers to a non-sequential, creative paradigm involving strong associations and knowledge leaps within the context of problem-solving or generating novel ideas. Unlike CoT, which guides reasoning step-by-step, LoT emphasizes out-of-the-box thinking and making unconventional connections between seemingly unrelated concepts. It is particularly relevant for tasks that require creativity, innovation, and the ability to generate unexpected solutions.
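A CoT prompt for a CSS task might look like the sketch below, which builds a step-by-step stance-detection prompt; `generate` is a hypothetical wrapper around any instruction-tuned LLM, and the wording of the reasoning steps is an illustrative assumption.

```python
# A sketch of Chain-of-Thought prompting for stance detection.
# `generate` stands in for a call to any instruction-tuned LLM.
def build_cot_prompt(tweet: str, target: str) -> str:
    return (
        f"Tweet: {tweet}\n"
        f"Target: {target}\n"
        "Let's reason step by step: first identify what the tweet refers to, "
        "then decide whether its language supports or opposes the target, "
        "and finally answer FAVOR, AGAINST, or NONE.\n"
        "Reasoning:"
    )

prompt = build_cot_prompt("Carbon taxes just punish working families.",
                          "carbon pricing")
# answer = generate(prompt)  # hypothetical LLM call
print(prompt)
```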
Thus, the integration of LLMs in CSS not only addresses existing challenges but also opens up new avenues for research and innovation. Beyond their capabilities in traditional NLP tasks, LLMs enable researchers to dive deeper into semantic understanding, context-aware analysis, and multimodal integration, enriching the breadth and depth of insights gathered from social media data. Furthermore, LLMs democratize access to advanced machine-learning techniques, allowing even those without DL expertise to leverage their power through zero-shot pre-trained models. This democratization fosters interdisciplinary collaborations and empowers researchers across domains to utilize the potential of LLMs in exploring complex social phenomena. As the field of CSS continues to evolve, the adoption of LLMs promises to drive transformative advancements, shaping the future of social science research and societal understanding in the digital age. In the upcoming sections, we discuss various applications of LLMs in CSS. A brief overview of the general application of LLM in CSS is given in Fig. 2. We partially adapt the taxonomy of CSS tasks provided by Ziems et al. (2024).
Fig. 2 Overall popular research areas of LLMs when applied to CSS. Data Annotation and Augmentation focuses on preparing and enriching CSS data using LLMs. Utterance-Level Analysis encompasses individual forms of communication. Discourse and Network-Level Analysis examines larger contexts in CSS such as social media, discussion forums, and other information networks where users are interconnected. Document-Level Analysis analyzes texts in documents, where LLMs have to tackle texts with a very long context. Considerations in LLM Adoption shifts the focus from types of analysis to the practical and ethical aspects of incorporating LLMs into CSS research
3.4 CSS data augmentation with LLM
In traditional machine learning models, data imbalance can cause the models to develop bias towards the majority class, leading to poor generalization and performance on minority classes. This imbalance skews the model’s learning process, potentially resulting in inaccurate predictions, misrepresentation of underrepresented groups, and a lack of fairness in outcomes. Such skewed models may overlook critical insights or patterns present in less prevalent data, undermining the effectiveness of data-driven decisions and analyses across various applications. LLMs can help mitigate the challenges posed by data imbalance through data augmentation. By generating synthetic examples of underrepresented classes or scenarios, LLMs can enrich datasets, ensuring a more balanced representation of different groups or events. This augmentation process not only helps in addressing the issue of data scarcity for minority classes but also aids in enhancing the model’s ability to learn from a more diverse set of examples, leading to improved accuracy and fairness in predictions. LLMs, with their deep understanding of context and language nuances, can create high-quality, realistic text data that closely matches the distribution of real-world datasets. This capability allows for the expansion of datasets in a controlled manner, where researchers can specify certain attributes or characteristics that need more representation. Furthermore, LLM-generated data can be used to simulate rare events or emerging trends not yet present in existing datasets, providing valuable insights for predictive modeling and future scenario analysis.
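In practice, this can be as simple as prompting an LLM to paraphrase minority-class examples, as in the sketch below; `generate` is a hypothetical wrapper around any instruction-tuned model, and the hope-speech example is invented.

```python
# A hedged sketch of LLM-based augmentation for an imbalanced CSS dataset:
# ask an LLM for paraphrases of minority-class examples and add them back.
def augmentation_prompt(text: str, label: str, n: int = 3) -> str:
    return (
        f"Rewrite the following '{label}' post in {n} different ways, "
        "keeping its meaning and tone but varying wording and style:\n"
        f"{text}"
    )

minority = [("I still believe our community will rebuild after the floods.",
             "hope speech")]
for text, label in minority:
    prompt = augmentation_prompt(text, label)
    # variants = generate(prompt)               # hypothetical LLM call
    # dataset.extend(parse_variants(variants))  # rebalance the minority class
```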
Chowdhury and Chadha (2023) investigate the impact of using generative models to augment question-answering (QA) datasets, for enhancing models’ robustness against natural distribution shifts. By experimenting with four datasets closely related to CSS under varying degrees of distribution shift and augmenting them with contexts and QA pairs generated ‘in-the-wild’, their study demonstrates that such data generation approaches significantly improve the domain generalization and robustness of QA models to these shifts. Similarly, Bui et al. (2024) address the challenge of data scarcity in legal text analysis by employing data augmentation techniques and leveraging a state-of-the-art language model to enhance the training dataset for the Competition on Legal Information Extraction and Entailment (COLIEE) (Goebel et al. 2023). Their approach significantly improves the model’s ability to generalize across various legal contexts and demonstrates its effectiveness through competitive performance on COLIEE datasets, showcasing advancements in entailment predictions and information retrieval. In addition to the English language, Zhang et al. (2024) explore the enhancement of Chinese dialogue-level dependency parsing through three innovative data augmentation strategies using LLMs, specifically targeting word-level, syntax-level, and discourse-level augmentations. Their experimental validation on a benchmark dataset demonstrates significant improvements in parsing performance, especially in dependencies among elementary discourse units (EDUs), highlighting the efficacy of LLMs in enriching training data and optimizing dependency parsing accuracy. Furthermore, Whitehouse et al. (2023) highlight the importance of enhancing multilingual common-sense reasoning capabilities in low-resource scenarios by leveraging LLMs for data augmentation across diverse languages. Utilizing models such as Dolly-v2, StableVicuna, ChatGPT, and GPT-4, the study augments multilingual datasets including XCOPA (Ponti et al. 2020), XWinograd (Tikhonov and Ryabinin 2021), and XStoryCloze (Lin et al. 2022), and demonstrates significant improvements in model performance, notably a 13.4-point accuracy increase in the best-case scenario. The research also presents findings from a human evaluation of the naturalness and logical coherence of LLM-generated text, showing strengths in most languages but identifying challenges in generating meaningful content for languages with fewer resources, like Tamil.
4 LLMs in utterance-level analysis
In CSS, the analysis of utterances, or individual units of speech or text, is of significant importance in understanding various aspects of human communication and interaction. The premise of examining utterances at a granular level lies in the recognition that they serve as fundamental building blocks of discourse, conveying nuanced information about sentiment, intent, and social dynamics. In this section, we discuss various analyses like misinformation, sentiment, hate speech, humor, and stance detection through the use of LLMs.
4.1 Misinformation and fake news detection
In misinformation and fake news analysis, understanding at the utterance level plays a crucial role in identifying subtle linguistic cues, patterns, and contextual nuances that may indicate the presence of deceptive or misleading information within social media conversations or online discourse. As shown in Fig. 3, LLMs can help identify those nuances to detect whether given content is potentially misinformation. Various researchers have explored the abilities of LLMs in misinformation and fake news detection. Leite et al. (2023) explore the use of LLMs to extract credibility signals from online content without relying on large annotated datasets. They propose a novel approach that leverages weak supervision to aggregate potentially noisy labels generated by prompting LLMs with a set of 18 credibility signals. Their method outperforms existing classifiers in predicting content veracity on two misinformation datasets, offering insights into the effectiveness of individual credibility signals in detecting misinformation. Similarly, Cao et al. (2024) tackle the challenge of automatically detecting misinformation in scientific reporting, particularly in popular press articles. They introduce a new labeled dataset called SciNews, comprising both human-written and LLM-generated news articles paired with related scientific abstracts. They explore various LLM-based architectures and prompting strategies, and demonstrate the effectiveness of LLMs in detecting human-written misinformation with over 80% accuracy, while highlighting the greater challenge of identifying LLM-generated misinformation. Additionally, they showcase the capacity of LLMs to provide explanations for their classifications. Moreover, Wang et al. (2024) introduce MMIDR, a framework aimed at teaching LLMs to provide fluent and high-quality textual explanations for their assessments of multimodal misinformation. The framework employs data augmentation techniques to convert multimodal misinformation into an instruction-following format, utilizes LLMs for rationale extraction, and employs knowledge distillation to transfer the capability of proprietary LLMs to open-source ones. Experimental results demonstrate that MMIDR achieves sufficient detection performance and can generate compelling rationales, though there is room for improvement in distilling crucial information to student models compared to the teacher model. Additionally, Choi and Ferrara (2024) introduce FACT-GPT, a system utilizing LLMs to automate the claim-matching stage of fact-checking, addressing the challenge of rampant misinformation harming public health and trust. Their evaluation demonstrates that FACT-GPT achieves accuracy comparable to larger models in identifying related claims, providing an efficient automated solution and supporting fact-checkers in combating misinformation. These studies highlight the effectiveness of leveraging LLMs to extract credibility signals, detect misinformation in scientific reporting, provide explanations for classification decisions, and automate claim matching in fact-checking processes. However, further research is needed to enhance the interpretability, robustness, and scalability of LLM-based approaches, paving the way for more effective tools and strategies to combat misinformation and promote digital trust in society.
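In the spirit of the credibility-signal approach of Leite et al. (2023), the sketch below prompts an LLM for a handful of signals and aggregates them into a weak veracity label; the signal list is abbreviated, a simple majority vote stands in for their weak-supervision aggregation, and `generate` is a hypothetical LLM wrapper.

```python
# A hedged sketch of weak labeling from LLM-elicited credibility signals.
SIGNALS = ["evidence is cited", "a source is named",
           "the tone is non-sensational"]

def signal_prompt(article: str, signal: str) -> str:
    return (f"Article: {article}\n"
            f"Does the article satisfy this credibility signal: {signal}? "
            "Answer yes or no.")

def weak_label(article: str) -> str:
    votes = []
    for signal in SIGNALS:
        # answer = generate(signal_prompt(article, signal))  # hypothetical
        answer = "no"  # placeholder so the sketch runs end to end
        votes.append(answer.strip().lower().startswith("yes"))
    # Majority vote stands in for the weak-supervision aggregation step.
    return "credible" if sum(votes) >= 2 else "suspect"

print(weak_label("A miracle cure was announced today..."))  # -> suspect
```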
4.2 Sentiment and opinion mining
LLMs can be useful for sentiment analysis and opinion mining: researchers can harness their advanced natural language processing capabilities to analyze individual utterances and extract nuanced sentiment polarity, opinion polarity, and subjective viewpoints expressed within social media posts, online reviews, or user comments. Deng et al. (2023) address the challenge of analyzing market sentiment in social media content by employing semi-supervised learning with an LLM, focusing on the Reddit platform due to its diverse topics and content. Their approach involves generating weak financial sentiment labels using an LLM and training a small model for production (a minimal sketch of this weak-labelling recipe follows). They achieve competitive performance with existing supervised models by prompting the LLM with Chain-of-Thought summaries and employing regression loss during training. Similarly, within the application of financial sentiment analysis, Zhang et al. (2023) propose a retrieval-augmented framework for financial sentiment analysis to overcome the limitations faced by traditional NLP models and LLMs in this domain. Their approach combines an instruction-tuned LLM module with a retrieval-augmentation module, leveraging external context to improve sentiment prediction accuracy. Through benchmarking against traditional textual models and LLMs such as ChatGPT and LLaMA, their framework demonstrates significant performance gains, achieving up to a 48% improvement in accuracy and F1 score.
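A minimal sketch of this recipe follows, under stated assumptions: the prompt wording and the TF-IDF student model are illustrative, ask_llm stands in for any LLM client, and the Chain-of-Thought summaries and regression loss of the original work are omitted.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def ask_llm(prompt: str) -> str:
    """Placeholder: should return 'positive', 'negative', or 'neutral'."""
    raise NotImplementedError

def weak_label(posts: list[str]) -> list[str]:
    # The LLM provides noisy sentiment labels for unannotated posts.
    template = (
        "Classify the market sentiment of this Reddit post as "
        "'positive', 'negative', or 'neutral'.\n\nPost: {post}\nLabel:"
    )
    return [ask_llm(template.format(post=p)).strip().lower() for p in posts]

def train_student(posts: list[str]):
    # A small supervised model is trained on the weak labels so that
    # production inference does not require LLM calls.
    student = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    student.fit(posts, weak_label(posts))
    return student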
Furthermore, Sun et al. (2023) introduce a multi-LLM negotiation framework for sentiment analysis to address the limitations of relying on a single LLM in making decisions. Their framework involves a reasoning-infused generator providing decisions with rationales and an explanation-deriving discriminator evaluating the credibility of the generator’s output, iterating until a consensus is reached. Experimental results across various sentiment analysis benchmarks like SST-2 (Socher et al. 2013), Movie Review (Zhang et al. 2015), Twitter (Rosenthal et al. 2019), Yelp-Binary (Zhang et al. 2015), IMDB (Maas et al. 2011), and Amazon-Binary (Zhang et al. 2015) demonstrate the effectiveness of the proposed approach, consistently outperforming the single LLM baseline and even surpassing supervised baselines on certain datasets. Similarly, Xing (2024) introduces HAD, a novel design framework for financial sentiment analysis (FSA) leveraging heterogeneous LLM agents. They evaluate the framework on five existing datasets and observe consistent improvements in accuracy and F1 score, particularly with substantial discussions. The framework’s effectiveness is evident in enhancing the performance of base models, such as GPT-3.5, with improvements ranging from +2.24% to +9.46% for accuracy and from +0.35% to +13.72% for F1 score. Notably, HAD significantly bridges the gap between instruction-based learning and fine-tuning, showcasing its potential for advancing LLM-based FSA without the need for extensive labeled data or fine-tuning efforts. Moreover, Negi et al. (2024) propose a hybrid approach for Aspect-Based Sentiment Analysis (ABSA) using transfer learning, which combines the strengths of LLMs and traditional syntactic dependencies to generate weakly-supervised annotations. Their approach addresses the challenge of expensive and domain-specific manual annotation by leveraging syntactic dependency structures to complement LLM annotations, improving the performance of aspect term extraction (ATE) and aspect sentiment classification (ASC) tasks. Experimental results demonstrate that the hybrid annotation method achieves balanced performance across domains, outperforming models trained solely on syntactic dependency annotations or LLM-generated annotations, thus enhancing the overall efficacy of ABSA. These research advancements demonstrate that LLMs offer promising avenues for advancing sentiment and opinion-mining tasks across various domains, including market sentiment analysis, financial sentiment analysis, and aspect-based sentiment analysis.
4.3 Hate speech detection
With the proliferation of hate speech on social media platforms and online forums, the premise of employing LLMs lies in their ability to discern subtle linguistic cues and contextual nuances indicative of hateful or inflammatory language. By applying LLMs to analyze utterances within digital discourse, researchers can identify and categorize instances of hate speech, enabling the development of more effective detection algorithms and moderation strategies to mitigate the spread of harmful content and foster safer online environments. Hong et al. (2024) explore the generation of counterspeech to combat online hate speech using LLMs, focusing on constraining the generation process with potential conversation outcomes. They propose four methods, including Prompt with Instructions, Prompt and Select, LLM finetuning, and LLM transformer reinforcement learning (TRL), to incorporate desired outcomes such as low conversation incivility and non-hateful hater reentry. Evaluation results demonstrate effective strategies for generating outcome-constrained counterspeech, with certain methods showing higher rates of valid responses and success in achieving desired conversation outcomes. Additionally, human evaluation within their work highlights variations in the suitability, relevance, and effectiveness of generated counterspeech across different methods, emphasizing the need for careful consideration of both outcome constraints and linguistic attributes in generating effective counterspeech. Similarly, Nirmal et al. (2024) propose a framework leveraging LLMs to extract interpretable features from social media text, enhancing the interpretability of hate speech classifiers. Through comprehensive evaluation on various social media hate speech datasets, they demonstrate the effectiveness of LLM-extracted rationales and the retained performance of hate speech detection models, ensuring interpretability without sacrificing accuracy. Moreover, Saha et al. (2024) conduct a comprehensive analysis of four LLMs - GPT-2, DialoGPT (Zhang et al. 2020), ChatGPT, and FlanT5 (Chung et al. 2022) - in zero-shot settings for counterspeech generation, a novel approach in the field. They propose three different prompting strategies and evaluate their impact on counterspeech generation performance across various metrics, including generation quality, engagement prediction, quality measurement, and readability. Their findings indicate improvements in generation quality for two datasets, with GPT-2 and FlanT5 models showing superior counterspeech generation capabilities but also higher toxicity compared to DialoGPT, while ChatGPT outperforms the other models in generating counterspeech across all metrics. Furthermore, Guo et al. (2024) investigate the efficacy of LLMs, particularly ChatGPT, in detecting hate speech, proposing four different prompting strategies to optimize LLM performance. They compare ChatGPT’s performance with baseline models like BERT (Devlin et al. 2019) and RoBERTa (Liu et al. 2019), finding ChatGPT to outperform or compete with these baselines across various hate speech datasets. Additionally, they analyze the impact of different prompting strategies on hate speech detection performance, revealing that a Chain-of-Thought Reasoning Prompt (CoT) significantly enhances ChatGPT’s understanding and detection of hate speech compared to other prompting approaches.
Additionally, they explore ChatGPT’s effectiveness in detecting multilingual hate speech, observing challenges in recognizing hate speech in languages other than English, indicating the need for further research to improve LLM performance across diverse linguistic contexts. These studies collectively show the potential of leveraging LLMs in various facets of combating hate speech online, offering valuable insights and strategies to enhance hate speech detection, improve counterspeech generation, and foster safer digital discourse environments.
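The prompting strategies above can be illustrated with a hedged sketch of a Chain-of-Thought style prompt for hate speech detection, in the spirit of (but not copied from) the prompts compared by Guo et al. (2024); call_llm stands in for any chat-completion client.

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM client returning the model's reply."""
    raise NotImplementedError

def cot_hate_speech_prompt(post: str) -> str:
    # Ask the model to reason step by step before committing to a label.
    return (
        "You are a content-moderation assistant.\n"
        f'Post: "{post}"\n\n'
        "Reason step by step:\n"
        "1. Does the post target a person or group based on a protected "
        "attribute (e.g., race, religion, gender)?\n"
        "2. Does it attack, demean, or incite harm against that target?\n"
        "Then answer on a final line with exactly 'HATE' or 'NOT HATE'."
    )

def detect_hate(post: str) -> bool:
    reply = call_llm(cot_hate_speech_prompt(post))
    return reply.strip().splitlines()[-1].strip().upper() == "HATE"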
4.4 Humor and stance detection
Humor and stance detection are critical tasks in NLP, enabling the analysis of text both for entertainment and for understanding the author’s viewpoint. LLMs can aid by finding the cues that contribute to humor or stance in social media content. Inácio and Oliveira (2024) investigate the integration of humor-related numerical features with LLM representations to enhance baseline models for humor recognition in Portuguese. They explore three multimodal transformer methods and find that, for BERTimbau-large, the inclusion of humor-related features led to a notable 15.5 percentage-point increase in F1 score. However, this improvement was not consistent across all models tested, suggesting the need for further refinement in feature selection, combination methods, or hyperparameter tuning.
As shown in Fig. 4, LLMs have also been used in stance detection tasks. Šuppa et al. (2024) participate in the CASE 2024 Shared Task on Climate Activism Stance and Hate Event Detection (Thapa et al. 2024), utilizing LLMs, particularly GPT-4, with retrieval augmentation and re-ranking for tweet classification. Their models achieved significant performance improvements over baselines, suggesting the effectiveness of LLMs in zero- or few-shot settings for hate speech detection and stance classification. Similarly, Aiyappa et al. (2024) investigate the performance of FlanT5-XXL, an instruction-tuned LLM, for zero-shot stance detection on tweets using the SemEval 2016 Tasks 6A (Mohammad et al. 2016), 6B (Mohammad et al. 2016), and P-Stance (Li et al. 2021) datasets. They explore various prompts, decoding strategies, and instructions to understand the model’s sensitivity and potential biases, finding that FlanT5-XXL (Chung et al. 2022; Wei et al. 2021) can match or outperform state-of-the-art benchmarks without fine-tuning. Their analysis reveals significant differences in performance based on the type of prompt and decoding strategy used, with certain prompts showing better overall performance. Additionally, they observe minimal impact of instruction wording on performance but find that pre-processing techniques such as expanding abbreviations and splitting hashtags can enhance model performance. In a similar direction, Cruickshank and Ng (2023) also explore the use of LLMs for stance detection, aiming to reduce or eliminate the need for manual annotations. They investigate 10 LLMs and 7 prompting schemes, finding that while LLMs are competitive with supervised models, their performance varies across datasets and prompting schemes. Additionally, they observe that larger model sizes do not necessarily lead to better performance, and certain prompting methods like Few-Shot Prompting (FSP) and Zero-Shot CoT tend to perform better overall. However, they note challenges such as incomplete or lengthy responses from LLMs, prompting the need for further investigation into optimizing LLMs for stance detection. Furthermore, Lan et al. (2023) introduce COLA, a three-stage framework for stance detection that leverages LLMs to collaboratively analyze text. By assigning distinct roles to LLMs in each stage, COLA addresses challenges such as multi-aspect knowledge requirements and the advanced reasoning needed for stance inference. Their experiments across multiple datasets demonstrate that COLA achieves state-of-the-art performance in stance detection, and its versatility extends to other text classification tasks like aspect-based sentiment analysis and persuasion prediction, showcasing its usability and effectiveness.
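The zero-shot setup studied by Aiyappa et al. (2024) can be approximated in a few lines with the Hugging Face transformers library; flan-t5-base is substituted below purely to keep the sketch lightweight (they evaluate FlanT5-XXL), and the prompt wording is illustrative rather than theirs.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "google/flan-t5-base"  # small stand-in for FlanT5-XXL
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

def detect_stance(tweet: str, target: str) -> str:
    prompt = (
        f"Tweet: {tweet}\n"
        f"What is the stance of the tweet toward '{target}'? "
        "Answer with one word: favor, against, or none."
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=5)  # greedy decoding
    return tokenizer.decode(output[0], skip_special_tokens=True).strip()

print(detect_stance("Wind farms ruin the landscape.", "renewable energy"))

As the studies above note, the choice of prompt and decoding strategy can shift results considerably, so any such template should be validated against labeled data before use.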
4.5 Other utterance-level analysis
Beyond the tasks discussed in the preceding subsections, LLMs can be applied to other utterance-level CSS tasks as well. Yakura (2023) investigates the capability of recent LLMs to understand metaphor and sarcasm, particularly in the context of differentiating Asperger syndrome from other behavioral symptoms. While LLMs show improvement in metaphor comprehension with increased model parameters, they struggle significantly with understanding sarcasm, akin to individuals with Asperger syndrome. The study suggests alternative approaches to enhance LLMs’ comprehension of sarcasm, such as supplementing training data to include emotional intelligence aspects and exploring strategies to balance bias suppression with the inference of human subjective judgments. These insights from developmental psychology could inform future advancements in LLMs’ understanding of nuanced human communication. Similarly, Kim et al. (2024) introduce KoCoSa, a new dataset for Korean context-aware sarcasm detection, consisting of 12.8K daily Korean dialogues. They propose an efficient dataset generation pipeline involving LLMs, automatic filtering, and human annotations. Experimental results demonstrate that their baseline system outperforms strong baselines like GPT-3.5 on the Korean sarcasm detection task, emphasizing the importance of context in sarcasm detection. Additionally, they observe consistent detection performance across different dialogue topics, suggesting the dataset’s utility for advancing research in Korean sarcasm detection and potentially for exploring sarcasm detection in low-resource languages.
Furthermore, Cageggi et al. (2023) compare three state-of-the-art LLM-based approaches for multilabel emotion classification, including a fine-tuned multilingual T5 and two few-shot prompting approaches: plain FLAN and ChatGPT. Their experimental analysis reveals that FLAN T5 performs the worst, while their fine-tuned MT5 performs best on the development dataset and outperforms ChatGPT-3.5 on the test set of the EVALITA 2023 shared task (Lai et al. 2023). They demonstrate that MT5 and ChatGPT-3.5 exhibit complementary performance on different emotions and introduce A2C-best, a system that combines their best-performing models for each emotion and achieves a macro F1 score 0.02 higher than the competition winner’s on the out-of-domain benchmark. Similarly, Venkatakrishnan et al. (2023) investigate the efficacy of pre-trained transformer-based LMs, specifically GPT-3.5 and RoBERTa, for emotion detection in NLP, focusing on responses to significant events like the murder of Zhina Amini in Iran and an earthquake in Turkey and Syria. They find that RoBERTa, with its fine-tuning abilities and pre-training for emotions, outperforms GPT-3.5 in fine-grained emotion classification, detecting a common subset of emotions more accurately. While RoBERTa identifies 14 emotions, GPT-3.5 identifies 64, with RoBERTa demonstrating better accuracy in detecting Negative, Anger, and Sadness emotions. Additionally, they explore the effect of explicit and implicit translation on emotion detection, noting challenges with GPT-3.5’s consistency in adhering to English output and the need for translation to improve results. These works show that there can be various applications of LLMs in CSS.
5 LLMs in discourse and network-level analysis
LLMs have emerged as powerful tools for analyzing discourse at various levels, from individual utterances to broader social networks. In this section, we explore the applications of LLMs in discourse and network-level analysis within CSS, highlighting their potential to uncover insights into communication dynamics, community formation, and information propagation in online environments.
5.1 Social network analysis
Social network analysis is crucial for understanding the structure, dynamics, and interactions within complex social systems, providing insights into various phenomena such as information flow, influence propagation, and community formation. LLMs can improve our understanding of social networks by leveraging the vast knowledge they encode, helping to model behavior, evolution, and trends in social networks. Gao et al. (2023) present S³ (Social network Simulation System), which leverages LLMs to construct a social network simulation system capable of emulating human-like behavior. The system’s applications span prediction, reasoning, explanation, pattern discovery, theory construction, and policy-making support within social science. They identify areas for improvement in both individual-level and population-level simulations, emphasizing the integration of agent-based and system dynamics-based methods, consideration of a broader range of social phenomena, and enhancements to system architecture for efficiency and policymaker interaction. In general, their work represents a significant advancement in social network simulation, with potential implications across various domains beyond social science. Similarly, Jiang and Ferrara (2023) propose Social-LLM, a novel approach integrating LLMs with social network interactions for user detection tasks. They conduct a comprehensive evaluation on seven real-world social network datasets, comparing their method with state-of-the-art baseline models. The results indicate that Social-LLM outperforms baseline methods in most cases, demonstrating its robustness and effectiveness in modeling social network data for various user detection tasks. They also highlight the importance of considering edge types and directions, as well as including tweet content embeddings, for improved performance, and demonstrate the utility of Social-LLM embeddings for visualizing complex social networks. Similarly, Zhang et al. (2024) propose an Influencer Dynamics Simulator (IDS) that uses LLMs to aid in the selection of influencers for product marketing. The IDS includes a pre-selection module, LLM-based simulation, and a ranking metric to identify influencers likely to drive product purchases based on influencee feedback. Through ablation studies, they demonstrate the critical role of each component in enhancing the framework’s effectiveness. Additionally, they visualize the development of influencer interests and evaluate performance across different LLMs, emphasizing the framework’s broad applicability. A case study illustrates the IDS’s ability to predict influencee responses and distinguish between top influencers and others, highlighting its effectiveness in simulating and predicting influencer dynamics. Additionally, Li et al. (2023) conduct an exploratory study on the behavioral characteristics of LLM-driven social bots within the Chirper platform, analyzing their activity logs from April 2023 to June 2023. They identify enhanced individual-level camouflage and toxic behaviors exhibited by these bots, while also releasing the Masquerade-23 dataset to address the data void in this subfield. However, limitations include the lack of information on the establishment of social relationships and the absence of detailed prompt instructions for the LLMs behind social bots, which hinders a comprehensive understanding of their behavior.
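As a crude stand-in for the intuition behind Social-LLM discussed above, the sketch below represents each user by concatenating an LLM-derived text embedding with simple network features and trains a standard classifier; the encoder, the two-degree feature set, and the classifier are illustrative assumptions, whereas the actual Social-LLM model learns this fusion end-to-end.

import networkx as nx
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any text encoder works

def user_features(graph: nx.DiGraph, user_texts: dict[str, str]):
    users = list(user_texts)
    text_emb = encoder.encode([user_texts[u] for u in users])
    # Toy network features: in- and out-degree per user.
    net = np.array([[graph.in_degree(u), graph.out_degree(u)] for u in users])
    return users, np.hstack([text_emb, net])

# users, X = user_features(graph, user_texts)
# clf = LogisticRegression(max_iter=1000).fit(X, labels)  # e.g., bot vs human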
5.2 Discourse analysis
Discourse analysis is crucial for understanding the underlying structures and patterns of communication in various contexts, allowing insights into social interactions, power dynamics, and decision-making processes. LLMs can help in discourse analysis by extracting nuanced linguistic features from textual data, thus facilitating a deeper understanding of human communication patterns and behaviors. Barić et al. (2024) investigate the challenging task of identifying political actors in discourse networks, which is crucial for analyzing societal debates, and compare traditional NLP pipelines with LLMs. Contrary to expectations, their evaluation on a German newspaper corpus reveals that the LLMs perform worse than the traditional pipelines due to difficulties in generating correct canonical forms, despite their proficiency in identifying the right reference. A hybrid model combining LLMs with a classifier for normalization outperforms both initial models, highlighting the difficulty LLMs have in controlling their generated output. This study underscores the complexities involved in actor identification tasks and suggests avenues for improving LLMs’ performance in discourse analysis. Similarly, Fan and Jiang (2023) investigate ChatGPT’s performance in discourse analysis tasks, specifically topic segmentation and discourse parsing, across various dialogue datasets. They find that ChatGPT demonstrates proficiency in identifying topic structures in general-domain conversations but struggles in specific-domain conversations, particularly in recognizing hierarchical rhetorical structures. Additionally, they explore the impact of in-context learning and conduct an ablation study on prompt components, shedding light on avenues for improving ChatGPT’s performance in discourse understanding tasks. Despite ChatGPT’s capabilities, challenges remain in understanding complex discourse structures and adhering to specified output formats, highlighting areas for future research and development.
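The hybrid pattern that Barić et al. (2024) find most effective can be sketched as follows: the LLM locates the actor reference, and normalization to a canonical form happens outside the LLM. Fuzzy string matching is used here as a stand-in for their trained normalization classifier, and the actor list and prompt are assumptions.

import difflib

CANONICAL_ACTORS = ["Angela Merkel", "Olaf Scholz", "Annalena Baerbock"]

def call_llm(prompt: str) -> str:
    """Placeholder: should return the actor mention as a text span."""
    raise NotImplementedError

def identify_actor(sentence: str):
    mention = call_llm(
        f"Sentence: {sentence}\n"
        "Copy the exact span naming the political actor who is speaking, "
        "or reply 'none'."
    ).strip()
    if mention.lower() == "none":
        return None
    # Normalize outside the LLM: snap the mention to a canonical name.
    match = difflib.get_close_matches(mention, CANONICAL_ACTORS, n=1, cutoff=0.4)
    return match[0] if match else None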
6 LLMs in document-level analysis
LLMs are not only transforming how we interact with language; they are also revolutionizing the way we analyze vast amounts of textual data. By leveraging their ability to understand and process complex information, LLMs have proven to be powerful tools for document-level analysis. This section explores how LLMs improve various tasks within this domain, from extracting specific events and information to identifying underlying themes and trends.
6.1 Event and information extraction
LLMs can significantly enhance event and information extraction by accurately identifying and contextualizing specific events, entities, and relationships from vast amounts of unstructured text data, streamlining the process of converting text into structured, actionable insights. Gao et al. (2023) explore the feasibility of using ChatGPT for event extraction without any task-specific fine-tuning or data. Through experiments on the ACE 2005 dataset, they find that while ChatGPT can achieve reasonable performance in simple scenarios, it lags significantly behind task-specific models like EEQA in complex and long-tail cases involving multiple events or infrequent event types. They also find that ChatGPT’s performance is highly sensitive to the style of the prompt provided, indicating potential usability challenges. On the other hand, Chen et al. (2024) propose a novel approach to leverage LLMs for event extraction. They evaluate the direct use of LLMs for event detection and event argument extraction, finding a notable performance gap compared to fine-tuned models. To bridge this gap, they employ LLMs as expert annotators to generate labeled data aligned with existing datasets, and then fine-tune specialized event extraction models using this augmented data. Experimental results demonstrate that this approach can improve the performance of fine-tuned models by mitigating data scarcity and imbalance. Similarly, Huang et al. (2023) propose a novel three-stage framework, ThreeEERE, which combines an enhanced automatic chain-of-thought prompting (Auto-CoT) technique with LLMs to improve precision in tasks such as event extraction (EE), event temporal relation extraction (ETRE), and event-causal relation extraction (ECRE). This framework involves constructing category-specific examples, federating local knowledge for extracting event relationships, and selecting the optimal answers to maximize event and relation extraction accuracy. Their experimental results on various extraction tasks indicate that this approach not only competes with, but in some cases surpasses, the performance of supervised models, marking a significant advancement in the field. Additionally, Bakker et al. (2024) propose a novel pipeline designed to automatically extract processing timelines from decision letters of Dutch FOIA (Freedom of Information Act) requests, aiming to address delays by identifying bottlenecks in the information request process. Their method demonstrates high accuracy in extracting dates (0.94 accuracy), event phrases (mean ROUGE-L F1 score of 0.80), and classifying events (macro F1 score of 0.79) from the decision letters. Finally, Zhong et al. (2023) explore the Leap-of-Thought (LoT) capabilities of LLMs by investigating their performance in the Oogiri game, a context requiring creative and associative thinking. To facilitate this study, they introduce a multimodal and multilingual Oogiri-GO dataset comprising over 130,000 samples, incorporating both images and text prompts. Their research aims to enhance LLMs’ LoT abilities through a Creative Leap-of-Thought (CLoT) paradigm, which involves formulating LoT-oriented instruction tuning data and implementing an explorative self-refinement process to encourage the generation of more creative responses.
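An illustrative zero-shot event-extraction prompt requesting structured JSON, in the spirit of the ChatGPT probing by Gao et al. (2023), is sketched below; the schema is a toy one rather than the ACE 2005 ontology, and call_llm is a placeholder client. The fallback branch reflects their observation that output quality is highly prompt-sensitive.

import json

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion client."""
    raise NotImplementedError

def extract_events(text: str) -> list[dict]:
    prompt = (
        "Extract events from the text. Respond with a JSON list; each "
        "item must have keys 'trigger', 'event_type', and 'arguments' "
        "(a list of objects with 'role' and 'text').\n\n"
        f"Text: {text}\nJSON:"
    )
    try:
        return json.loads(call_llm(prompt))
    except json.JSONDecodeError:
        return []  # malformed output happens; retry or repair in practice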
6.2 Topic modeling
LLMs can improve topic modeling by leveraging their advanced NLP capabilities to identify and categorize complex themes within large data sets with greater accuracy and efficiency. Mu et al. (2024) investigate the potential of using LLMs as an alternative to traditional topic modeling techniques. Through a series of experiments with different prompting strategies and manual constraints, they demonstrate that LLMs can effectively generate relevant and interpretable topics from text corpora. They propose evaluation metrics to assess the quality of LLM-generated topics and showcase the capability of LLMs in analyzing dynamic datasets, such as tracking the temporal evolution of topics related to COVID-19 vaccine hesitancy. The authors argue that LLMs offer a viable and adaptable method for topic extraction and summarization, providing a fresh perspective compared to classic topic modeling approaches. In a different direction, rather than directly mining topics from short texts, Chang et al. (2024) propose a novel ‘Topic Refinement’ approach that leverages LLMs to improve the semantic coherence of initially extracted topics. Their mechanism iteratively constructs prompts requesting LLMs to identify and replace semantically incoherent words within each topic. Extensive experiments across multiple datasets and base models demonstrate the effectiveness of this LLM-based refinement approach in enhancing topic coherence metrics. Thus, the capability of LLMs to process and analyze large volumes of textual data through enhanced topic modeling techniques introduces a new dimension for understanding the granularity and depth of social science research.
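The refinement loop can be reduced to the following single-pass sketch, in which an LLM is asked to swap out intruder words in a mined topic; the prompt wording is an assumption, and Chang et al. (2024) iterate this process rather than applying it once.

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion client."""
    raise NotImplementedError

def refine_topic(top_words: list[str]) -> list[str]:
    prompt = (
        f"Topic words: {', '.join(top_words)}\n"
        "Identify any word that is semantically incoherent with the rest "
        "and replace it with a more coherent word. Return only the "
        "revised comma-separated list."
    )
    return [w.strip() for w in call_llm(prompt).split(",")]

# refine_topic(["vaccine", "dose", "immunity", "banana", "booster"])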
7 LLMs in social media data/content generation
Social media platforms serve as rich sources of data and content, reflecting diverse human interactions, opinions, and trends. Meanwhile, machine-generated content, such as that produced by LLMs, has transformed content creation, offering sophisticated tools for text generation, dialogue systems, and multimedia creation. In this section, we explore the capabilities and implications of LLMs in social media data/content generation.
7.1 Textual data/content generation
LLMs have significantly advanced automated textual data/content generation on social media platforms. Using vast amounts of training data and sophisticated language modeling techniques, LLMs can generate text that closely mimics human writing style and behavior. Key applications include the following:
- Post generation: LLMs can generate social media posts on various topics, ranging from personal updates to news commentary. These generated posts could be indistinguishable from those written by users, allowing for automated content creation at scale. An example of post generation in social media (Facebook) can be seen in Fig. 5.
- Comment generation: LLMs can provide comments and replies for existing posts, facilitating engagement and interaction within social media communities. Whether providing feedback, expressing opinions, or engaging in discussions, LLM-generated comments can contribute to the dynamics of online conversations.
- Question answering: LLMs excel at providing informative and contextually relevant responses to user questions posted on social media platforms. By understanding the nuances of natural language queries, LLMs can retrieve and summarize information from diverse sources, enhancing the utility of social media as a knowledge-sharing platform.
- Content curation: LLMs can curate personalized content feeds for social media users based on their preferences and interests. By analyzing user interactions and content consumption patterns, LLMs can recommend articles, videos, and other media tailored to users’ tastes, fostering engagement and retention.
- Hashtag and caption generation: LLMs can generate hashtags and captions for social media posts, enhancing discoverability and accessibility. By analyzing the content and context of posts, LLMs can suggest relevant hashtags and captions that increase the visibility and impact of the content (a minimal sketch follows this list). An example of hashtag generation for a LinkedIn post can be seen in Fig. 5.
- Brand messaging and advertising: LLMs can assist in brand messaging and advertising for social media campaigns. By understanding the demographics of the target audience and market trends, LLMs can generate compelling ad copy that resonates with consumers and drives engagement and conversions.
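As referenced in the hashtag item above, the following minimal sketch suggests hashtags for a draft post with an instruction-tuned LLM; the OpenAI client and model name are assumptions standing in for any comparable provider.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def suggest_hashtags(post: str, n: int = 5) -> list[str]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any instruction-tuned LLM works
        messages=[{
            "role": "user",
            "content": (
                f"Suggest {n} relevant hashtags for this social media post. "
                f"Reply with hashtags only, space-separated.\n\n{post}"
            ),
        }],
    )
    return response.choices[0].message.content.split()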
7.2 Image-based and multimodal data/content generation
In addition to textual content, LLMs, particularly those with vision capabilities (LVLMs) and the ability to process other modalities, have also advanced multimodal content generation on social media platforms. By integrating textual, visual, and auditory modalities, LLMs can produce rich and engaging multimedia content, enhancing user experiences and interactions. Key applications of multimodal data/content generation by LLMs include:
- Image captioning: LLMs can generate descriptive captions for images uploaded to social media platforms. By analyzing the visual content of images and their contextual relevance, LLMs can produce captions that provide additional context and meaning, improving accessibility and engagement (see the sketch after this list). An example of an LLM (ChatGPT-DALLE) providing a caption for an image can be found in Fig. 6.
- Image generation: Various LVLMs have the ability to generate images for social media. For example, ChatGPT can generate images using the DALL-E generation model. Examples of such generated images for social media can be seen in Fig. 6.
- Video summarization: LLMs excel in summarizing videos shared on social media platforms. By identifying key moments, themes, and highlights within videos, LLMs can generate concise summaries that facilitate content consumption and sharing among users.
- Emoji and sticker generation: LLMs can generate emoji sequences and stickers to express emotions and sentiments in social media conversations. By analyzing textual inputs and user interactions, LLMs can suggest emoji and stickers that complement the tone and context of discussions, enhancing expressiveness and engagement.
- Meme generation: LLMs and LVLMs can generate memes that can enhance social media engagement. For example, Wang and Lee (2024) propose MemeCraft, an innovative meme generator that combines LLMs and visual language models (VLMs) to create memes supporting specific social movements, such as Climate Action and Gender Equality. It features an end-to-end pipeline for generating multimodal memes from user prompts and includes a safety mechanism to prevent the creation of divisive content. Evaluation of MemeCraft demonstrates its effectiveness in producing humorous yet advocacy-supportive memes, showcasing the potential of generative AI in promoting social good.
- Audio transcription and description: LLMs can transcribe audio content shared on social media platforms and provide descriptive summaries or captions (Wu et al. 2023; Lyu et al. 2023). By leveraging speech recognition and NLU capabilities, LLMs can make audio content accessible to users with hearing impairments and facilitate content consumption in new contexts (Ma et al. 2024).
- Multimodal fusion and synthesis: LLMs can integrate information from multiple modalities to generate cohesive and coherent multimodal content (Rotstein et al. 2024). By combining textual, visual, and auditory cues, LLMs can create immersive experiences that engage users across different sensory channels, fostering deeper levels of interaction and engagement.
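As referenced in the image-captioning item above, the sketch below captions an image with an off-the-shelf open vision-language model; BLIP is chosen purely for illustration and is not the captioning system of any work surveyed here.

from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def caption_image(path: str) -> str:
    # The pipeline returns a list like [{"generated_text": "..."}].
    return captioner(path)[0]["generated_text"]

# print(caption_image("vacation_photo.jpg"))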
The generation of multimodal data/content by LLMs introduces unique challenges and considerations, including ensuring alignment and coherence across different modalities, addressing potential biases and stereotypes in visual and auditory content, and maintaining accessibility for users with disabilities. It is essential to develop robust evaluation metrics and guidelines to assess the quality and effectiveness of multimodal content generated by LLMs and to promote responsible and inclusive practices in their deployment.
8 Considerations in LLM adoption
Adopting LLMs in domains such as CSS requires careful consideration of many important factors. This section examines the ethical, technical, legal, social, cultural, and operational considerations that accompany the adoption of LLMs. Figure 7 summarizes the considerations that need to be made before applying LLMs in CSS.
8.1 Ethical considerations
Various ethical considerations stem from the intrinsic nature of LLMs, the data they are trained on, and the potential impacts of their application.
- Bias and fairness: LLMs can inherit and amplify biases present in their training data. This can manifest in various forms, including gender, racial, and socio-economic biases. In the domain of CSS, this could skew research findings, reinforce stereotypes, and potentially lead to discriminatory outcomes. For example, if an LLM is used to analyze social media posts or historical texts, it might perpetuate the biases contained in those sources, leading to biased interpretations of social phenomena.
- Representation and inclusion: The data used to train LLMs might not fairly represent all groups within society, leading to issues with inclusion. This can result in models that perform well for majority groups but poorly for underrepresented groups, further marginalizing these populations. In CSS, this could impact the accuracy and generalizability of research findings, potentially overlooking critical characteristics of marginalized communities.
- Transparency and accountability: The "black box" nature of LLMs makes it challenging to understand their decision-making process. This lack of transparency can be problematic in scientific settings, where understanding the causality and reasoning behind findings is crucial. Additionally, it raises questions about accountability, especially in an era when LLM-generated insights may inform policy-making or other significant decisions.
- Political and economic leaning: There is a risk that LLMs might exhibit political and economic leanings towards one ideology or another, whether unintentionally based on their training data or intentionally by those who develop and deploy them. Specifically for CSS problems, this could skew research in ways that favor certain viewpoints, potentially influencing public opinion and policy decisions.
- Ethical use and misuse: The application of LLMs in CSS research opens up ethical dilemmas related to consent, privacy, and the potential misuse of the technology. For instance, using LLMs to analyze user data on social media raises questions about the consent of the individuals whose data is being analyzed. Moreover, there is the risk that the insights generated by LLMs could be used for unethical purposes, such as manipulating public opinion or infringing on privacy.
- Dependence and de-skilling: Relying on LLMs for computational analysis in social science might lead to a dependence on these tools, potentially de-skilling researchers over time. This reliance could also make the field more vulnerable to errors or biases embedded in the models, impacting the quality and integrity of research findings.
- Societal impact and long-term consequences: Since LLMs are a recent development, the broader societal impacts of applying LLMs in computational social science, including long-term consequences, are still not fully understood. These technologies could reshape social research methodologies and alter public discourse in unforeseen ways.
With these ethical considerations in mind, the advanced capabilities of LLMs can be leveraged in CSS. Ensuring fairness, transparency, and accountability while mitigating bias and safeguarding against misuse is essential to harnessing the benefits of LLMs without compromising ethical standards or social well-being.
8.2 Technical considerations
The adoption of LLMs in CSS requires the recognition and implementation of various technical considerations.
- Data quality and availability: The effectiveness of LLMs in CSS research is highly dependent on the quality and availability of data. High-quality, diverse, and representative datasets are essential for training models that can understand and analyze complex social phenomena accurately. Researchers need to consider the sources and biases of their data, as well as privacy and ethical issues related to data collection and usage.
- Model selection and customization: There is a wide array of LLMs available, each with its own strengths, weaknesses, and specialties. Choosing the right model for a specific research question involves considering factors such as the model’s size, training data, and the type of tasks it excels at (e.g., text generation, sentiment analysis, etc.). Although most LLMs can perform multiple tasks, some are trained to directly excel at a few specialized tasks. In some cases, customizing or fine-tuning a model with domain-specific data can significantly improve its performance and relevance to particular research problems.
- Interpretability and explainability: Understanding how LLMs make their decisions is crucial for validating their use in research. Techniques for improving model interpretability and explainability can help researchers assess the reliability of the insights generated by LLMs and identify potential biases or errors. This is particularly important in social science research, where findings often inform policies and interventions.
- Generalizability and reproducibility: Ensuring that findings from LLM-based research are generalizable and reproducible is a key technical challenge. Researchers must carefully document their methodologies, including model versions, training data, parameters, and experimental settings, to enable other researchers to validate and build upon their work (a lightweight logging sketch follows this list).
- Monitoring and evaluation: Continuous monitoring and evaluation of LLM applications are necessary to assess their performance and refine methodologies over time. This includes developing metrics and benchmarks relevant to the specific goals of the research and adapting strategies based on ongoing findings and future technological advancements.
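One lightweight way to act on the reproducibility point above is to record every run’s model version, prompt identifier, and decoding parameters alongside the outputs; the JSONL format and field names below are illustrative assumptions, not a standard.

import json
import time

def log_run(path: str, **metadata) -> None:
    metadata["timestamp"] = time.strftime("%Y-%m-%dT%H:%M:%S")
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(metadata) + "\n")  # one JSON record per run

log_run(
    "runs.jsonl",
    model="gpt-4o-2024-08-06",  # pin the exact model snapshot used
    temperature=0.0,            # deterministic decoding where supported
    prompt_id="stance_v3",      # version prompts like code
    dataset="semeval2016_task6a",
    seed=42,
)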
8.3 Legal and regulatory compliance
The application of LLMs in CSS raises significant legal and regulatory compliance issues. These concerns are driven by the expanding capabilities of LLMs, their impact on privacy, intellectual property rights, and the potential for misuse. Ensuring legal and regulatory compliance involves understanding various local and international laws, which can vary widely by jurisdiction. Several key legal and regulatory considerations need to be addressed.
- Data protection and privacy laws: Many countries have enacted data protection and privacy laws that impact the use of LLMs in social science research. Regulations such as the General Data Protection Regulation (GDPR) in the European Union and the California Consumer Privacy Act (CCPA) in the United States set strict guidelines for the collection, storage, and processing of personal data. Researchers must ensure that their use of LLMs complies with these laws, which may include obtaining consent from individuals whose data is being analyzed, ensuring data anonymization, and implementing robust data security measures.
- Intellectual property (IP) rights: The training of LLMs often involves the use of large datasets that may contain copyrighted material, such as books, articles, and online content. Researchers must navigate intellectual property laws to ensure they do not infringe on the copyright of the data used in training models. This may involve securing permissions, using data under fair use provisions, or relying on publicly available or specially licensed datasets.
- Algorithmic accountability and transparency: Some jurisdictions have started to introduce regulations that address algorithmic accountability and transparency. These laws aim to mitigate the risks associated with automated decision-making systems, which include LLMs. Compliance might involve explaining the algorithms’ decision-making processes, demonstrating fairness and non-discrimination, and providing individuals with explanations of algorithmic decisions that affect them.
- Research ethics approval: Many CSS projects require approval from institutional review boards (IRBs) or ethics committees, especially when they involve human subjects. These bodies assess the ethical implications of research projects, focusing on privacy, consent, and potential harm to participants. While not a strict legal requirement, obtaining ethics approval is a critical step in ensuring that the use of LLMs adheres to acceptable ethical standards.
- Accessibility and non-discrimination: Legal frameworks such as the Americans with Disabilities Act (ADA) in the United States and similar laws in other countries require that digital services, including those provided by LLMs, be accessible to individuals with disabilities. Furthermore, ensuring that LLM applications do not result in discriminatory outcomes is not only a matter of ethical responsibility but also of legal compliance, as laws in many jurisdictions prohibit discrimination based on race, gender, age, and other protected characteristics.
- Record-keeping and reporting requirements: Depending on the jurisdiction and the specific application of LLMs, researchers might be subject to record-keeping and reporting requirements. This could include documenting data sources, processing activities, model training procedures, and measures taken to protect privacy and ensure compliance with relevant laws.
Addressing these regulatory considerations requires a multidisciplinary approach combining legal expertise, awareness of the latest regulatory developments, and a commitment to ethical research practices. By addressing these issues proactively, researchers can harness the vast capabilities offered by LLMs in CSS applications while ensuring compliance with legal and regulatory requirements.
8.4 Social and cultural implications
The implications of applying LLMs in CSS extend beyond technical and ethical considerations, profoundly influencing social and cultural dynamics. These implications encompass the ways LLMs influence the alignment of technology with human values and ethics. Several pivotal aspects must therefore be considered.
- Shaping public discourse and opinion: LLMs, especially when used in analyzing social media data or generating content, can significantly influence public discourse and opinion. The way these models interpret and amplify certain narratives over others can shape societal norms and values, potentially leading to shifts in public opinion, cultural norms, and even political landscapes.
- Algorithmic determinism and human agency: There is a risk that reliance on LLMs in social science research and decision-making could lead to algorithmic determinism, where computational models are seen as definitive authorities on social phenomena. This can undermine human agency, reducing complex social, cultural, and ethical decisions to outputs determined by algorithms.
- Digital divide and access to technology: The benefits and insights offered by LLMs in social science research might not be evenly distributed, potentially exacerbating the digital divide. As LLMs impose substantial computational costs, individuals with access to advanced computational resources and the skills to use LLMs can gain deeper insights into social phenomena, while those without such access may be left behind, reinforcing existing inequalities.
- LLM alignment with human values: Ensuring that LLMs align with human values and ethics is a critical challenge. This involves developing models that not only understand and generate human-like text, but also reflect ethical principles, cultural sensitivities, and a commitment to fairness and justice. Achieving this alignment requires dialogue between social scientists, policymakers, ethicists, and the broader public to define the values LLMs should uphold.
- Cultural homogenization vs. diversity: The widespread use of LLMs, particularly those trained on predominantly English-language internet data, could contribute to cultural homogenization, in which dominant languages and cultures overshadow diverse cultural expressions. Promoting cultural diversity when using LLMs in CSS involves conscientiously including diverse data sources and perspectives in model training and application.
- Ethical considerations in representation and intervention: The application of LLMs in social science research can also influence social and cultural dynamics. This raises ethical considerations regarding representation, i.e., who gets to speak or is spoken for, and intervention, i.e., how insights generated by LLMs influence social policies, cultural practices, and interventions.
In addressing these social and cultural implications, it is crucial for researchers and practitioners to engage with communities, adopt inclusive practices, enforce fairness, and work towards the responsible development and application of LLMs in CSS. This includes striving for diversity in training data, involving stakeholders in the development process, and being transparent about the limitations and potential biases of these models. Another direction for ensuring cultural fairness is probing LLMs for their common-sense abilities, e.g., geographical common sense (Yin et al. 2022).
8.5 Operational considerations
The application of LLMs in CSS involves various operational considerations to ensure effective and efficient use of these technologies. These considerations encompass the practical aspects of integrating LLMs into social science research workflows, from the initial planning stages through to execution and analysis. There are several key operational aspects that must be considered.
- Computational resources and scalability: The application of LLMs can be resource-intensive, requiring significant computational power, especially for models with billions of parameters and extensive web-scale datasets. Researchers must consider the availability of computational resources, the scalability of their methods, and the cost implications. Efficient use of resources, such as through cloud computing services or specialized hardware, can help mitigate these challenges.
- Data management: Successful application of LLMs in social science research relies on comprehensive data management strategies. This includes collecting, storing, and processing large datasets, ensuring data quality, and managing data privacy and security. Effective data management also involves addressing challenges related to data volume, variety, and velocity, ensuring that the datasets used are representative and free of biases that could skew research outcomes.
- Integration with existing tools and workflows: Integrating LLMs into existing social science research workflows can be challenging. Researchers need to consider how these models fit into their analytical frameworks, including the integration with statistical or deep learning analysis tools, visualization software, and other computational methods. This might require developing custom scripts to bridge the gap between LLMs and other research tools.
- Ongoing monitoring and maintenance: Once deployed, LLM-based projects require ongoing monitoring to ensure that models continue to perform as expected and that data and algorithms remain relevant and accurate. This includes updating models with new data, adjusting to changes in computational infrastructure, and responding to novel and emerging ethical, legal, and social considerations.
By carefully addressing these operational considerations, researchers can effectively integrate LLMs into CSS projects, enhancing their ability to analyze complex social phenomena and generate meaningful insights. Successful operational planning and execution also lay the foundation for addressing broader ethical, social, and technical challenges associated with the use of advanced AI technologies in social science research.
9 Conclusion
The integration of LLMs into CSS represents a significant stride for the field, introducing groundbreaking advancements in data analysis, content generation, and discourse understanding. This paper has explored the multifaceted role of LLMs in CSS, demonstrating their potential to transform traditional research methodologies and contribute to a deeper understanding of complex social phenomena. Through applications ranging from sentiment analysis and misinformation detection to social network analysis and document-level insights, LLMs have shown their ability to analyze vast amounts of data with a level of nuance and depth unattainable by previous methods. Further, this paper has highlighted the innovative use of LLMs in generating different forms of social media content, from textual posts and comments to multimodal content, enhancing engagement and interaction within online communities. These capabilities underscore the transformative potential of LLMs in not only analyzing but also contributing to social communication, offering tools for more dynamic and responsive communication strategies. However, the adoption of LLMs in CSS is not without its challenges. Ethical considerations around bias, fairness, representation, and the potential for misuse underscore the need for responsible use of these technologies. Operational considerations, including computational resource requirements, data management, and the integration of LLMs into existing workflows, present practical hurdles that practitioners must navigate. Despite these challenges, the future of LLMs in CSS is promising. By addressing these issues and leveraging the capabilities of LLMs responsibly, researchers can use these advanced technologies to gain deeper insights into social dynamics, influence public discourse, and contribute to the development of more informed and equitable policies and interventions. In the future, the continued evolution of LLMs and their integration into CSS will likely drive further innovation, fostering a richer understanding of the digital and social landscapes that shape our world. The journey of integrating LLMs into CSS is in its infancy, and its potential to reshape the field remains largely untapped. Through collaborative efforts across various disciplines, the responsible and innovative use of LLMs can significantly advance social science research and play a role in addressing the defining social challenges of our time.
Data availability
No datasets were generated or analysed during the current study.
References
Aiyappa R, Senthilmani S, An J, Kwak H, Ahn Y-Y (2024) Benchmarking zero-shot stance detection with FlanT5-XXL: insights from training data, prompting, and decoding strategies into its near-SoTA performance. arXiv preprint arXiv:2403.00236
Alam F, Sajjad H, Imran M, Ofli F (2021) CrisisBench: benchmarking crisis-related social media datasets for humanitarian information processing. In: Proceedings of the international AAAI conference on web and social media, vol 15, pp 923–932
Alayrac J-B, Donahue J, Luc P, Miech A, Barr I, Hasson Y, Lenc K, Mensch A, Millican K, Reynolds M et al (2022) Flamingo: a visual language model for few-shot learning. Adv Neural Inf Process Syst 35:23716–23736
Alsaedi N, Burnap P (2015) Feature extraction and analysis for identifying disruptive events from social media. In: Proceedings of the 2015 IEEE/ACM international conference on advances in social networks analysis and mining 2015. pp 1495–1502
Annamoradnejad I, Zoghi G (2020) ColBERT: using BERT sentence embedding for humor detection. arXiv preprint arXiv:2004.12765
Augenstein I, Rocktäschel T, Vlachos A, Bontcheva K (2016) Stance detection with bidirectional conditional encoding. In: Su J, Duh K, Carreras X (eds.) Proceedings of the 2016 conference on empirical methods in natural language processing. Association for Computational Linguistics, Austin, Texas, pp 876–885. https://doi.org/10.18653/v1/D16-1084. https://aclanthology.org/D16-1084
Bakker F, Van Heusden R, Marx M (2024) Timeline extraction from decision letters using ChatGPT. In: Hürriyetoğlu A, Tanev H, Thapa S, Uludoğan G (eds.) Proceedings of the 7th workshop on challenges and applications of automated extraction of socio-political events from text (CASE 2024). Association for Computational Linguistics, St. Julians, Malta, pp 24–31. https://aclanthology.org/2024.case-1.3
Balouchzahi F, Sidorov G, Gelbukh A (2023) PolyHope: two-level hope speech detection from tweets. Expert Syst Appl 225:120078
Barić A, Papay S, Padó S (2024) Actor identification in discourse: a challenge for LLMs? arXiv preprint arXiv:2402.00620
Batra PK, Saxena A, Goel C et al (2020) Election result prediction using twitter sentiments analysis. In: 2020 sixth international conference on parallel, distributed and grid computing (PDGC), IEEE, pp 182–185
Baumann F, Lorenz-Spreen P, Sokolov IM, Starnini M (2020) Modeling echo chambers and polarization dynamics in social networks. Phys Rev Lett 124(4):048301
Bellamkonda S, Lohakare M, Patel S (2022) A dataset for detecting humor in Telugu social media text. In: Chakravarthi BR, Priyadharshini R, Madasamy AK, Krishnamurthy P, Sherly E, Mahesan S (eds.) Proceedings of the second workshop on speech and language technologies for Dravidian languages. Association for Computational Linguistics, Dublin, Ireland, pp 9–14. https://doi.org/10.18653/v1/2022.dravidianlangtech-1.2. https://aclanthology.org/2022.dravidianlangtech-1.2
Benson E, Haghighi A, Barzilay R (2011) Event discovery in social media feeds. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, pp 389–398
Bhandari A, Shah SB, Thapa S, Naseem U, Nasim M (2023) CrisisHateMM: Multimodal analysis of directed and undirected hate speech in text-embedded images from Russia-Ukraine conflict. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 1993–2002
Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A et al (2020) Language models are few-shot learners. Adv Neural Inf Process Syst 33:1877–1901
Bruning PF, Alge BJ, Lin H-C (2020) Social networks and social media: understanding and managing influence vulnerability in a connected society. Bus Horiz 63(6):749–761
Bui M-Q, Do D-T, Le N-K, Nguyen D-H, Nguyen K-V-H, Anh TPN, Le Nguyen M (2024) Data augmentation and large language model for legal case retrieval and entailment. Rev Socionetw Strateg 18:49–74
Cageggi G, Di Rosa E, Uboldi A (2023) App2Check at EMit: large language models for multilabel emotion classification
Cai Y, Cai H, Wan X (2019) Multi-modal sarcasm detection in Twitter with hierarchical fusion model. In: Korhonen A, Traum D, Màrquez L (eds.) Proceedings of the 57th annual meeting of the association for computational linguistics. Association for Computational Linguistics, Florence, Italy, pp 2506–2515. https://doi.org/10.18653/v1/P19-1239. https://aclanthology.org/P19-1239
Cao Y, Nair AM, Eyimife E, Soofi NJ, Subbalakshmi K, Wullert JR II, Basu C, Shallcross D (2024) Can large language models detect misinformation in scientific news reporting? arXiv preprint arXiv:2402.14268
Castro S, Hazarika D, Pérez-Rosas V, Zimmermann R, Mihalcea R, Poria S (2019) Towards multimodal sarcasm detection (an _obviously_ perfect paper). arXiv preprint arXiv:1906.01815
Chakravarthi BR, Bharathi B, Mccrae JP, Zarrouk M, Bali K, Buitelaar P (2022) Proceedings of the second workshop on language technology for equality, diversity and inclusion. In: Proceedings of the second workshop on language technology for equality, diversity and inclusion
Chakravarthi BR (2022) Hope speech detection in YouTube comments. Soc Netw Anal Min 12(1):75
Chakravarthi BR (2023) Detection of homophobia and transphobia in YouTube comments. Int J Data Sci Anal 18:49–68
Chang S, Wang R, Ren P, Huang H (2024) Enhanced short text modeling: leveraging large language models for topic refinement. arXiv preprint arXiv:2403.17706
Chaudhari A, Parseja A, Patyal A (2020) CNN based hate-o-meter: a hate speech detecting tool. In: 2020 third international conference on smart systems and inventive technology (ICSSIT). IEEE, pp 940–944
Chen R, Qin C, Jiang W, Choi D (2024) Is a large language model a good annotator for event extraction? In: Proceedings of the AAAI conference on artificial intelligence, vol 38, pp 17772–17780
Chiruzzo L, Castro S, Etcheverry M, Garat D, Prada JJ, Rosá A (2019) Overview of HAHA at IberLEF 2019: humor analysis based on human annotation. In: IberLEF@SEPLN, pp 132–144
Chiruzzo L, Castro S, Rosá A (2020) HAHA 2019 dataset: a corpus for humor analysis in Spanish. In: Calzolari N, Béchet F, Blache P, Choukri K, Cieri C, Declerck T, Goggi S, Isahara H, Maegaard B, Mariani J, Mazo H, Moreno A, Odijk J, Piperidis S (eds.) Proceedings of the twelfth language resources and evaluation conference. European Language Resources Association, Marseille, France. pp 5106–5112. https://aclanthology.org/2020.lrec-1.628
Choi EC, Ferrara E (2024) Fact-GPT: fact-checking augmentation via claim matching with LLMs. arXiv preprint arXiv:2402.05904
Chowdhury AG, Chadha A (2023) Generative data augmentation using LLMs improves distributional robustness in question answering. arXiv preprint arXiv:2309.06358
Chung HW, Hou L, Longpre S, Zoph B, Tay Y, Fedus W, Li Y, Wang X, Dehghani M, Brahma S et al (2022) Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416
Conneau A, Khandelwal K, Goyal N, Chaudhary V, Wenzek G, Guzmán F, Grave E, Ott M, Zettlemoyer L, Stoyanov V (2020) Unsupervised cross-lingual representation learning at scale. In: Proceedings of the 58th annual meeting of the association for computational linguistics. Association for Computational Linguistics
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
Cruickshank IJ, Ng LHX (2023) Use of large language models for stance classification. arXiv preprint arXiv:2309.13734
Darwish K, Stefanov P, Aupetit M, Nakov P (2020) Unsupervised user stance detection on Twitter. In: Proceedings of the international AAAI conference on web and social media, vol 14, pp 141–152
Deng X, Bashlovkina V, Han F, Baumgartner S, Bendersky M (2023) LLMs to the moon? Reddit market sentiment analysis with large language models. In: Companion proceedings of the ACM web conference 2023, pp 1014–1019
Devarajan GG, Nagarajan SM, Amanullah SI, Mary SSA, Bashir AK (2023) AI-assisted deep NLP-based approach for prediction of fake news from social media users. IEEE Trans Comput Soc Syst
Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein J, Doran C, Solorio T (eds.) Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). Association for Computational Linguistics, Minneapolis, Minnesota, pp 4171–4186. https://doi.org/10.18653/v1/N19-1423. https://aclanthology.org/N19-1423
Dey K, Shrivastava R, Kaushik S (2018) Topical stance detection for Twitter: a two-phase LSTM model using attention. In: Advances in information retrieval: 40th European conference on IR research, ECIR 2018, Grenoble, France, March 26–29, 2018, proceedings 40. Springer, pp 529–536
Dodds PS, Clark EM, Desu S, Frank MR, Reagan AJ, Williams JR, Mitchell L, Harris KD, Kloumann IM, Bagrow JP et al (2015) Human language reveals a universal positivity bias. Proc Natl Acad Sci 112(8):2389–2394
Egami N, Hinck M, Stewart B, Wei H (2024) Using imperfect surrogates for downstream inference: design-based supervised learning for social science applications of large language models. Adv Neural Inf Process Syst 36
Fan Y, Jiang F (2023) Uncovering the potential of ChatGPT for discourse analysis in dialogue: an empirical study. arXiv preprint arXiv:2305.08391
Frank MR, Mitchell L, Dodds PS, Danforth CM (2013) Happiness and the patterns of life: a study of geolocated tweets. Sci Rep 3(1):2625
Gao C, Lan X, Lu Z, Mao J, Piao J, Wang H, Jin D, Li Y (2023) S³: social-network simulation system with large language model-empowered agents. arXiv preprint arXiv:2307.14984
Gao J, Zhao H, Yu C, Xu R (2023) Exploring the feasibility of ChatGPT for event extraction. arXiv preprint arXiv:2303.03836
Goebel R, Kano Y, Kim M-Y, Rabelo J, Satoh K, Yoshioka M (2023) Summary of the competition on legal information extraction/entailment (COLIEE) 2023. In: Proceedings of the nineteenth international conference on artificial intelligence and law, pp 472–480
Guo Z, Schlichtkrull M, Vlachos A (2022) A survey on automated fact-checking. Trans Assoc Comput Linguist 10:178–206
Guo K, Hu A, Mu J, Shi Z, Zhao Z, Vishwamitra N, Hu H (2024) An investigation of large language models for real-world hate speech detection. arXiv preprint arXiv:2401.03346
Hasan MK, Rahman W, Zadeh AB, Zhong J, Tanveer MI, Morency L-P, Hoque ME (2019) UR-FUNNY: a multimodal language dataset for understanding humor. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pp 2046–2056
Haselmayer M, Jenny M (2017) Sentiment analysis of political communication: Combining a dictionary approach with crowdcoding. Qual Quant 51:2623–2646
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hong L, Luo P, Blanco E, Song X (2024) Outcome-constrained large language models for countering hate speech. arXiv preprint arXiv:2403.17146
Hossain T, Logan IV RL, Ugarte A, Matsubara Y, Young S, Singh S (2020) COVIDLies: detecting COVID-19 misinformation on social media. In: Verspoor K, Cohen KB, Conway M, Bruijn B, Dredze M, Mihalcea R, Wallace B (eds.) Proceedings of the 1st workshop on NLP for COVID-19 (part 2) at EMNLP 2020. Association for Computational Linguistics, Online. https://doi.org/10.18653/v1/2020.nlpcovid19-2.11. https://aclanthology.org/2020.nlpcovid19-2.11
Huang F, Huang Q, Zhao Y, Qi Z, Wang B, Huang Y, Li S (2023) A three-stage framework for event-event relation extraction with large language model. In: International conference on neural information processing. Springer, pp 434–446
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708
Hung M, Lauren E, Hon ES, Birmingham WC, Xu J, Su S, Hon SD, Park J, Dang P, Lipsky MS (2020) Social network analysis of COVID-19 sentiments: application of artificial intelligence. J Med Internet Res 22(8):22590
Inácio M, Oliveira HG (2024) Exploring multimodal models for humor recognition in Portuguese. In: Gamallo P, Claro D, Teixeira A, Real L, Garcia M, Oliveira HG, Amaro R (eds.) Proceedings of the 16th international conference on computational processing of Portuguese. Association for Computational Linguistics, Santiago de Compostela, Galicia/Spain, pp 568–574. https://aclanthology.org/2024.propor-1.62
Islam MM, Uddin MA, Islam L, Akter A, Sharmin S, Acharjee UK (2020) Cyberbullying detection on social networks using machine learning approaches. In: 2020 IEEE Asia-Pacific conference on computer science and data engineering (CSDE). IEEE, pp 1–6
Jazayeri SH, Poursaeed A, Najafabadi MO (2023) Social network analysis of green space management actors in Tehran. Int J Geoheritage Parks 11(2):276–285
Jiang J, Ferrara E (2023) Social-LLM: Modeling user behavior at scale using language models and social network data. arXiv preprint arXiv:2401.00893
Kaddour J, Harris J, Mozes M, Bradley H, Raileanu R, McHardy R (2023) Challenges and applications of large language models. arXiv preprint arXiv:2307.10169
Kalyan KS (2023) A survey of GPT-3 family large language models including ChatGPT and GPT-4. Nat Lang Process J 6:100048
Karim MR, Dey SK, Islam T, Shajalal M, Chakravarthi BR (2022) Multimodal hate speech detection from Bengali memes and texts. In: International conference on speech and language technologies for low-resource languages. Springer, pp 293–308
Karisani P, Agichtein E (2018) Did you really just have a heart attack? Towards robust detection of personal health mentions in social media. In: Proceedings of the 2018 world wide web conference, pp 137–146
Kasneci E, Seßler K, Küchemann S, Bannert M, Dementieva D, Fischer F, Gasser U, Groh G, Günnemann S, Hüllermeier E et al (2023) ChatGPT for good? On opportunities and challenges of large language models for education. Learn Individ Differ 103:102274
Kaya M, Fidan G, Toroslu IH (2012) Sentiment analysis of Turkish political news. In: 2012 IEEE/WIC/ACM international conferences on web intelligence and intelligent agent technology, vol 1, IEEE, pp 174–180
Kim Y, Suh H, Kim M, Won D, Lee H (2024) KoCoSa: Korean context-aware sarcasm detection dataset. arXiv preprint arXiv:2402.14428
Kizgin H, Dey BL, Dwivedi YK, Hughes L, Jamal A, Jones P, Kronemann B, Laroche M, Peñaloza L, Richard M-O et al (2020) The impact of social media on consumer acculturation: current challenges, opportunities, and an agenda for research and practice. Int J Inf Manage 51:102026
Kreps S, McCain RM, Brundage M (2022) All the news that’s fit to fabricate: AI-generated text as a tool of media misinformation. J Exp Political Sci 9(1):104–117
Lai M, Menini S, Polignano M, Russo V, Sprugnoli R, Venturi G et al (2023) EVALITA 2023: overview of the 8th evaluation campaign of natural language processing and speech tools for Italian. In: Proceedings of the eighth evaluation campaign of natural language processing and speech tools for Italian. Final workshop (EVALITA 2023), CEUR.org, Parma, Italy
Lan X, Gao C, Jin D, Li Y (2023) Stance detection with collaborative role-infused LLM-based agents. arXiv preprint arXiv:2310.10467
Lee RK-W, Cao R, Fan Z, Jiang J, Chong W-H (2021) Disentangling hate in online memes. In: Proceedings of the 29th ACM international conference on multimedia, pp 5138–5147
Lee C, Nick B, Brandes U, Cunningham P (2013) Link prediction with social vector clocks. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, pp 784–792
Leite JA, Razuvayevskaya O, Bontcheva K, Scarton C (2023) Detecting misinformation with LLM-predicted credibility signals and weak supervision. arXiv preprint arXiv:2309.07601
Li W, Xu Y, Wang G (2019) Stance detection of microblog text based on two-channel CNN-GRU fusion network. IEEE Access 7:145944–145952
Li I, Li Y, Li T, Alvarez-Napagao S, Garcia-Gasulla D, Suzumura T (2020) What are we depressed about when we talk about COVID-19: mental health analysis on tweets using natural language processing. In: Artificial intelligence XXXVII: 40th SGAI international conference on artificial intelligence, AI 2020, Cambridge, UK, December 15–17, 2020, proceedings 40. Springer, pp 358–370
Lin XV, Mihaylov T, Artetxe M, Wang T, Chen S, Simig D, Ott M, Goyal N, Bhosale S, Du J, Pasunuru R, Shleifer S, Koura PS, Chaudhary V, O’Horo B, Wang J, Zettlemoyer L, Kozareva Z, Diab M, Stoyanov V, Li X (2022) Few-shot learning with multilingual generative language models. In: Goldberg Y, Kozareva Z, Zhang Y (eds.) Proceedings of the 2022 conference on empirical methods in natural language processing. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, pp 9019–9052. https://doi.org/10.18653/v1/2022.emnlp-main.616. https://aclanthology.org/2022.emnlp-main.616
Li Y, Sosea T, Sawant A, Nair AJ, Inkpen D, Caragea C (2021) P-stance: a large dataset for stance detection in political domain. In: Findings of the association for computational linguistics: ACL-IJCNLP 2021, pp 2355–2365
Li J, Tai Z, Zhang R, Yu W, Liu L (2014) Online bursty event detection from microblog. In: 2014 IEEE/ACM 7th international conference on utility and cloud computing. IEEE, pp 865–870
Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692
Li S, Yang J, Zhao K (2023) Are you in a masquerade? Exploring the behavior and impact of large language model driven social bots in online social networks. arXiv preprint arXiv:2307.10337
Lyu C, Wu M, Wang L, Huang X, Liu B, Du Z, Shi S, Tu Z (2023) Macaw-LLM: Multi-modal language modeling with image, audio, video, and text integration. arXiv preprint arXiv:2306.09093
Maas A, Daly RE, Pham PT, Huang D, Ng AY, Potts C (2011) Learning word vectors for sentiment analysis. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, pp 142–150
Mao J, Liu W (2019) A BERT-based approach for automatic humor detection and scoring. In: IberLEF@SEPLN, pp 197–202
Ma Z, Yang G, Yang Y, Gao Z, Wang J, Du Z, Yu F, Chen Q, Zheng S, Zhang S et al (2024) An embarrassingly simple approach for LLM with strong ASR capacity. arXiv preprint arXiv:2402.08846
Min B, Ross H, Sulem E, Veyseh APB, Nguyen TH, Sainz O, Agirre E, Heintz I, Roth D (2023) Recent advances in natural language processing via large pre-trained language models: a survey. ACM Comput Surv 56(2):1–40
Mohammad S, Kiritchenko S, Sobhani P, Zhu X, Cherry C (2016) SemEval-2016 task 6: detecting stance in tweets. In: Proceedings of the 10th international workshop on semantic evaluation (SemEval-2016), pp 31–41
Mozafari M, Farahbakhsh R, Crespi N (2020) A BERT-based transfer learning approach for hate speech detection in online social media. In: Complex networks and their applications VIII: volume 1, proceedings of the eighth international conference on complex networks and their applications (COMPLEX NETWORKS 2019). Springer, pp 928–940
Mu Y, Dong C, Bontcheva K, Song X (2024) Large language models offer an alternative to the traditional approach of topic modelling. arXiv preprint arXiv:2403.16248
Nagarajan K, Muniyandi M, Palani B, Sellappan S (2020) Social network analysis methods for exploring SARS-CoV-2 contact tracing data. BMC Med Res Methodol 20:1–10
Naseem U, Thapa S, Zhang Q, Rashid J, Hu L, Nasim M (2023) Temporal tides of emotional resonance: a novel approach to identify mental health on social media. In: Proceedings of the 11th international workshop on natural language processing for social media, pp 1–8
Nasim M, Charbey R, Prieur C, Brandes U (2016) Investigating link inference in partially observable networks: friendship ties and interaction. IEEE Trans Comput Soc Syst 3(3):113–119
Nasim M, Nguyen A, Lothian N, Cope R, Mitchell L (2018) Real-time detection of content polluters in partially observable Twitter networks. In: Companion proceedings of the Web Conference 2018, pp 1331–1339
Nasim M, Sharif N, Bhandari P, Weber D, Wood M, Falzon L, Kashima Y (2022) Investigating language use by polarised groups on Twitter: a case study of the bushfires. In: Proceedings of the 26th Australasian document computing symposium, pp 1–7
Nasim M, Weber D, South T, Tuke J, Bean N, Falzon L, Mitchell L (2022) Are we always in strife? A longitudinal study of the echo chamber effect in the Australian Twittersphere. arXiv preprint arXiv:2201.09161
Negi G, Sarkar R, Zayed O, Buitelaar P (2024) A hybrid approach to aspect based sentiment analysis using transfer learning. arXiv preprint arXiv:2403.17254
Nguyen T, Phung D, Adams B, Venkatesh S (2013) Event extraction using behaviors of sentiment signals and burst structure in social media. Knowl Inf Syst 37:279–304
Nguyen D, Al Mannai KA, Joty S, Sajjad H, Imran M, Mitra P (2017) Robust classification of crisis-related data on social networks using convolutional neural networks. In: Proceedings of the international AAAI conference on web and social media, vol 11, pp 632–635
Nirmal A, Bhattacharjee A, Sheth P, Liu H (2024) Towards interpretable hate speech detection using large language model-extracted rationales. arXiv preprint arXiv:2403.12403
Olan F, Jayawickrama U, Arakpogun EO, Suklan J, Liu S (2022) Fake news on social media: the impact on society. Inf Syst Front 26:443–458
Parihar AS, Thapa S, Mishra S (2021) Hate speech detection using natural language processing: applications and challenges. In: 2021 5th international conference on trends in electronics and informatics (ICOEI). IEEE, pp 1302–1308
Pauls A, Klein D (2011) Faster and smaller n-gram language models. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, pp 258–267
Perifanos K, Goutsos D (2021) Multimodal hate speech detection in Greek social media. Multimodal Technol Interact 5(7):34
Pilar Salas-Zárate M, Alor-Hernández G, Sánchez-Cervantes JL, Paredes-Valverde MA, García-Alcaraz JL, Valencia-García R (2020) Review of English literature on figurative language applied to social networks. Knowl Inf Syst 62(6):2105–2137
Piskorski J, Stefanovitch N, Nikolaidis N, Da San Martino G, Nakov P (2023) Multilingual multifaceted understanding of online news in terms of genre, framing, and persuasion techniques. In: Proceedings of the 61st annual meeting of the association for computational linguistics (volume 1: long papers), pp 3001–3022
Ponti EM, Glavaš G, Majewska O, Liu Q, Vulić I, Korhonen A (2020) XCOPA: a multilingual dataset for causal commonsense reasoning. In: Webber B, Cohn T, He Y, Liu Y (eds.) Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP). Association for Computational Linguistics, Online, pp 2362–2376. https://doi.org/10.18653/v1/2020.emnlp-main.185. https://aclanthology.org/2020.emnlp-main.185
Pramanick S, Roy A, Patel VM (2022) Multimodal learning using optimal transport for sarcasm and humor detection. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 3930–3940
Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21(140):1–67
Ramakrishnan N, Butler P, Muthiah S, Self N, Khandpur R, Saraf P, Wang W, Cadena J, Vullikanti A, Korkmaz G et al (2014) ’Beating the news’ with EMBERS: forecasting civil unrest using open source indicators. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1799–1808
Rauniyar K, Poudel S, Shiwakoti S, Thapa S, Rashid J, Kim J, Imran M, Naseem U (2023) Multi-aspect annotation and analysis of Nepali tweets on anti-establishment election discourse. IEEE Access 11:143092–143115
Rish I et al (2001) An empirical study of the Naive Bayes classifier. In: IJCAI 2001 workshop on empirical methods in artificial intelligence, vol 3, Citeseer, pp 41–46
Ritter A, Mausam, Etzioni O, Clark S (2012) Open domain event extraction from Twitter. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1104–1112
Romero D, Tan C, Ugander J (2013) On the interplay between social and topical structure. In: Proceedings of the international AAAI conference on web and social media, vol 7, pp 516–525
Rosenthal S, Farra N, Nakov P (2019) SemEval-2017 task 4: sentiment analysis in Twitter. arXiv preprint arXiv:1912.00741
Rotstein N, Bensaïd D, Brody S, Ganz R, Kimmel R (2024) FuseCap: Leveraging large language models for enriched fused image captions. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 5689–5700
Ruths D, Pfeffer J (2014) Social media for large studies of behavior. Science 346(6213):1063–1064
Saha P, Agrawal A, Jana A, Biemann C, Mukherjee A (2024) On zero-shot counterspeech generation by LLMs. arXiv preprint arXiv:2403.14938
Sajjad M, Zulifqar F, Khan MUG, Azeem M (2019) Hate speech detection using fusion approach. In: 2019 international conference on applied and engineering mathematics (ICAEM). IEEE, pp 251–255
Sanh V, Debut L, Chaumond J, Wolf T (2019) DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108
Sawhney R, Neerkaje A, Habernal I, Flek L (2023) How much user context do we need? Privacy by design in mental health NLP applications. In: Proceedings of the international AAAI conference on web and social media, vol 17, pp 766–776
Seddari N, Derhab A, Belaoued M, Halboob W, Al-Muhtadi J, Bouras A (2022) A hybrid linguistic and knowledge-based analysis approach for fake news detection on social media. IEEE Access 10:62097–62109
Shah SB, Shiwakoti S, Chaudhary M, Wang H (2024) MemeCLIP: leveraging CLIP representations for multimodal meme classification. arXiv preprint arXiv:2409.14703
Shah SB, Thapa S, Acharya A, Rauniyar K, Poudel S, Jain S, Masood A, Naseem U (2024) Navigating the web of disinformation and misinformation: large language models as double-edged swords. IEEE Access
Shiwakoti S, Thapa S, Rauniyar K, Shah A, Bhandari A, Naseem U (2024) Analyzing the dynamics of climate change discourse on Twitter: a new annotated corpus and multi-aspect classification. In: Proceedings of the 2024 joint international conference on computational linguistics, language resources and evaluation (LREC-COLING 2024), pp 984–994
Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng AY, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 1631–1642
Sun Y, Jia R, Razzaq A, Bao Q (2024) Social network platforms and climate change in China: evidence from TikTok. Technol Forecast Soc Chang 200:123197
Sun X, Li X, Zhang S, Wang S, Wu F, Li J, Zhang T, Wang G (2023) Sentiment analysis through LLM negotiations. arXiv preprint arXiv:2311.01876
Šuppa M, Skala D, Jašš D, Sučík S, Švec A, Hraška P (2024) Bryndza at ClimateActivism 2024: stance, target and hate event detection via retrieval-augmented GPT-4 and LLaMA. arXiv preprint arXiv:2402.06549
Thapa S, Jafri F, Hürriyetoğlu A, Vargas F, Lee RK-W, Naseem U (2023) Multimodal hate speech event detection-shared task 4, CASE 2023. In: Proceedings of the 6th workshop on challenges and applications of automated extraction of socio-political events from text, pp 151–159
Thapa S, Rauniyar K, Jafri F, Shiwakoti S, Veeramani H, Jain R, Kohli GS, Hürriyetoğlu A, Naseem U (2024) Stance and hate event detection in tweets related to climate activism—shared task at CASE 2024. In: Hürriyetoğlu A, Tanev H, Thapa S, Uludoğan G (eds.) Proceedings of the 7th workshop on challenges and applications of automated extraction of socio-political events from text (CASE 2024). Association for Computational Linguistics, St. Julians, Malta, pp 234–247. https://aclanthology.org/2024.case-1.33
Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW (2023) Large language models in medicine. Nat Med 29(8):1930–1940
Tikhonov A, Ryabinin M (2021) It’s all in the heads: using attention heads as a baseline for cross-lingual transfer in commonsense reasoning. In: Zong C, Xia F, Li W, Navigli R (eds.) Findings of the association for computational linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics, Online, pp 3534–3546. https://doi.org/10.18653/v1/2021.findings-acl.310. https://aclanthology.org/2021.findings-acl.310
Tran OT, Dao TT, Dang YN (2021) Stance detection on Vietnamese social media. In: International conference on soft computing and pattern recognition. Springer, pp 75–85
Tuke J, Nguyen A, Nasim M, Mellor D, Wickramasinghe A, Bean N, Mitchell L (2020) Pachinko prediction: A Bayesian method for event prediction from social media data. Inf Process Manage 57(2):102147
Vassey J, Valente T, Barker J, Stanton C, Li D, Laestadius L, Cruz TB, Unger JB (2023) E-cigarette brands and social media influencers on Instagram: a social network analysis. Tob Control 32(e2):184–191
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30
Venkatakrishnan R, Goodarzi M, Canbaz MA (2023) Exploring large language models’ emotion detection abilities: use cases from the Middle East. In: 2023 IEEE conference on artificial intelligence (CAI). IEEE, pp 241–244
Walter S, Kinski L, Boda Z (2023) Who talks to whom? Using social network models to understand debate networks in the European Parliament. Eur Union Politics 24(2):410–423
Wang Z, Yin Z, Argyris YA (2020) Detecting medical misinformation on social media using multimodal deep learning. IEEE J Biomed Health Inform 25(6):2193–2203
Wang H, Lee RK-W (2024) MemeCraft: Contextual and stance-driven multimodal meme generation. arXiv preprint arXiv:2403.14652
Wang L, Xu X, Zhang L, Lu J, Xu Y, Xu H, Zhang C (2024) MMIDR: teaching large language model to interpret multimodal misinformation via knowledge distillation. arXiv preprint arXiv:2403.14171
Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press, Cambridge
Weber D, Neumann F (2021) Amplifying influence through coordinated behaviour in social networks. Soc Netw Anal Min 11(1):111
Weber D, Falzon L, Mitchell L, Nasim M (2022) Promoting and countering misinformation during Australia’s 2019–2020 bushfires: a case study of polarisation. Soc Netw Anal Min 12(1):64
Wei J, Bosma M, Zhao VY, Guu K, Yu AW, Lester B, Du N, Dai AM, Le QV (2021) Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652
Wei J, Tay Y, Bommasani R, Raffel C, Zoph B, Borgeaud S, Yogatama D, Bosma M, Zhou D, Metzler D et al (2022) Emergent abilities of large language models. Trans Mach Learn Res
Weizenbaum J (1966) ELIZA: a computer program for the study of natural language communication between man and machine. Commun ACM 9(1):36–45
Whitehouse C, Choudhury M, Aji AF (2023) LLM-powered data augmentation for enhanced cross-lingual performance. In: The 2023 conference on empirical methods in natural language processing
Wong A, Ho S, Olusanya O, Antonini MV, Lyness D (2021) The use of social media and online communications in times of pandemic COVID-19. J Intensive Care Soc 22(3):255–260
Wu S, Fei H, Qu L, Ji W, Chua T-S (2023) NExT-GPT: any-to-any multimodal LLM. arXiv preprint arXiv:2309.05519
Wu J, Lin H, Yang L, Xu B (2021) MUMOR: a multimodal dataset for humor detection in conversations. In: Natural language processing and Chinese computing: 10th CCF international conference, NLPCC 2021, Qingdao, China, October 13–17, 2021, proceedings, Part I 10. Springer, pp 619–627
Xia C, Hu J, Zhu Y, Naaman M (2015) What is new in our city? A framework for event extraction using social media posts. In: Advances in knowledge discovery and data mining: 19th Pacific-Asia conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, proceedings, Part I 19. Springer, pp 16–32
Xing F (2024) Designing heterogeneous LLM agents for financial sentiment analysis. arXiv preprint arXiv:2401.05799
Yakura H (2023) Evaluating large language models’ ability to understand metaphor and sarcasm using a screening test for Asperger syndrome. arXiv preprint arXiv:2309.10744
Yin D, Bansal H, Monajatipoor M, Li LH, Chang KW (2022) GeoMLAMA: geo-diverse commonsense probing on multilingual pre-trained language models. In: Proceedings of the 2022 conference on empirical methods in natural language processing, pp 2039–2055
Yuan A, Coenen A, Reif E, Ippolito D (2022) Wordcraft: story writing with large language models. In: 27th international conference on intelligent user interfaces, pp 841–852
Yu S, Da San Martino G, Mohtarami M, Glass J, Nakov P (2021) Interpretable propaganda detection in news articles. In: Proceedings of the international conference on recent advances in natural language processing (RANLP 2021), pp 1597–1605
Zhang X, Chen X, Liu Y, Wang J, Hu Z, Yan R (2024) LLM-driven agents for influencer selection in digital advertising campaigns. arXiv preprint arXiv:2403.15105
Zhang M, Jiang G, Liu S, Chen J, Zhang M (2024) LLM-assisted data augmentation for Chinese dialogue-level dependency parsing. Comput Linguist 1–24
Zhang Z, Robinson D, Tepper J (2018) Detecting hate speech on Twitter using a convolution-GRU based deep neural network. In: The semantic web: 15th international conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, proceedings 15. Springer, pp 745–760
Zhang Y, Sun S, Galley M, Chen Y-C, Brockett C, Gao X, Gao J, Liu J, Dolan WB (2020) DialoGPT: large-scale generative pre-training for conversational response generation. In: Proceedings of the 58th annual meeting of the association for computational linguistics: system demonstrations, pp 270–278
Zhang B, Yang H, Zhou T, Ali Babar M, Liu X-Y (2023) Enhancing financial sentiment analysis via retrieval augmented large language models. In: Proceedings of the fourth ACM international conference on AI in finance, pp 349–356
Zhang X, Zhao J, LeCun Y (2015) Character-level convolutional networks for text classification. Adv Neural Inf Process Syst 28
Zhao H, Chen H, Yang F, Liu N, Deng H, Cai H, Wang S, Yin D, Du M (2024) Explainability for large language models: a survey. ACM Trans Intell Syst Technol 15(2):1–38
Zhong S, Huang Z, Gao S, Wen W, Lin L, Zitnik M, Zhou P (2023) Let’s think outside the box: Exploring leap-of-thought in large language models with creative humor generation. arXiv preprint arXiv:2312.02439
Ziems C, Held W, Shaikh O, Chen J, Zhang Z, Yang D (2024) Can large language models transform computational social science? Comput Linguist 50:237–291
Funding
NA
Contributions
S.T. wrote the first draft of the paper and conceptualized the paper. S.S., S.B.S., S.A., H.V. revised and edited the paper. M.N. and U.N. supervised the project and contributed to reviewing and editing the manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethics Approval
NA