1 Introduction

Water is the source of all life on Earth, and it often flows across political boundaries. The latest statistics show that there are 310 transboundary river basins globally, shared by 150 countries; they cover 47.1% of the Earth’s land surface, are home to 52% of the world’s population, and account for almost 60% of the world’s freshwater flow (McCracken and Wolf 2019). With population growth, economic development and climate change, transboundary water resources management has emerged as a crucial global challenge of the twenty-first century, as it can spark conflicts and, in extreme cases, even wars (Biswas and Tortajada 2019; Baranyai 2020; Iyer 2020; Gökçekuş and Bolouri 2023). Understanding the dynamics of conflict and cooperation in transboundary river water management is therefore essential for addressing this global challenge (Uitto and Duda 2002; Phillips et al. 2006; Rai et al. 2017; Karim 2020; Honkonen 2022).

News media plays a crucial role in both mirroring and influencing public opinion on key issues (Miles and Morse 2007). It has a substantial impact on shaping policy agendas and provides a public sphere for deliberating and legitimizing policy options, thereby influencing the formulation of policy alternatives (Steffek 2009; Kleinschmit 2012). In transboundary river issues, news media articles can reflect the attitudes and values of a country or its public towards water events occurring within the region, while coverage from non-riparian countries can provide insights into international public opinion on water events in a specific transboundary river basin.

Sentiment analysis, also known as opinion mining, plays a crucial role in new media data research (Bukovina 2016). It encompasses a range of computational techniques aimed at identifying, extracting, and analyzing human emotions, feelings, or opinions from textual data (Sadegh et al. 2012; Fang and Zhan 2015; Elnagar et al. 2020). Sentiment analysis can be either binary or multi-class. In binary sentiment analysis, the text is divided into two classes, positive and negative, while multi-class sentiment analysis classifies the text into multiple levels or fine-grained labels. There are two kinds of methods for sentiment analysis: machine-learning methods and dictionary-based methods (Medhat et al. 2014). Machine learning methods can be further classified into supervised and unsupervised approaches, with a predominant reliance on supervised classification, which requires annotated data to train the classifiers (Gonçalves et al. 2013). Supervised methods need two separate sets of labeled data, one for training the model and another for testing its performance (Vinodhini and Chandrasekaran 2012; Ravi and Ravi 2015). Dictionary-based methods rely on pre-defined lists of words, lexicons, or dictionaries, in which each word is assigned a specific sentiment.

Revealing the dynamics of conflict/cooperation in transboundary water management places high demands on sentiment analysis of news media. Firstly, it requires grasping the delicate opinions of many stakeholders. Transboundary rivers such as the Lancang-Mekong River, the Indus River and the Nile River involve thousands of organizations, including IGOs, NGOs, River Basin Organizations (RBOs), government ministries/agencies of each country, industries, financial institutions, civil groups and research institutes/universities (Wei et al. 2021). It is well argued that conflict/cooperation in transboundary water management is an important part of international politics, so a delicate, multi-level classification of sentiments is required. Most previous studies using manual coding methods adopted from 3 up to 15 levels (Azar 1980; Yoffe and Larson 2001; Grünwald et al. 2020b). Secondly, conflict/cooperation in transboundary water management requires not only an understanding of historical sentiment patterns in conflict and cooperation dynamics (Turton 2005; Zeitoun and Mirumachi 2008; Wei et al. 2021) but also the capability for timely monitoring and prediction of public sentiment surrounding such water conflicts (Warner 2023). This involves the early detection and resolution of potential water conflicts before they escalate, facilitating a proactive and pre-emptive approach to transboundary river water management. Thirdly, given the exponential growth of accessible news text data relevant to transboundary rivers (Fesseha et al. 2020), manual examination and classification have become impractical. Addressing all these requirements calls for the automated classification and processing of such news data (Bobichev and Cherednichenko 2017). Recent advances in machine learning have provided strong support for the development of automated text categorization systems (Hartmann et al. 2019; Kadhim 2019; Barberá et al. 2021) and public opinion monitoring (Meng et al. 2022; Chen and Du 2023; Duan et al. 2023). However, different machine learning models exhibit variations in performance across diverse research domains (movie reviews, product development, restaurant reviews, etc.) in sentiment analysis tasks (Maulana et al. 2020; Zahoor et al. 2020; Dashtipour et al. 2021; Giannakis et al. 2022; Yang et al. 2023). The lack of a clear consensus on which machine learning model is more suitable for which domain restricts the effective application of machine learning methods to transboundary water conflict dynamics.

The study aims to compare and evaluate the performance of different machine learning models in simulating transboundary water conflict and cooperation dynamics. Ten commonly used machine learning models (K-Nearest Neighbors, Naive Bayes, Support Vector Machine, Decision Tree, Random Forest, Gradient Boosting Decision Tree, Extreme Gradient Boosting, Multilayer Perceptron, Long Short-Term Memory and Bidirectional Encoder Representations from Transformers) will be assessed on a corpus of news media articles on global transboundary rivers, in which each article was annotated with one of five sentiment categories (− 2, − 1, 0, 1 and 2) according to its level of conflict or cooperation. The best-performing model will be compared with dictionary-based approaches and validated using historical water events in the Lancang-Mekong River, the Nile River, and the Indus River basins. By identifying the most effective models for detecting nuanced sentiments related to transboundary water conflict and cooperation, the findings from this study are expected to advance the capabilities of policymakers, stakeholders, and researchers for early detection and proactive management of potential water conflicts, improving cooperation on transboundary rivers.

2 Methods

The flowchart in Fig. 1 outlines the steps of the research methods employed in this study: data acquisition, followed by data labeling, data processing, model training and, finally, model evaluation.

Fig. 1

The flowchart of the methods in this study

2.1 Data acquisition

New media data collection is one of the major challenges in assessing the dynamics of transboundary water conflict and cooperation (Grünwald et al. 2020a). Since English is the most commonly used language in international communication, and the majority of countries have English-language news media, English news articles served as the primary focus of this study. The news articles were collected from the LexisNexis database, which is globally recognized and comprises full-text English news articles from more than 6000 newspapers worldwide, making it one of the most extensively utilized news repositories in social science research (Weaver and Bimber 2008; Racine et al. 2010; Jiang et al. 2017). The composition of the search terms significantly influences the coverage and relevance of the retrieved news articles. The search terms developed by Guo et al. (2022) for the LexisNexis database, which build on the Transboundary Freshwater Dispute Database (TFDD) (Yoffe and Larson 2001), were adopted in this study. These search terms consist of five components: Basin Name, Riparian Countries, Theme Terms, Conflict/Cooperation Terms and Excluded Terms. The first four components narrow the search to the intended range, while the list of excluded terms helps eliminate irrelevant topics. In total, 9,382 relevant news articles were collected and analyzed, covering 105 out of 310 transboundary river basins and published by 759 news media agencies from 84 countries. The time frame was from 1977 to 2022. These news articles comprised the corpus used in this study.

2.2 Data labeling

In the sentiment classification of text, a binary (positive or negative) classification is predominantly adopted. This reduces complexity and leads to high predictive accuracy, but it fails to capture subtle sentimental nuances and might overlook public sentiments that require special attention. Conversely, multi-level classification offers a more comprehensive sentiment categorization, enabling the classifier to distinguish different sentiment states more accurately. Nevertheless, the classifier must then predict in a larger decision space, which may lead to confusion between categories. Considering the dual requirements of classification and the data characteristics, the number of labels was set to 5 in this study. Building on the characteristics of the news media articles in this study and previous studies defining the intensity of conflict or cooperation in transboundary water events (Azar 1980; Yoffe and Larson 2001; Grünwald et al. 2020b), the public sentiment polarities reflected in news media articles were categorized into five classes: Cooperative response for actions (2), Oral expression of cooperative response (1), Neutrality (0), Oral expression of conflictive response (− 1) and Conflictive response for actions (− 2). Cooperative response for actions (2) signifies substantive collaborative actions in various fields jointly taken by the public or officials to achieve cooperation on water management, such as meetings and the signing of cooperation agreements/treaties. Conflictive response for actions (− 2), by contrast, represents public or official actions such as protest marches, various forms of hostile behavior across different domains, court arbitration and military conflicts. Oral expression of cooperative response (1) indicates verbal support from the public or authorities towards the associated water event. Analogously, oral expression of conflictive response (− 1) refers to verbal expressions displaying negativity, discord, opposition and hostility. Neutrality (0) means that the relevant news media articles have no sentimental preference and primarily offer an objective description of the water event.

The sentiment polarities of the collected news articles were classified by human reading. Apart from the authors of this paper, this study also invited several assistants to aid in judging the sentiment polarity of the news articles. All members were trained to judge the sentiment polarity of news articles with strict consistency. The sentiment polarity of each news article was judged by at least two group members; in the case of inconsistent judgments, the sentiment polarity of the article was determined by the consensus of the whole team. The result of data labeling is summarized in Table 1. The distribution of label values in the corpus aligned with the previous study by Yoffe and Larson (2001).

Table 1 News articles distribution across different labels

2.3 Data processing

Data processing is an essential prerequisite for cleaning the corpus and improving the results. Several processing steps were implemented to prepare the news text data for analysis. Firstly, all news text was converted to lowercase to ensure uniformity and reduce the impact of case sensitivity. Numbers, URLs and special symbols were systematically removed from the corpus, as were punctuation marks and a predefined list of stop words (words that occur frequently but carry little semantic information). Next, feature extraction was performed on the cleaned news text to transform it into numerical features usable by machine learning models. The Term Frequency-Inverse Document Frequency (TF-IDF) method was mainly utilized to extract features of the news text data (Baeza-Yates and Ribeiro-Neto 1999; Li et al. 2022). The essential idea underlying TF-IDF is that words appearing frequently in one news article but rarely in others should be more important, as they contribute more to classification (Chiny et al. 2021; Liu et al. 2022a). After feature extraction, each news media article was represented by a vector suitable for machine learning algorithms.
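The TF-IDF weighting described above can be sketched in a few lines of standard-library Python. Production pipelines would typically use scikit-learn's TfidfVectorizer, which adds smoothing and normalization; the documents and tokens below are illustrative only, not drawn from the study's corpus.

```python
import math
from collections import Counter

def tfidf_vectors(tokenized_docs):
    """Return one {term: weight} dict per document, using raw-count TF
    and idf = log(N / df). Terms appearing in every document get idf 0."""
    n = len(tokenized_docs)
    df = Counter()
    for doc in tokenized_docs:
        df.update(set(doc))          # document frequency counts each doc once
    vectors = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

# Toy, already-cleaned documents (lowercased, stop words removed).
docs = [
    "mekong dam cooperation treaty".split(),
    "mekong dam protest dispute".split(),
    "nile treaty cooperation".split(),
]
vecs = tfidf_vectors(docs)
# "protest" occurs in only one document, so it outweighs "mekong",
# which occurs in two.
```

The weighting reflects the idea stated above: a term concentrated in few articles receives a higher weight than one spread across the corpus.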

2.4 Machine learning models

Many widely known machine learning models have been used to solve text classification problems. In this study, 10 machine learning models commonly used in the field of text sentiment analysis were selected to classify the news articles (Table 2).

Table 2 The strengths and weaknesses of the machine learning models used in this study

K-Nearest Neighbors (KNN), a supervised learning algorithm (Cover and Hart 1967), classifies a new text sample based on the class labels of its K nearest neighbors. The class with the highest number of votes among the K neighbors determines the sentiment category of the news text. Naive Bayes (NB) is a simple and fast probabilistic classifier (Maron 1961). It is based on Bayes’ theorem and assumes that the features are conditionally independent given the class label. NB calculates the probability of a text belonging to a particular sentiment class from the occurrence of its features in the training data. Support Vector Machine (SVM) is a widely used and robust supervised classifier (Cortes and Vapnik 1995; Morovati et al. 2021). For the five-class classification task in this research, the model combines multiple binary classifiers to achieve multi-class classification. Decision Tree (DT) is a classification algorithm with a tree structure (Mitchell 1997), in which each internal node represents a feature/attribute test and each leaf node represents a class or decision. It classifies a sample by recursively partitioning the feature space (Fesseha et al. 2020). Random Forest (RF) is a classifier that relies on an ensemble of decision trees (Breiman 2001). The final class is determined by aggregating the outcomes of the individual trees. Gradient Boosting Decision Tree (GBDT) is an ensemble learning algorithm. It constructs multiple decision trees iteratively, where each new tree focuses on correcting the errors made by the previous ones. By combining the predictions of all trees, GBDT improves the classification accuracy and effectively handles complex relationships in the input text data, making it suitable for sentiment analysis tasks. Extreme Gradient Boosting (XGBoost) is an advanced implementation of the GBDT algorithm (Chen and Guestrin 2016).
It optimizes the GBDT algorithm with enhancements in tree construction and regularization techniques, resulting in better performance and faster training, which makes it a powerful choice for sentiment analysis tasks. Multilayer Perceptron (MLP) is a type of feedforward neural network (Khalil Alsmadi et al. 2009). It consists of multiple layers of interconnected neurons, including an input layer, one or more hidden layers, and an output layer. It learns to extract relevant features from the input text and maps them to the corresponding sentiment class through a series of nonlinear transformations and weight adjustments during training. Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) designed to handle sequential data (Hochreiter and Schmidhuber 1997; Li et al. 2023), such as news media articles and other textual data. It can capture long-range dependencies and contextual information from the text, making it effective for understanding and classifying the sentiment of the input text. It processes sequential input step by step, updating its memory cell and hidden state to retain relevant information and make accurate sentiment predictions. Finally, Bidirectional Encoder Representations from Transformers (BERT) is a widely used and powerful language representation model for text classification (Devlin et al. 2018). It employs a transformer-based architecture pre-trained on vast amounts of text data, enabling it to capture rich contextual information. Fine-tuning BERT on sentiment analysis allows it to understand the nuances of text and make accurate predictions for sentiment classification, achieving high performance in natural language processing tasks. As shown in Table 2, each of these 10 machine learning models has its own strengths and weaknesses when performing text sentiment analysis tasks (Ting and Zhang 2003; Gupte et al. 2014; Bhavitha et al. 2017; Yang et al. 2017; Hemmatian and Sohrabi 2019; Prabha and Srikanth 2019; Srivastava et al. 2020; Mohammadi and Shaverizade 2021; Saifullah et al. 2021; Hariguna and Ruangkanjanases 2023; Syriopoulos et al. 2023).
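As an illustration of the KNN voting rule described above, the following is a minimal stdlib sketch. The 2-D vectors stand in for the much higher-dimensional TF-IDF features actually used, and the labels use the study's − 2/+ 2 coding; everything else is hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_vecs, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training vectors (Euclidean distance)."""
    dists = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_vecs, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D feature vectors standing in for TF-IDF document vectors.
train_vecs = [(0.0, 0.0), (0.1, 0.2), (0.9, 1.0), (1.0, 0.8), (0.2, 0.1)]
train_labels = [-2, -2, 2, 2, -2]
label = knn_predict(train_vecs, train_labels, query=(0.05, 0.05), k=3)
```

The same voting logic underlies scikit-learn's KNeighborsClassifier, which the study's implementation plausibly used.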

In our study, the generalization of the modeling process was ensured through carefully designed training conditions and validation methods. Firstly, the ten models were run under the same training condition, with 80% of the news text data in the corpus used as the training dataset and 20% as the testing dataset. Secondly, recognizing that the performance of machine learning models can be highly sensitive to the choice of hyperparameters, we conducted an extensive hyperparameter tuning process using GridSearchCV, which performs an exhaustive search over all combinations of hyperparameter values in the parameter grid, such as the combinations of the parameter C, kernel type and parameter gamma for the Support Vector Machine (SVM). During each iteration of the grid search, the model's performance was evaluated using cross-validation. We performed fivefold cross-validation on the training dataset (80% of the news articles in the corpus): the training dataset was divided into five equal subsets, of which four were used for training while the remaining one served as the validation set. This process was repeated five times, each time selecting a different subset as the validation set, which helps assess the models’ performance more reliably across different subsets of the data.
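The mechanics of this procedure (enumerating every combination in the parameter grid and averaging validation scores over five folds) can be sketched with the standard library alone. The study itself used scikit-learn's GridSearchCV; here `toy_score` is a hypothetical stand-in for "train an SVM on the training folds and score it on the validation fold", not the actual pipeline.

```python
import itertools
import random
import statistics

def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def grid_search(grid, score_fn, n_samples, k=5):
    """Try every hyperparameter combination; score each as the mean
    validation score across the k folds. Returns (best_score, best_params)."""
    keys = sorted(grid)
    best_score, best_params = float("-inf"), None
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        folds = kfold_indices(n_samples, k)
        scores = []
        for i, val_fold in enumerate(folds):
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            scores.append(score_fn(params, train_idx, val_fold))
        mean = statistics.mean(scores)
        if mean > best_score:
            best_score, best_params = mean, params
    return best_score, best_params

# Hypothetical scoring function: pretends C=1.0 with an rbf kernel
# validates best, standing in for real SVM training/evaluation.
def toy_score(params, train_idx, val_idx):
    return -abs(params["C"] - 1.0) + (0.1 if params["kernel"] == "rbf" else 0.0)

grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}
best_score, best_params = grid_search(grid, toy_score, n_samples=100, k=5)
```

With GridSearchCV the equivalent call would pass the same kind of parameter grid plus `cv=5`; the exhaustive-product-plus-mean-fold-score logic is identical.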

2.5 Model evaluation metrics

The performance of the trained models in analyzing the sentiment polarities of news media articles was first assessed using metrics commonly applied in machine learning. Four metrics (accuracy, precision, recall and F1-Score) were chosen for comparing the ten models in this study (Hemmatian and Sohrabi 2019; Wang et al. 2019). Accuracy (Acc) is the proportion of correctly predicted samples among all samples. Precision (Pre) is the proportion of samples predicted as a specific category that actually belong to that category. Recall (Rec) is the proportion of samples belonging to a specific category that are correctly predicted by the classifier. F1-Score (F1) is the harmonic mean of precision (Pre) and recall (Rec), taking both into account in a single metric. Precision, recall and F1-Score are vital metrics for unbalanced test datasets. In terms of evaluating classifiers, accuracy and F1-Score stand out as the primary metrics for assessing text classification methods (Wadud et al. 2022; Al Mahmoud et al. 2023). In this study, the precision, recall and F1-Score of each label were calculated separately to analyze performance not only overall but also for each individual label. In addition, the evaluation metrics can be influenced by the proportions of the label values in the training dataset (Ertekin 2013). Both under-sampling and over-sampling can tackle the issue of imbalanced label distribution (Amin et al. 2016), but over-sampling has a more substantial positive impact on classification performance when dealing with intricate data types (Ertekin 2013; Chen et al. 2020).
Given that the number of collected news articles with the extreme conflictive label value (− 2) is significantly smaller than that of the other labels, the over-sampling method was adopted to balance the distribution of news articles across label values by replicating existing news text data from the minority class.
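The per-label metrics defined above can be computed with one-vs-rest counts of true positives, false positives and false negatives; F1 is then the harmonic mean of precision and recall. The sketch below is stdlib-only (scikit-learn's classification_report does the same job), and the label sequences are illustrative, not drawn from the study's testing dataset.

```python
def per_label_metrics(y_true, y_pred, labels):
    """Precision, recall and F1 for each label (one-vs-rest),
    plus overall accuracy."""
    metrics = {}
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        pre = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
        metrics[lab] = {"precision": pre, "recall": rec, "f1": f1}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, metrics

# Illustrative true and predicted labels on the study's 5-level scale.
y_true = [-2, -1, 0, 1, 2, -1, 0]
y_pred = [-1, -1, 0, 1, 2, -1, 1]
acc, m = per_label_metrics(y_true, y_pred, labels=[-2, -1, 0, 1, 2])
```

Note how the single misclassified − 2 article drives that label's recall to zero even though overall accuracy stays above 70%, which is exactly why the per-label view matters for the rare conflictive class.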

Then, the trained models’ performance was assessed against the results obtained from conventional dictionary-based approaches, which are widely employed in text sentiment analysis (Hardeniya and Borikar 2016; Zhang et al. 2018; Catelli et al. 2023). A sentiment dictionary contains a list of words, each associated with a sentiment polarity label. By matching the vocabulary in the preprocessed text with the words in the sentiment dictionary, a sentiment score is calculated for each matched word based on its label and weight in the dictionary. The scores of all matched words are then aggregated to obtain the overall score of the entire text. We compared the performance of machine learning methods and dictionary-based methods on the 5-level sentiment classification, highlighting the possible advantages and disadvantages of the two approaches.

As no specialized sentiment dictionary is available for this domain, the widely adopted AFINN lexicon (Nielsen 2011; Huang et al. 2017; Shuvo et al. 2023) was selected to represent the dictionary-based methods. AFINN contains 2477 scored words, each word’s score ranging from − 5 (very negative) to + 5 (very positive) (Nielsen 2011), so the sentiment score of a text is likewise confined to the range of − 5 to + 5. The sentiment polarities of the news articles calculated by AFINN fell within the range − 2 to + 3, which does not align with the label categories defined in this study. Based on the mapping between AFINN prediction scores and the true label values, AFINN scores − 2, − 1, 0 and 1 were mapped to label values − 2, − 1, 0 and 1 respectively, while AFINN scores 2 and 3 were both mapped to label value 2. In this way, the sentiments calculated by AFINN were transformed into the five labels defined in this study. We chose the best-performing model as the representative of all machine learning models for comparison with the dictionary-based approach.
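The dictionary scoring and the score-to-label mapping described above can be sketched as follows. The miniature lexicon is purely illustrative (the real AFINN list assigns each of its 2477 words an integer from − 5 to + 5), and the clamping function reproduces the study's mapping over the observed score range of − 2 to + 3 (scores 2 and 3 both map to label 2).

```python
# Tiny illustrative lexicon standing in for the real AFINN word list.
LEXICON = {"cooperation": 2, "agreement": 2, "support": 1,
           "dispute": -2, "protest": -2, "hostile": -3}

def dictionary_score(tokens):
    """Sum the scores of matched lexicon words; unmatched words count 0."""
    return sum(LEXICON.get(t, 0) for t in tokens)

def to_study_label(score):
    """Map a dictionary score onto the study's five labels by clamping:
    <= -2 -> -2, -1 -> -1, 0 -> 0, 1 -> 1, >= 2 -> 2."""
    return max(-2, min(2, score))

tokens = "india pakistan dispute protest".split()
label = to_study_label(dictionary_score(tokens))
```

Because the mapping only collapses the upper end of the scale (2 and 3 to label 2), clamping is equivalent to the score-by-score correspondence stated above for every score AFINN actually produced.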

Finally, we validated the trained models’ performance with historical water events in three case transboundary rivers. Among the 77 transboundary river basins covered by news media articles in the testing dataset, the Lancang-Mekong River Basin in South-East Asia, the Nile River Basin in North-East Africa and the Indus River Basin in South Asia were selected as case study areas, all of which are important and representative transboundary river basins. The Lancang-Mekong River is the longest transboundary river in Southeast Asia. It flows from the Tibetan Plateau in China through Myanmar, Laos, Thailand, Cambodia and Vietnam, and finally into the South China Sea. The Lancang-Mekong River system is relied upon by over 70 million people for water supply, food production and transportation (Junlin et al. 2021; Lu et al. 2021; Liu et al. 2022b). Riparian countries in the Lancang-Mekong River basin hold divergent interests in the development and conservation of the basin. In the face of impacts from geopolitical shifts, hydrological changes and socio-economic development, the basin is undergoing fluctuations in water conflict and cooperation. The Nile River is the longest transboundary river in the world, at 6670 km. It originates from the plateau in Burundi and flows northward through Rwanda, Tanzania, Kenya, Uganda, the Democratic Republic of the Congo, South Sudan, Sudan, Ethiopia and Egypt before entering the Mediterranean Sea. Some 300 million people rely on the waters of the Nile River, and the population within the basin continues to grow rapidly (Paisley and Henshaw 2013). The upstream and downstream countries in the Nile River basin face significant disputes over the allocation of water resources (Alemu and Dinar 2000; Whittington 2022).
The Indus River, with a total length of 3200 km, originates from the Tibetan Plateau in China, flows through India and Pakistan, and empties into the Arabian Sea (Abro et al. 2020). It is one of Pakistan’s main rivers and an important source of agricultural irrigation. The long-standing territorial disputes along the river between India and Pakistan, together with both countries’ continuous domestic over-development of the shared water resources, have resulted in a tense situation in the Indus River basin (Yaqoob 2019; Rigi and Warner 2020; Janjua et al. 2021). The predictions of the best-performing model were validated against the most conflictive water events occurring within these three basins.

3 Results

3.1 Model performance assessment with the machine learning metrics

The evaluation of model performance is essential for any machine learning classifier as it measures the classifier’s predictive capacity (Guisan et al. 2017; Khanal 2022). Table 3 summarizes the accuracy of the ten machine learning models. The best accuracy, 72.7%, was recorded by the BERT model, while the lowest, 57.8%, was scored by the DT model. The accuracy of the remaining eight models ranged from 58.6% to 71.4%. The average accuracy was 66.5%. One notable observation was that the accuracy of the LSTM model was only 62.2%.

Table 3 The accuracy of each model

Figure 2 shows the precision of all models for each label. For sentiment label − 2, BERT and GBDT achieved precision rates of 79.3% and 77.6% respectively, both demonstrating strong performance in predicting extreme conflictive sentiment tendencies. KNN exhibited a significantly higher precision rate of 88.3% than the other models for label value − 1, showing excellent performance in predicting conflictive sentiment. RF outperformed the other models in precision for sentiment label 0, signifying good performance in predicting neutral sentiment. For sentiment value 1, KNN and SVM attained relatively high precision rates of 72.7% and 70.1% respectively, underscoring their effectiveness in predicting news media articles with a moderately cooperative sentimental inclination. BERT and SVM showed the highest precision rates for sentiment label 2, both at 80.1%, emphasizing their exceptional capability in predicting cooperative sentiments.

Fig. 2

The precision of each model across different label values

Further, Fig. 3 presents a comparative view of the recall of all models across label values. For sentiment label − 2, the recall rates of all models were relatively low, the highest being KNN and MLP at 66.7%, suggesting that all models had difficulty identifying extreme conflictive sentiment. This could be attributed to the limited representation of label − 2 in the corpus. With a recall rate of 85.4% on sentiment label − 1, SVM demonstrated excellent capability in identifying milder conflictive sentiment. Despite its lower accuracy, LSTM stood out with a significantly higher recall on sentiment label 0, achieving 71.5%, indicating that LSTM was more accurate in recognizing neutral news media articles. GBDT and BERT displayed relatively higher recall rates on sentiment label 1, reaching 69.1% and 68.4% respectively, implying that these two models excelled in identifying articles with a lighter cooperative sentiment. Finally, with a recall of 86.1% for sentiment label 2, KNN performed exceptionally well in identifying highly cooperative sentiment.

Fig. 3

The recall of each model across different label values

The F1-Score for each label is visualized in Fig. 4. For extremely conflictive sentiment (label − 2), the F1-Scores of all models were relatively low; the best and worst results were reported by GBDT (66.2%) and LSTM (44.3%) respectively, indicating subpar performance of all classifiers in predicting extremely conflictive sentiment. For sentiment label − 1, aside from KNN, DT and LSTM, the remaining models showed similar F1-Scores, falling within the range of 70% to 77.7%. For neutral sentiment (label 0), LSTM achieved the highest F1-Score at 70.7%, while the remaining nine models fluctuated between 45.7% and 58.0%. For cooperative sentiment (label 1), the F1-Scores of all models varied from 54.1% to 67.2%, signalling a challenge in accurately predicting news articles with mild cooperative sentiment. Lastly, for highly cooperative sentiment (label 2), BERT and XGBoost achieved the highest F1-Scores of 81.0% and 79.4%, excelling in predicting highly cooperative sentiment.

Fig. 4

The F1-Score of each model across different label values

The performance of the machine learning models in predicting conflictive sentiments (− 1 and − 2) deserves special attention, as these are foci for transboundary water management. Although the precision of KNN on sentiment label − 1 was 88.3%, significantly higher than the other models, its recall was only 42.3%, indicating that over half of the news media articles with a true label of − 1 in the testing dataset were not successfully identified. Considering precision, recall and F1-Score comprehensively, BERT remained superior in predicting and identifying sentiment label − 1. NB and DT exhibited lower precision, at 47.7% and 46.3% respectively, suggesting that they are not suitable for the prediction and recognition of sentiment label − 2. Among all the news media articles with sentiment label − 2 in the testing dataset, LSTM successfully predicted only 38.0%, the poorest performance in recognizing sentiment label − 2. The recall of DT was also below 50%, at 48.7%. Comparing the recall of the models shows that LSTM and DT failed to meet the requirements of the task. Both BERT and GBDT demonstrated high F1-Scores in predicting sentiment label − 2; BERT exhibited higher precision than GBDT, while GBDT displayed slightly higher recall than BERT, illustrating their similar competence in predicting and identifying the most conflictive sentiment.

3.2 Model performance assessment by comparing with sentiment dictionary

Given the comprehensive performance of BERT and its outstanding capability in predicting and identifying conflictive sentiment labels, it was chosen as the representative of all models. Figure 5 shows the correspondence between the sentiment polarities predicted by BERT and AFINN and the true labels of the news media articles.

Fig. 5

The landscape of the correspondence relationship between the sentiment polarities predicted by BERT and AFINN and the true labels of the news media articles

Firstly, accuracy scores were calculated to evaluate the overall performance of the two approaches. BERT achieved an accuracy of 72.7%, notably higher than AFINN's 27.4%. Regarding the prediction and identification of conflictive sentiment labels, among the 84 news media articles with sentiment label − 2 in the testing dataset, BERT incorrectly predicted 21 articles as label − 1, 6 as label 0, 7 as label 1 and 4 as label 2. Strikingly, AFINN failed to identify any news articles with label − 2, mispredicting 75% of them as neutral or cooperative. The performance of AFINN in predicting and identifying sentiment label − 1 was similarly poor.
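The recall implied by these error counts follows directly from the figures reported above; the counts come from the text and the arithmetic is a simple check:

```python
# Recall of BERT for label -2, derived from the error counts reported
# above: 84 true "-2" articles, of which 21 were predicted as -1,
# 6 as 0, 7 as 1 and 4 as 2.
errors = {-1: 21, 0: 6, 1: 7, 2: 4}
total_true = 84
correct = total_true - sum(errors.values())
recall = correct / total_true
print(correct, round(recall, 3))   # 46 articles correct, recall ~0.548
```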

3.3 Validation of the model’s performance with historical water events

BERT, chosen as the representative model for its strength in predicting and identifying conflictive sentiment labels, was further validated against historical water events occurring within transboundary river basins, focusing on sentiment label − 2. Table 4 reports, for each basin, the number of news media articles with true sentiment label − 2, the number predicted as label − 2, the number of true label − 2 articles successfully predicted, and the resulting precision (Pre) and recall (Rec).

Table 4 Number of news media articles with true/predicted sentiment label − 2 in each river basin and the calculated precision (Pre) and recall (Rec)

BERT performed worst in predicting and identifying news media articles with sentiment label − 2 in the Lancang-Mekong River basin, with precision and recall of 33.3% and 22.2% respectively, followed by the Nile River basin, with precision and recall of 83.3% and 41.7%. The best performance was observed in the Indus River basin, with precision and recall of 84.6% and 63.5%, respectively. As the number of news media articles with true label − 2 in a basin increased, both precision and recall improved. BERT predicted a minority of the articles with true label − 2 as neutral or cooperative. In the Lancang-Mekong River basin, for instance, a news media article about donors slashing funding for the MRC was incorrectly predicted as neutral. In the Nile River basin, five articles with the most conflictive sentiment were predicted as neutral or cooperative; notably, two of them, concerning Ethiopia's giant dam construction, were predicted as the most cooperative sentiment. In the Indus River basin, some articles discussing the arbitration between India and Pakistan over hydropower projects were still predicted as cooperative. In such cases, decision-makers could be misled by BERT's predictions, resulting in biased decision-making. Overviews of the main contents of news media articles with true label − 1 that were predicted as label − 2 in the Lancang-Mekong, Nile and Indus River basins are given in Tables A1, A2 and A3 in the Appendix.

4 Discussion and conclusions

Understanding the dynamics of conflict and cooperation over water is crucial for global transboundary river management. This paper compared the performance of different machine learning models in analyzing the sentiment polarity of news media articles on transboundary water conflict and cooperation. We developed a large corpus of 9382 news media articles on transboundary water conflict and cooperation, collected from the LexisNexis database, covering 105 of the 310 transboundary river basins globally. Each news article in the corpus was manually labeled with a value of − 2, − 1, 0, 1 or 2, where a higher label indicates a more cooperative sentiment and a lower label a more conflictive one. A total of 10 well-known machine learning models, including K-Nearest Neighbors, Naive Bayes, Support Vector Machine, Decision Tree, Random Forest, Gradient Boosting Decision Tree, Extreme Gradient Boosting, Multilayer Perceptron, Long Short-Term Memory and Bidirectional Encoder Representations from Transformers, were applied to the newly created corpus. The performance of each model was first assessed by four metrics, namely accuracy, precision, recall and F1-Score, then compared with the results of the dictionary approach, and finally validated against historical water events in three case river basins. The key findings and their implications for future research and management practices are summarized below.

Regarding the four metrics, KNN, DT and LSTM fell short, whereas RF, GBDT and XGBoost displayed relatively stronger performance, with BERT leading overall. As a deep learning model, LSTM would be expected to achieve higher accuracy than traditional machine learning models (Abd El-Jawad et al. 2018; Alameri and Mohd 2021; Dhola and Saradva 2021), yet it did not here. One potential reason is that LSTM requires a substantial amount of training data (Li et al. 2018; Derbentsev et al. 2023); in this study, 80% of the news text data from the corpus was used for training, which may be relatively limited for optimal LSTM performance. Another is that LSTM may struggle to process long texts: when the news text is exceptionally lengthy, important feature information can be lost (Rao et al. 2018; Zhai et al. 2023).
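The long-text issue can be sketched as follows; the 512-token limit is an assumed illustrative value, not the configuration used in this study.

```python
# Sketch of the fixed-length truncation that sequence models such as
# LSTM (and BERT) commonly apply to long inputs: tokens beyond the
# limit are simply dropped, so sentiment cues appearing late in a long
# news article never reach the model. The 512-token cap is illustrative.
MAX_TOKENS = 512

def truncate(tokens, max_len=MAX_TOKENS):
    return tokens[:max_len]

article = ["w%d" % i for i in range(1200)]   # a long news article
kept = truncate(article)
print(len(kept), 1200 - len(kept))   # 512 tokens kept, 688 discarded
```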

The performance of these models in predicting and identifying news media articles with conflictive sentiments is extremely important for transboundary water management, because predicting such articles as neutral or cooperative may lead to overlooking public opinions that require attention, downplaying real conflicts, and misleading decision-making. KNN, NB, DT and LSTM displayed limited applicability in this respect. BERT not only excelled in overall performance but also showed strong capabilities in predicting and identifying conflictive sentiments.

The comparison between BERT and the sentiment dictionary AFINN indicated that AFINN was not proficient at predicting news articles with conflictive sentiment polarities and may ignore public opinion information that requires attention. AFINN predicted the sentiment polarities of many news articles as neutral for two main reasons. On the one hand, AFINN's coverage of sentiment words may be insufficient, lacking many terms that appeared in the news media articles of this study, so that no sentiment scores could be calculated for those words. On the other hand, AFINN calculates sentiment scores directly by adding and subtracting the sentiment of each word, overlooking the semantic relations between sentences in the news text. The main difference between BERT and AFINN was that the former concentrated its errors on news articles marked by human annotators as neutral or hard to judge, while the errors of AFINN depended primarily on the features of its dictionary. Although BERT performed significantly better than AFINN on this task, it operates as a black-box model, so the interpretability of its output is relatively poor (Li et al. 2022). Dictionary-based methods, by contrast, rest on manually curated vocabularies and sentiment rules, which lend their results a certain level of interpretability. Moreover, they require no pre-tagging or labeling of data, saving time and effort in preparation. In scenarios with limited computational resources and smaller datasets, dictionary-based methods may therefore still be preferred, even at the cost of somewhat lower performance.
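The additive, word-by-word scoring described here can be sketched with a toy lexicon; the word list and scores below are illustrative and far smaller than the real AFINN dictionary.

```python
# Sketch of dictionary-based scoring in the style of AFINN, which sums
# per-word valence scores. The tiny lexicon is illustrative only.
LEXICON = {"conflict": -2, "dispute": -2, "cooperation": 2,
           "agreement": 1, "dam": 0}

def lexicon_score(text):
    words = text.lower().split()
    # Words absent from the lexicon contribute 0 — the coverage problem.
    return sum(LEXICON.get(w, 0) for w in words)

s = lexicon_score("dispute over dam escalates the conflict")
print(s)   # -4: word scores are simply added, with no sentence context
```

Both weaknesses discussed above are visible here: "escalates" carries clear negative meaning but scores 0 because it is missing from the lexicon, and the summation cannot capture semantics that span more than one word.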

The alignment of BERT with historical water events in three case river basins indicates that its performance varied across basins, with the best prediction and recognition of the most conflictive articles in the Indus River basin. This was attributed to the availability of sufficient training data for that basin, allowing the model to better learn the news text features associated with sentiment label − 2. The performance of BERT in the Lancang-Mekong River basin showed that when the sample of label − 2 articles in a basin is very small, the over-sampling method used during training cannot effectively improve the model's precision and recall. Given the current inability to improve model performance on small samples, machine learning models should therefore be applied cautiously to extreme sentiment monitoring and forecasting in transboundary rivers that lack enough news media articles with these extreme sentiments. To classify extreme sentiments accurately, the predictive results of machine learning models should be combined with manual review and examination, which maintains efficient automated processing while improving the accuracy and reliability of the classification.
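Why duplicating a handful of minority-class articles adds little can be seen in a minimal sketch of random over-sampling; this is a generic illustration under assumed data, not the exact over-sampling procedure used in the study.

```python
import random

# Sketch of random over-sampling of a minority class before training.
# With only a handful of "-2" articles, the added samples are exact
# duplicates carrying no new text features — one reason over-sampling
# alone cannot fix precision and recall for that label.
def oversample(samples, labels, minority, target_count, seed=0):
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    extra = [rng.choice(minority_idx)
             for _ in range(target_count - len(minority_idx))]
    new_samples = samples + [samples[i] for i in extra]
    new_labels = labels + [labels[i] for i in extra]
    return new_samples, new_labels

X = ["a", "b", "c", "d", "e"]
y = [0, 0, 0, 0, -2]           # a single minority example
X2, y2 = oversample(X, y, minority=-2, target_count=4)
print(y2.count(-2))   # 4 copies of the same article, no new information
```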

By assessing the models' performance in predicting the sentiments of historical news articles on transboundary water conflict and cooperation, we established a benchmark for their reliability and robustness. This is a critical step in demonstrating that the trained models can be trusted for real-time monitoring in future sentiment analysis. The capability to predict sentiment trends allows potential conflicts to be identified early, so that transboundary water issues can be addressed proactively rather than reactively. This historical perspective also helps policymakers and stakeholders engage more effectively with communities, building on historical patterns of public sentiment towards transboundary water issues and fostering more inclusive and participatory approaches to transboundary water resources management. As a result, the capability to predict sentiment trends and the lessons learned from historical events provided by this study enable policymakers to detect the social, economic and environmental risks of transboundary rivers at an early stage, foster cooperation among riparian countries, ensure the equitable distribution of water resources for regional stability, and build a community with a shared future for mankind.