Transportation sentiment analysis using word embedding and ontology-based topic modeling

doi:10.1016/j.knosys.2019.02.033

Knowledge-Based Systems

Volume 174, 15 June 2019, Pages 27-42

https://doi.org/10.1016/j.knosys.2019.02.033

Highlights

•
Social networks provide a new approach to collecting data on transportation.
•
Sentiment analysis of social data can be used to examine transportation.
•
Current text mining techniques cannot generate topics accurately.
•
Document representation is another challenging task in sentiment analysis.
•
We propose a new topic modeling and word embedding system for sentiment analysis.

Abstract

Social networks play a key role in providing a new approach to collecting information about mobility and transportation services. Sentiment analysis of this information can support intelligent transportation systems (ITSs) in examining traffic control and management systems. However, sentiment analysis faces two technical challenges: extracting meaningful information from social network platforms and transforming the extracted data into valuable information. Accurate topic modeling and document representation are also challenging tasks in sentiment analysis. We propose an ontology and latent Dirichlet allocation (OLDA)-based topic modeling and word embedding approach for sentiment classification. The proposed system retrieves transportation content from social networks, removes irrelevant content to extract meaningful information, and generates topics and features from the extracted data using OLDA. It also represents documents using word embedding techniques and then employs lexicon-based approaches to enhance the accuracy of the word embedding model. The proposed ontology and the intelligent model are developed using the Web Ontology Language (OWL) and Java, respectively. Machine learning classifiers are used to evaluate the proposed word embedding system. The method achieves an accuracy of 93%, which shows that the proposed approach is effective for sentiment classification.

Introduction

Recent advances in social media and textual resources have enabled information retrieval and sentiment analysis in data mining and natural language processing (NLP) [1]. However, extracting valuable information from online news articles and social media, such as Twitter, Facebook, and TripAdvisor, has become a new challenge for sentiment analysis. Texts on social networks are unstructured and constantly growing in volume; moreover, online texts are short and contain a lot of slang, idioms, jargon, and rapidly changing topics.

Intelligent transportation systems (ITSs) need social network data to examine transportation services and support traffic control and management systems. In social media, information about transportation networks, such as traffic jams and accidents, appears regularly amid unrelated text, so extracting these data and transforming them into valuable information for analysis is a challenging task.
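A minimal sketch of the first step, separating transportation posts from unrelated social media text, is a simple keyword filter. The term set `TRANSPORT_TERMS` and the helper names are hypothetical; the paper's own retrieval and filtering criteria (the subject of its Section 3) are not reproduced here, and plain keyword matching ignores the slang and context problems noted above.

```python
import re

# Hypothetical seed vocabulary; not the paper's actual filtering criteria.
TRANSPORT_TERMS = {"traffic", "jam", "accident", "bus", "train",
                   "road", "highway", "subway", "congestion"}

def tokenize(text: str) -> list:
    """Lowercase and keep runs of letters (very rough; no slang handling)."""
    return re.findall(r"[a-z]+", text.lower())

def is_transport_related(text: str, min_hits: int = 1) -> bool:
    """Keep a post if it mentions at least `min_hits` transportation terms."""
    hits = sum(1 for tok in tokenize(text) if tok in TRANSPORT_TERMS)
    return hits >= min_hits

posts = [
    "Huge traffic jam on the highway again",
    "Loved the new pizza place downtown",
    "Bus 42 is late, accident near the bridge",
]
relevant = [p for p in posts if is_transport_related(p)]
print(relevant)
```

Raising `min_hits` trades recall for precision; a real pipeline would also need to handle misspellings and domain slang.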

Text mining has gained considerable attention among researchers and has been proposed for automating information extraction from unstructured textual data. Rapid progress in NLP and machine learning (ML) has produced two frameworks for text representation: one-hot encoding and word embedding. Statistical learning models exhibit good performance in document representation. Bag-of-words (BoW) is the first and most popular model for representing a document in the field of NLP [2]. This model represents a document using a dictionary that contains all words occurring in the document. The BoW model is easy to implement, works fast, and achieves good results with very little data. However, its vectors are high-dimensional (one dimension per vocabulary word), even for a single sentence, and it ignores word order. Because it cannot represent large-scale data efficiently, it limits the performance of classifiers. Therefore, statistical approaches such as latent Dirichlet allocation (LDA), latent semantic indexing (LSI), and principal component analysis (PCA) have been proposed to overcome the limitations of BoW.
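The BoW representation described above can be sketched in a few lines. This is a generic illustration with an invented three-sentence corpus, not code from the paper; it makes the two stated weaknesses concrete: the vector length equals the vocabulary size, and word order is lost.

```python
from collections import Counter

def build_vocabulary(docs):
    """Assign each distinct (lowercased) word in the corpus a fixed column index."""
    words = sorted({w for doc in docs for w in doc.lower().split()})
    return {w: i for i, w in enumerate(words)}

def bow_vector(doc, vocab):
    """Return a |V|-length count vector; word order is discarded entirely."""
    vec = [0] * len(vocab)
    for word, count in Counter(doc.lower().split()).items():
        if word in vocab:  # out-of-vocabulary words are dropped
            vec[vocab[word]] = count
    return vec

docs = ["the bus was late", "late was the bus", "the train was on time"]
vocab = build_vocabulary(docs)
for d in docs:
    print(bow_vector(d, vocab))
```

The first two sentences produce identical vectors because only counts are kept, and every new word added to the corpus adds another dimension to every document vector.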

Word embedding is a distributed representation approach and an alternative to BoW [1], [2]. It represents each word as a dense, low-dimensional vector that captures its semantic meaning. To obtain word vectors for corpus data, a word embedding model such as word2vec, doc2vec, or GloVe must be trained on a large amount of social media data. However, word embedding models have some limitations. Using a high-dimensional pre-trained word embedding model for a small amount of data is not ideal. For document representation, the two word2vec estimation methods (continuous bag-of-words and skip-gram) miss the context of documents. In addition, word embedding neglects the sentiment information in any given content.
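One common way to turn word vectors into a document vector is to average them and compare documents by cosine similarity. The sketch below uses tiny hand-made 3-dimensional vectors (`EMBEDDINGS` is invented for illustration); real word2vec or GloVe models learn vectors with hundreds of dimensions from large corpora, and this is not necessarily how the paper builds its document vectors.

```python
import math

DIM = 3
# Hypothetical toy vectors, not taken from any trained model.
EMBEDDINGS = {
    "traffic":    [0.9, 0.1, 0.0],
    "jam":        [0.8, 0.2, 0.1],
    "congestion": [0.85, 0.15, 0.05],
    "delay":      [0.7, 0.3, 0.2],
    "pizza":      [0.0, 0.1, 0.9],
}

def doc_vector(tokens):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

d_jam = doc_vector(["traffic", "jam"])
d_delay = doc_vector(["congestion", "delay"])
d_food = doc_vector(["pizza"])
print(cosine(d_jam, d_delay), cosine(d_jam, d_food))
```

Averaging discards word order, and because embeddings are learned from co-occurrence, words of opposite sentiment that appear in similar contexts can end up close together, which is the sentiment limitation the paragraph points out.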

An LDA statistical model can automatically discover latent topics in a large volume of transportation data. LDA disregards word order and groups semantically related words into the same topic based on their co-occurrence in documents. However, LDA has three main limitations that affect classification results. First, the topics generated by LDA include irrelevant features when the documents contain text unrelated to transportation. Second, it produces very noisy topics from short texts and misses valuable topics when the dataset is small. Third, it neglects the relation between a topic and a document when the document contains low-probability words. Ontologies are considered an effective way to address these limitations: they can enhance the performance of LDA in finding appropriate topics, along with their features (words), in transportation data.
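To make the LDA description concrete, here is a compact collapsed Gibbs sampler, one standard way to fit LDA. The paper does not state which inference method its Java implementation uses, so treat this as a generic sketch; the function names, the hyperparameter defaults (`alpha`, `beta`), and the toy corpus are assumptions. Note that only co-occurrence counts enter the sampler, so word order plays no role.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (theta, phi, vocab), where theta[d][k]
    estimates P(topic k | document d) and phi[k][v] estimates P(word v | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k
    z = []                             # current topic of every token

    # Random initial assignment.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, vocab

def top_words(phi, vocab, n=3):
    """Highest-probability words for each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]

docs = [
    "traffic jam highway accident traffic".split(),
    "bus late traffic jam".split(),
    "pizza pasta dinner tasty".split(),
    "tasty pizza dinner".split(),
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
print(top_words(phi, vocab))
```

On a corpus this small the topics can come out mixed depending on the seed, which mirrors the noisy-topics-from-short-text limitation described above.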

The goal of the proposed system is to improve the performance of document representation and sentiment classification. The accuracy of sentiment classification depends on how text is represented in documents. Existing text representation models consider imprecise words that are not associated with the topics of the document, and they neglect the sentiment information in any given content. Therefore, we propose ontology- and LDA-based topic modeling and a word embedding system that represents texts precisely to improve the accuracy of sentiment classification. The proposed model was trained using datasets from different social media networks and evaluated with ML classifiers. The results show that the proposed approach represents documents correctly and improves the accuracy of sentiment classification. The main contributions of this research are as follows.

•
We propose a novel framework that retrieves the most relevant documents, reviews, and Tweets from social media and news articles.
•
We propose ontology- and LDA-based topic modeling, called topic2vec, that extracts the most appropriate topics and features of documents and discards irrelevant words to enhance document representation. The proposed ontology represents semantic knowledge that enriches an LDA model to extract more accurate features from transportation texts.
•
We integrate topic2vec with word2vec to generate a word embedding model that represents each word in the document as a low-dimensional vector with semantic meaning.
•
We propose a new fuzzy ontology-based lexicon method, which is used with six other lexicons to enhance the accuracy of the pre-trained word embedding model in sentiment classification tasks.
•
We compare the performance of string2vec, word2vec, doc2vec, glove2vec, and lexicon2vec with our proposed model. We use ML algorithms to classify the data from these models and present the results. The comparison highlights the limitations and advantages of each document representation model.
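The idea in the topic2vec contribution, using ontology knowledge to discard irrelevant words from LDA topics, can be approximated with a concept lookup. This is a toy stand-in only: the paper's ontology is written in OWL and its OLDA procedure is not shown in this excerpt, so the `ONTOLOGY` dictionary, the concept names, and `filter_topic` are hypothetical.

```python
# Hypothetical mini-ontology: concept -> words that express it.
# The paper's OWL ontology is far richer; this dict only mimics a concept lookup.
ONTOLOGY = {
    "TrafficCondition": {"traffic", "jam", "congestion", "delay"},
    "Incident": {"accident", "crash", "collision"},
    "PublicTransport": {"bus", "train", "subway", "metro"},
}

def concept_of(word):
    """Return the ontology concept a word belongs to, or None if unknown."""
    for concept, terms in ONTOLOGY.items():
        if word in terms:
            return concept
    return None

def filter_topic(topic_words):
    """Keep only topic words the ontology recognizes, tagged with their concept."""
    kept = []
    for w in topic_words:
        c = concept_of(w)
        if c is not None:
            kept.append((w, c))
    return kept

raw_topic = ["traffic", "lol", "accident", "morning", "bus", "delay"]
print(filter_topic(raw_topic))
# [('traffic', 'TrafficCondition'), ('accident', 'Incident'), ('bus', 'PublicTransport'), ('delay', 'TrafficCondition')]
```

Tagging each kept word with a concept, rather than only dropping the rest, is one way an ontology could also group related features across topics.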

This paper is structured as follows. Section 2 discusses sentiment analysis, topic modeling, and document representation models. Section 3 describes our proposed framework and the procedure for data collection and filtering. Section 4 provides information about topic modeling and word embedding. Section 5 presents the experimental results. Finally, Section 6 concludes our work.

Section snippets

Related work

This section looks at sentiment analysis, topic modeling, and word embedding approaches. First, we discuss the general standpoint of sentiment analysis, and then focus on the domain of social data related to transportation. We also present a brief review of topic modeling and deep learning-based word embedding approaches in sentiment classification.

Proposed approach

This section briefly introduces different methods that are applied to develop the proposed OLDA-based topic modeling and word embedding system. The main focus of the proposed approach is to enhance the performance of topic modeling, document representation, and sentiment classification. We used different techniques (namely LDA, the ontology, and deep learning) to represent words along with the most relevant topics for opinion classification. LDA is applied to find the statistical relationships

Topic modeling and word embedding

In this section, we employ LDA and ontology-based topic modeling to identify transportation-related topics in the preprocessed data. Word embedding algorithms (word2vec and glove2vec), along with lexicon2vec, are then used to convert the words in the corpus into vector form. The overall workflow is shown in Fig. 2.
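Because plain embeddings carry little sentiment information, one simple way to combine them with lexicons is to append lexicon-derived sentiment features to each word vector. The sketch below assumes that design; the two lexicons, their scores, and the `[mean score, coverage]` feature layout are invented, and the paper's lexicon2vec and its fuzzy ontology-based lexicon may work differently.

```python
# Two invented lexicons (word -> polarity in [-1, 1]). The paper uses its own
# fuzzy ontology-based lexicon plus six others, whose formats are not shown here.
LEXICON_A = {"late": -0.6, "smooth": 0.7, "crash": -0.9}
LEXICON_B = {"late": -0.4, "smooth": 0.5, "comfortable": 0.8}
LEXICONS = [LEXICON_A, LEXICON_B]

def lexicon_features(word):
    """Return [mean score over lexicons that list the word, fraction of lexicons listing it]."""
    scores = [lex[word] for lex in LEXICONS if word in lex]
    mean = sum(scores) / len(scores) if scores else 0.0
    return [mean, len(scores) / len(LEXICONS)]

def augment(word, vector):
    """Concatenate a word's embedding with its lexicon-based sentiment features."""
    return list(vector) + lexicon_features(word)

# The 2-d "embedding" below is a placeholder for a real word2vec/GloVe vector.
print(augment("late", [0.1, 0.2]))
```

The coverage feature lets a downstream classifier distinguish "neutral according to the lexicons" from "not covered by any lexicon", which a single averaged score would conflate.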

Experiments

The dataset used in the evaluation process was discussed in Section 3. The proposed approach was presented in Section 4. Here, the validation procedure is defined and the obtained results are discussed.
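The 93% figure reported in the abstract is a classification accuracy. The sketch below shows how accuracy and the related precision, recall, and F1 scores are computed from true and predicted labels; the label names and the five-example data are invented, and the paper's actual validation protocol (data splits, classifiers) is not part of this excerpt.

```python
def evaluate(y_true, y_pred, positive="pos"):
    """Accuracy plus precision/recall/F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical gold labels and classifier predictions.
y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
print(evaluate(y_true, y_pred))
```

Reporting precision and recall alongside accuracy matters when the positive and negative classes are imbalanced, which is common in social media sentiment data.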

Conclusion

In this paper, we presented an ontology- and LDA-based topic modeling and word embedding system to enhance the performance of document representation and sentiment classification and to assist mobility users and ITSs. Several important issues are discussed, including valuable-information extraction, transformation of extracted data into useful knowledge, generation of topics and features using an ontology and LDA, representation of documents under different approaches, and integration of

Acknowledgment

This research was supported by the Ministry of Science, ICT and Future Planning (MSIP), South Korea, under the ITRC support program (IITP-2017-2014-0-00729) supervised by the Institute for Information & communications Technology Promotion (IITP).

References (75)

Lau R.Y.K. et al. Social analytics: Learning fuzzy product ontologies for aspect-oriented sentiment analysis. Decis. Support Syst. (2014)

Agarwal S. et al. A hybrid model using logistic regression and wavelet transformation to detect traffic incidents. IATSS Res. (2016)

Dragoni M. et al. A fuzzy-based strategy for multi-domain sentiment analysis. Int. J. Approx. Reason. (2018)

Ali F. et al. Fuzzy ontology-based sentiment analysis of transportation and city feature reviews for safe traveling. Transp. Res. Part C (2017)

Valdivia A. et al. Consensus vote models for detecting and filtering neutrality in sentiment analysis. Inf. Fusion (2018)

Ren Y. et al. A topic-enhanced word embedding for twitter sentiment classification. Inform. Sci. (2016)

Katsumi M. et al. Ontologies for transportation research: A survey. Transp. Res. Part C (2018)

García-Pablos A. et al. W2VLDA: Almost unsupervised system for aspect based sentiment analysis. Expert Syst. Appl. (2018)

Kamkarhaghighi M. et al. Content tree word embedding for document representation. Expert Syst. Appl. (2017)

Ali F. et al. Opinion mining based on fuzzy domain ontology and support vector machine: A proposal to automate online review classification. Appl. Soft Comput. (2016)

Bobillo F. et al. Fuzzy ontology representation using OWL 2. Int. J. Approx. Reason. (2011)

Rodríguez-García M.Á. et al. Ontology-based annotation and retrieval of services in the cloud. Knowl.-Based Syst. (2014)

Peng H. et al. Incremental term representation learning for social network analysis. Future Gener. Comput. Syst. (2018)

Ali F. et al. Opinion mining based on fuzzy domain ontology and support vector machine: A proposal to automate online review classification. Appl. Soft Comput. J. (2016)

Dai X. et al. From social media to public health surveillance: Word embedding based clustering method for twitter classification.

Le Q.V. et al. Distributed Representations of Sentences and Documents, Vol. 32 (2014)

Salas-Zárate M.D.P. et al. Sentiment analysis on tweets about diabetes: An aspect-level approach. Comput. Math. Methods Med. (2017)

Clavel C. et al. Sentiment analysis: From opinion mining to human-agent interaction. IEEE Trans. Affect. Comput. (2016)

Krouska A. et al. Comparative evaluation of algorithms for sentiment analysis over social networking services. J. UCS (2017)

Shibuya Y. Public Sentiment and Demand for Used Cars after A Large-Scale Disaster: Social Media Sentiment Analysis with Facebook Pages (2018)

Teixeira A. Data extraction and preparation to perform a The example of a Facebook fashion brand page,...

Marquez F.B. Acquiring and Exploiting Lexical Knowledge for Twitter Sentiment Analysis, Vol. 1994 (2017)

Song J. et al. A novel classification approach based on Naïve Bayes for Twitter sentiment analysis, Vol. 11 (2017)

Ali F. et al. Merged Ontology and SVM-Based Information Extraction and Recommendation System for Social Robots, Vol. 5 (2017)

Chang C. et al. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) (2013)

Effendy V. et al. Sentiment Analysis on Twitter about the Use of City Public Transportation Using Support Vector Machine Method (2011)

Gatti L. et al. SentiWords: Deriving a high precision and high coverage lexicon for sentiment analysis. IEEE Trans. Affect. Comput. (2016)

Santosh D.T. et al. Opinion mining of online product reviews from traditional LDA topic clusters using feature ontology tree and sentiwordnet. Int. J. Educ. Manag. Eng. (2016)

Zhao W. et al. Weakly-supervised deep embedding for product review sentiment analysis. IEEE Trans. Knowl. Data Eng. (2017)

Dragoni M. et al. A neural word embeddings approach for multi-domain sentiment analysis. IEEE Trans. Affect. Comput. (2017)

Pereira F.C. et al. Transport overcrowding with internet data. IEEE Trans. Intell. Transp. Syst. (2015)

Grant-Muller S.M. et al. Enhancing Transport Data Collection Through Social Media Sources: Methods, Challenges and Opportunities for Textual Data (2014)

Das S. et al. Text mining and topic modeling of compendiums of papers from transportation research board annual meetings. Transp. Res. Rec.: J. Transp. Res. Board (2016)

Abberley L. et al. Modelling road congestion using ontologies for big data analytics in smart cities.

Pereira J.F.F. Social Media Text Processing and Semantic Analysis for Smart Cities (2017)

Riazul Islam S.M. et al. The IoT: Exciting possibilities for bettering lives: Special application scenarios. IEEE Consum. Electron. Mag. (2016)

Ali K. et al. Sentiment analysis as a service: A social media based sentiment analysis framework.

