
Feature selection method on twitter dataset with part-of-speech (PoS) pattern applied to traffic analysis

  • Original Article
International Journal of System Assurance Engineering and Management

Abstract

Transportation plays a major role in everyday city life. Managing present-day traffic with traditional approaches is a complex task for transportation agencies, so Intelligent Transportation Systems (ITS) are applied to provide traffic management solutions such as parking, e-toll charging and traffic control by analyzing data from related sources. Data is collected from various sources to analyze transportation needs, yet transportation issues remain one of the major problems in cities. Unstructured data provides an enormous information load for big data analytics, but processing unstructured content remains a challenge in industry. Passive data such as social media data is a major data source for intelligent systems: social media applications such as Twitter and Facebook, where users share live comments based on their interactions with the world, are a rich source of passive data. Social media data helps in analyzing traffic issues such as traffic jams, accident locations and road conditions. The major issue with social media data is that its processing and analysis are very complex because of its volume and format. A big data architecture helps in extracting, processing, loading into a database and analyzing this unstructured data. For sentiment identification, tweets are mainly classified as positive, negative or neutral. Because the polarity of a neutral tweet is zero, it cannot be used for opinion mining, so this paper focuses on classifying neutral tweets based on feature selection. Part-of-speech (PoS) tagging is used to label the words in the tweets and find nouns, for example locations, dates and times, which are compared with other attribute values to improve the classification of neutral tweets.
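To make the neutral-tweet idea concrete, here is a minimal Python sketch. It assumes a tiny hand-made polarity lexicon (a stand-in for SentiWordNet) and a crude lookup-table PoS tagger using Penn Treebank tags; neither is the resource used in the paper, and the names `POLARITY`, `TAGS`, `tokenize`, `polarity`, `label`, `pos_tag` and `noun_features` are hypothetical. It shows why a tweet with no opinion words sums to a polarity of zero and is labeled neutral, and how noun-like tokens (proper-noun runs as location candidates, numbers and clock times) can still be extracted from it as features.

```python
import re

# Hypothetical polarity lexicon (word -> score); a stand-in for SentiWordNet.
POLARITY = {"great": 1.0, "smooth": 0.5, "terrible": -1.0, "jam": -0.5, "slow": -0.5}

# Hypothetical lookup table of Penn Treebank tags for a few known words.
TAGS = {"on": "IN", "at": "IN", "near": "IN", "the": "DT", "a": "DT",
        "is": "VBZ", "was": "VBD", "closed": "VBN", "traffic": "NN",
        "road": "NN", "am": "NN", "pm": "NN", "today": "NN", "tonight": "NN"}

TIME_OR_NUMBER = r"\d{1,2}:\d{2}|\d+"

def tokenize(text):
    # Keep clock times such as 5:30, plain numbers and alphabetic words.
    return re.findall(TIME_OR_NUMBER + r"|[A-Za-z]+", text)

def polarity(tokens):
    # Sum of lexicon scores; words missing from the lexicon contribute 0.
    return sum(POLARITY.get(t.lower(), 0.0) for t in tokens)

def label(score):
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # zero polarity: no usable opinion signal

def pos_tag(tokens):
    # Crude tagger: table lookup first, then simple fallback rules.
    tagged = []
    for i, t in enumerate(tokens):
        if t.lower() in TAGS:
            tag = TAGS[t.lower()]
        elif re.fullmatch(TIME_OR_NUMBER, t):
            tag = "CD"   # cardinal number (time, date part, road number)
        elif t[0].isupper() and i > 0:
            tag = "NNP"  # capitalised mid-sentence: guess proper noun
        else:
            tag = "NN"   # fallback: common noun
        tagged.append((t, tag))
    return tagged

def noun_features(tagged):
    # Merge consecutive NNP tokens into one location candidate; keep CD tokens.
    places, numbers, phrase = [], [], []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the last phrase
        if tag == "NNP":
            phrase.append(word)
            continue
        if phrase:
            places.append(" ".join(phrase))
            phrase = []
        if tag == "CD":
            numbers.append(word)
    return {"places": places, "numbers": numbers}

tweet = "Road closed on Sunset Boulevard at 5:30 pm"
tokens = tokenize(tweet)
print(label(polarity(tokens)))         # neutral
print(noun_features(pos_tag(tokens)))  # {'places': ['Sunset Boulevard'], 'numbers': ['5:30']}
```

The tweet carries useful traffic information (a closure, a place, a time) even though a lexicon-based polarity score gives it nothing to work with, which is the motivation for treating neutral tweets separately. A production system would use a trained PoS tagger rather than fallback rules like these.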
The research presented in this paper takes social media text data (tweets) from Twitter as input and applies preprocessing techniques to the collected data. Feature selection methods are then used to extract tweet-related features for classifying neutral tweets, giving a better understanding of road conditions and identifying traffic patterns; finally, traffic behavior is analyzed using an ensemble machine learning algorithm. The proposed model provides a new, feature selection-based approach to sentiment analysis. The findings show that the SentiWordNet opinion lexicon approach achieves 56% accuracy in identifying positive or negative opinions on the Twitter dataset, whereas the proposed feature selection-based opinion mining model substantially improves accuracy to 88%.
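The abstract does not name the feature selection criterion or the ensemble used, so the following Python sketch is illustrative only. It assumes information gain as the filter criterion (a common choice for tweet features, not confirmed as the paper's), keeps the top-k binary word features, and combines three simple base classifiers (Bernoulli naive Bayes, 1-nearest-neighbour with Hamming distance, and a one-feature decision stump) by hard majority vote. The toy tweets, the traffic (1) / non-traffic (0) labels and every class and function name are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG(Y; term) = H(Y) - sum over {present, absent} of P(v) * H(Y | v)."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = len(present) / n * entropy(present) + len(absent) / n * entropy(absent)
    return entropy(labels) - conditional

def select_features(docs, labels, k):
    """Keep the k terms with the highest IG (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    return sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))[:k]

def vectorize(doc, features):
    return [1 if f in doc else 0 for f in features]

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

class BernoulliNB:
    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.p = {}, {}
        for c in self.classes:
            rows = [x for x, lab in zip(X, y) if lab == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Laplace-smoothed P(feature = 1 | class)
            self.p[c] = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        return self

    def predict(self, x):
        def score(c):
            return self.log_prior[c] + sum(
                math.log(p if xi else 1 - p) for xi, p in zip(x, self.p[c]))
        return max(self.classes, key=score)

class NearestNeighbour:
    """1-NN with Hamming distance on binary feature vectors."""
    def fit(self, X, y):
        self.data = list(zip(X, y))
        return self

    def predict(self, x):
        return min(self.data, key=lambda row: sum(a != b for a, b in zip(row[0], x)))[1]

class DecisionStump:
    """Splits on one feature (index 0 = top-ranked feature); majority label per branch."""
    def fit(self, X, y, index=0):
        self.index, default = index, majority(y)
        self.branch = {}
        for v in (0, 1):
            ys = [lab for x, lab in zip(X, y) if x[index] == v]
            self.branch[v] = majority(ys) if ys else default
        return self

    def predict(self, x):
        return self.branch[x[self.index]]

class VotingEnsemble:
    def __init__(self, models):
        self.models = models

    def fit(self, X, y):
        for m in self.models:
            m.fit(X, y)
        return self

    def predict(self, x):
        return majority([m.predict(x) for m in self.models])

# Hypothetical toy training data: 1 = traffic-related, 0 = not traffic-related.
train = [({"accident", "highway", "blocked"}, 1), ({"jam", "highway", "slow"}, 1),
         ({"accident", "road", "closed"}, 1), ({"coffee", "morning", "downtown"}, 0),
         ({"concert", "tonight", "downtown"}, 0), ({"morning", "road", "run"}, 0)]
docs = [d for d, _ in train]
labels = [y for _, y in train]

features = select_features(docs, labels, k=4)
X = [vectorize(d, features) for d in docs]
model = VotingEnsemble([BernoulliNB(), NearestNeighbour(), DecisionStump()]).fit(X, labels)

print(features)  # ['accident', 'downtown', 'highway', 'morning']
print(model.predict(vectorize({"accident", "on", "highway"}, features)))  # 1
```

With two classes, hard majority voting needs an odd number of base models to avoid ties. A real evaluation would also measure accuracy on held-out tweets rather than on the training set.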




Funding

The authors did not receive support from any organization for the submitted work.

Author information

Corresponding author

Correspondence to K. Lavanya.

Ethics declarations

Conflict of interest

No conflict of interest.

Human participants and/or animals

The authors confirm that no human participants or animals were involved in this research.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mounica, B., Lavanya, K. Feature selection method on twitter dataset with part-of-speech (PoS) pattern applied to traffic analysis. Int J Syst Assur Eng Manag 15, 110–123 (2024). https://doi.org/10.1007/s13198-022-01677-3

  • DOI: https://doi.org/10.1007/s13198-022-01677-3
