Abstract
The rapid spread of information over the web makes it challenging to assess the credibility of online content. Although numerous automated approaches to veracity classification have been proposed in the literature, generating a relevant and rich set of features remains an open need. To fill this gap, the authors developed a novel feature mash-up approach that combines stance, pragmatic, and sentiment features. A veracity assessment algorithm (VAA) is then proposed on top of the newly generated feature bag; it assigns weights to the novel features using linear regression and classifies the veracity of information. Exhaustive experimentation showed that VAA, with 91.40% accuracy, outperformed other machine learning and ensemble learning classifiers, a baseline classifier, and baseline studies from the literature. When implemented with an incremental learning approach, VAA reached an improved accuracy of 94.47%. To test the robustness of the algorithm, the experiments were performed on two datasets, and VAA outperformed the other algorithms on both. The newly generated feature bags can therefore also be used separately to classify stances, sentiments, and pragmatics in natural language processing problems, and can assist in solving problems from other research areas such as hate speech and sarcasm detection.
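The idea of weighting a mashed-up feature bag with linear regression and then classifying veracity can be illustrated with a minimal sketch. This is not the authors' VAA: the function names, the three toy feature columns (stance, pragmatic, sentiment scores), the feature values, and the 0.5 decision threshold are all assumptions made for illustration only.

```python
import numpy as np


def fit_feature_weights(features, labels):
    """Fit one weight per feature plus an intercept by ordinary least squares."""
    # Append a column of ones so the last weight acts as the intercept.
    design = np.column_stack([features, np.ones(len(features))])
    weights, *_ = np.linalg.lstsq(design, labels, rcond=None)
    return weights


def classify_veracity(features, weights, threshold=0.5):
    """Score each item with the fitted weights and threshold the score.

    The 0.5 threshold is an assumed cut-off, not a value from the paper.
    """
    design = np.column_stack([features, np.ones(len(features))])
    scores = design @ weights
    return (scores >= threshold).astype(int)


# Hypothetical feature bag: columns are stance, pragmatic, and sentiment scores.
train_features = np.array([
    [0.9, 0.2, 0.8],
    [0.8, 0.7, 0.3],
    [0.1, 0.6, 0.2],
    [0.2, 0.1, 0.4],
])
train_labels = np.array([1, 1, 0, 0])  # 1 = credible, 0 = not credible (toy labels)

weights = fit_feature_weights(train_features, train_labels)
predictions = classify_veracity(train_features, weights)
print("weights:", weights)
print("predictions:", predictions)
```

In this toy setting the regression has as many parameters as training items, so it reproduces the training labels exactly; a real evaluation would need held-out data and the paper's actual features.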










Data availability
The dataset will be made accessible through Kaggle.
Funding
Not applicable.
Author information
Contributions
The authors confirm their contribution to the paper: Shraddha Vaidya and Jatinderkumar Saini were involved in study conception and design; Shraddha Vaidya helped in data collection; Shraddha Vaidya and Jatinderkumar Saini contributed to analysis and interpretation of results; Shraddha Vaidya and Jatinderkumar Saini were involved in draft manuscript preparation. All authors reviewed the results and approved the final version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Saini, J.R., Vaidya, S. A veracity assessment algorithm for classification of healthcare information using feature bag mash-up approach. J Supercomput 81, 285 (2025). https://doi.org/10.1007/s11227-024-06500-3
DOI: https://doi.org/10.1007/s11227-024-06500-3