Abstract
Fake news is a growing concern in the internet age. Because information is so easy to share digitally, distinguishing fake information from truthful information is an important task. In this paper, we present a data mining solution that classifies articles as real or fake by using bag-of-words (BoW) and sequential mining techniques, and we compare the reliability of these techniques for detecting fake news on various datasets. Specifically, our solution first cleans the input news by normalizing words and removing “filler” words. It then uses BoW or sequential mining techniques to vectorize the cleaned data. Afterwards, it trains classification models on the vectorized data and classifies unseen news as real or fake. Evaluation on real-life data shows the feasibility of our solution for mining and classifying fake news.
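The clean, vectorize, train, and classify steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the toy headlines, and the choice of a multinomial naive Bayes classifier are all assumptions made for illustration, since the abstract does not name the specific classification models or the "filler" word list. Only the BoW variant is shown; the sequential mining variant is not.

```python
import math
import re
from collections import Counter

# Hypothetical "filler" (stop) word list; the paper's actual list is not given.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "and", "that", "this"}


def clean(text):
    """Normalize to lowercase alphabetic tokens and drop filler words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(tokens):
    """Vectorize a token list as a word -> count mapping."""
    return Counter(tokens)


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (assumed classifier)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(bag_of_words(clean(doc)))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, doc):
        bow = bag_of_words(clean(doc))
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word, count in bow.items():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                p = (self.word_counts[label][word] + 1) / (total_words + len(self.vocab))
                score += count * math.log(p)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only.
train_docs = [
    "Shocking miracle cure doctors hate",
    "You will not believe this shocking secret",
    "City council approves the annual budget",
    "Central bank holds interest rates steady",
]
train_labels = ["fake", "fake", "real", "real"]
model = NaiveBayes().fit(train_docs, train_labels)
```

Calling `model.predict(...)` on an unseen headline returns the label whose smoothed word counts best explain the cleaned text. A real evaluation would need a held-out test set and a labelled corpus far larger than this toy example.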
Acknowledgments
This project is partially supported by (a) Natural Sciences and Engineering Research Council of Canada (NSERC) and (b) University of Manitoba.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cabusas, R.M., Epp, B.N., Gouge, J.M., Kaufmann, T.N., Leung, C.K., Tully, J.R.A. (2022). Mining for Fake News. In: Barolli, L., Hussain, F., Enokido, T. (eds) Advanced Information Networking and Applications. AINA 2022. Lecture Notes in Networks and Systems, vol 450. Springer, Cham. https://doi.org/10.1007/978-3-030-99587-4_14
DOI: https://doi.org/10.1007/978-3-030-99587-4_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99586-7
Online ISBN: 978-3-030-99587-4
eBook Packages: Intelligent Technologies and Robotics (R0)