Abstract
The number of news articles published on the Web has increased dramatically. News websites are flooded with articles every day, and processing and classifying them is a challenge. Reading news on the Web has become an important source of information for citizens, and classifying that news can reveal relevant information about social or cultural patterns in society. In this context, techniques that can automatically analyze and classify news articles are essential. In particular, data mining and machine learning techniques have been applied to the classification of web news, as they can detect structural patterns based on document characteristics. Their use requires specialized text processing and summarization techniques. The objective of this study is to characterize the data mining and machine learning techniques used for web news classification, the datasets used, and the evaluation metrics. We performed a systematic literature mapping of 51 primary studies published between 2000 and 2019. We found that the most used techniques fall into three paradigms: clustering, support vector machines, and generative models. In addition, 33 studies used online data extracted from news web pages, while 25 used a previously published dataset. The most common metric is the F-measure, reported 25 times. In summary, several data mining and machine learning techniques have been applied to the automatic classification of web news, and the mapping shows some trends regarding the techniques, datasets, and metrics.
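As an illustration of the F-measure named above as the most common evaluation metric, the following is a minimal sketch of how it can be computed for one news category. It is not taken from the study: the function name `f_measure`, the category labels, and the toy predictions are all hypothetical, and the per-class (one-vs-rest) formulation is an assumption about how the metric is applied.

```python
def f_measure(y_true, y_pred, positive, beta=1.0):
    """F-measure for one class, treating `positive` as the target category.

    beta=1.0 gives the usual F1 score (harmonic mean of precision and recall).
    """
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0

    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical gold labels and classifier output for five news articles.
y_true = ["sports", "politics", "sports", "economy", "sports"]
y_pred = ["sports", "sports", "sports", "economy", "politics"]

score = f_measure(y_true, y_pred, positive="sports")
print(f"F1 for 'sports': {score:.3f}")
```

Here two of the three articles predicted as "sports" are correct and two of the three true "sports" articles are found, so precision and recall are both 2/3 and so is the F1 score.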
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Pandolfi-González, M., Quesada-López, C., Martínez, A., Jenkins, M. (2021). Automatic Classification of Web News: A Systematic Mapping Study. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1251. Springer, Cham. https://doi.org/10.1007/978-3-030-55187-2_41
DOI: https://doi.org/10.1007/978-3-030-55187-2_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55186-5
Online ISBN: 978-3-030-55187-2
eBook Packages: Intelligent Technologies and Robotics (R0)