Abstract
In recent years, public collaboration to address social challenges has become increasingly important for the sustainable development of local and global communities. Any discussion of how to tackle a social challenge should start from an investigation of relevant existing activities. This study aims to collect web articles on various social issues to support such discussions and research. For this purpose, we developed and evaluated an automatic tagging system for web articles on social issues, built on the Bidirectional Encoder Representations from Transformers (BERT) model, Wikidata, and Wikipedia. Specifically, our method classifies each sentence in a web article into social issue tags using a BERT model pre-trained on Japanese Wikipedia. We extracted candidate social issue tags from the “Social issues” class of Wikidata and constructed a training corpus from Wikidata and Wikipedia. Because the tags inherit the structure of Wikidata, hierarchical tagging is possible. In our evaluation, the proposed method obtained F1 scores of 0.94 for zeroth-level classification, 0.85 for first-level classification, and 0.85 or better for all second-level classifications.
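The idea of sentence-level classification followed by hierarchical tagging can be illustrated with a minimal sketch. It is not the authors' implementation: the tag hierarchy `PARENT` is a hypothetical toy example rather than data taken from Wikidata, and `keyword_classifier` is a simple keyword-matching stand-in for the fine-tuned BERT sentence classifier. Only the aggregation step, in which each sentence-level tag also credits its ancestor tags, reflects the hierarchical tagging described above.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical toy hierarchy: child tag -> parent tag (None marks a top-level tag).
# The paper derives its hierarchy from Wikidata; these names are illustrative only.
PARENT: Dict[str, Optional[str]] = {
    "environment": None,
    "poverty": None,
    "air pollution": "environment",
    "marine debris": "environment",
    "child poverty": "poverty",
}


def ancestors(tag: str) -> List[str]:
    """Return the tag followed by all of its ancestors, most specific first."""
    chain: List[str] = []
    current: Optional[str] = tag
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def keyword_classifier(sentence: str) -> Optional[str]:
    """Stand-in for a BERT sentence classifier (assumption, not the paper's model).

    Returns the longest tag name that occurs in the sentence, or None.
    """
    lowered = sentence.lower()
    for tag in sorted(PARENT, key=len, reverse=True):
        if tag in lowered:
            return tag
    return None


def tag_article(
    sentences: List[str],
    classify: Callable[[str], Optional[str]] = keyword_classifier,
) -> Counter:
    """Count, per tag, how many sentences support it at any level of the hierarchy."""
    counts: Counter = Counter()
    for sentence in sentences:
        tag = classify(sentence)
        if tag is None:
            continue  # sentence is not about any known social issue
        counts.update(ancestors(tag))
    return counts


article = [
    "Air pollution in large cities is getting worse.",
    "Volunteers collected marine debris on the beach.",
    "The festival was held on Sunday.",
]
article_tags = tag_article(article)
```

In this sketch the parent tag accumulates support from every sentence classified under one of its children, so an article can receive a broad tag even when no single sentence mentions it directly. Swapping `keyword_classifier` for a real model only requires passing a different `classify` callable.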
Acknowledgements
This work was supported by NEDO (JPNP20006), JST CREST (JPMJCR15E1, JPMJCR20D1), and JSPS KAKENHI (17K00461).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hasegawa, T., Shiramatsu, S. (2022). BERT-Based Tagging Method for Social Issues in Web Articles. In: Yang, XS., Sherratt, S., Dey, N., Joshi, A. (eds) Proceedings of Sixth International Congress on Information and Communication Technology. Lecture Notes in Networks and Systems, vol 235. Springer, Singapore. https://doi.org/10.1007/978-981-16-2377-6_82
DOI: https://doi.org/10.1007/978-981-16-2377-6_82
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-2376-9
Online ISBN: 978-981-16-2377-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)