BERT-Based Tagging Method for Social Issues in Web Articles

Hasegawa, Tokutaka; Shiramatsu, Shun

doi:10.1007/978-981-16-2377-6_82

Conference paper
First Online: 24 September 2021

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 235)

Abstract

In recent years, public collaboration to address social challenges has become increasingly important for the sustainable development of local and global communities. When discussing how to tackle a social challenge, it is essential to investigate relevant existing activities. This study aims to collect web articles on various social issues to support such discussions and research. For this purpose, we developed and evaluated an automatic tagging system for web articles on social issues, built on the Bidirectional Encoder Representations from Transformers (BERT) model, Wikidata, and Wikipedia. Specifically, our method classifies each sentence in a web article into social issue tags using a BERT model pre-trained on Japanese Wikipedia. We constructed the training corpus from Wikidata and Wikipedia and extracted the candidate social issue tags from the “Social issues” class of Wikidata. The structure of Wikidata makes hierarchical tagging possible. In our evaluation, the proposed method obtained F1 scores of 0.94 for zeroth-level classification, 0.85 for first-level classification, and 0.85 or better for all second-level classifications.
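
The sentence-level, hierarchical tagging described in the abstract can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the tag hierarchy and keyword lists are hypothetical, the keyword matcher `keyword_classifier` is only a stand-in for the fine-tuned BERT classifier, and the reading of level 0 as "does this sentence concern a social issue at all" is one plausible interpretation of "zeroth-level classification" that the abstract does not confirm.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

ROOT = "social issue"

# Hypothetical tag hierarchy (illustrative only; not the Wikidata class tree).
CHILDREN: Dict[str, List[str]] = {
    ROOT: ["environmental issue", "poverty"],
    "environmental issue": ["air pollution", "water pollution"],
    "poverty": ["child poverty", "homelessness"],
}

# Hypothetical keyword lists for the leaf tags, used only by the stand-in classifier.
KEYWORDS: Dict[str, List[str]] = {
    "air pollution": ["smog", "exhaust"],
    "water pollution": ["sewage", "wastewater"],
    "child poverty": ["school meals", "orphan"],
    "homelessness": ["homeless", "shelter"],
}

# A classifier takes a sentence and candidate tags, and returns one tag or None.
Classifier = Callable[[str, List[str]], Optional[str]]


def leaf_keywords(tag: str) -> List[str]:
    """Collect the keywords of a tag or, for inner tags, of all its descendants."""
    if tag in KEYWORDS:
        return KEYWORDS[tag]
    words: List[str] = []
    for child in CHILDREN.get(tag, []):
        words.extend(leaf_keywords(child))
    return words


def keyword_classifier(sentence: str, candidates: List[str]) -> Optional[str]:
    """Stand-in for a fine-tuned BERT classifier (assumption, not the paper's model).

    Returns the candidate with the most keyword hits, or None if nothing matches.
    """
    text = sentence.lower()
    scores = {tag: sum(word in text for word in leaf_keywords(tag)) for tag in candidates}
    best = max(candidates, key=lambda tag: scores[tag])
    return best if scores[best] > 0 else None


def tag_sentence(sentence: str, classify: Classifier) -> List[str]:
    """Return the tag path for one sentence, from level 0 down to the deepest match."""
    # Level 0: decide whether the sentence concerns a social issue at all.
    if classify(sentence, [ROOT]) is None:
        return []
    path = [ROOT]
    node = ROOT
    # Levels 1, 2, ...: choose among the children of the tag chosen one level up.
    while node in CHILDREN:
        choice = classify(sentence, CHILDREN[node])
        if choice is None:
            break
        path.append(choice)
        node = choice
    return path


def tag_article(sentences: List[str], classify: Classifier) -> Counter:
    """Aggregate sentence-level tag paths into per-tag counts for a whole article."""
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(tag_sentence(sentence, classify))
    return counts


if __name__ == "__main__":
    article = [
        "Volunteers measured smog levels near the highway.",
        "The city opened a new shelter for homeless residents.",
        "The festival starts on Saturday.",
    ]
    for tag, count in tag_article(article, keyword_classifier).most_common():
        print(f"{tag}: {count}")
```

Passing the classifier in as a function keeps the cascade independent of the model, so a real sentence classifier could replace the keyword matcher without changing `tag_sentence` or `tag_article`.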

Acknowledgements

This work was supported by NEDO (JPNP20006), JST CREST (JPMJCR15E1, JPMJCR20D1), and JSPS KAKENHI (17K00461).

Author information

Authors and Affiliations

Nagoya Institute of Technology, Nagoya, Japan
Tokutaka Hasegawa & Shun Shiramatsu

Authors

Tokutaka Hasegawa
Shun Shiramatsu

Corresponding author

Correspondence to Tokutaka Hasegawa.

Editor information

Editors and Affiliations

Middlesex University, London, UK
Xin-She Yang
University of Reading, Reading, UK
Simon Sherratt
JIS University, Kolkata, India
Nilanjan Dey
Global Knowledge Research Foundation, Ahmedabad, India
Amit Joshi

About this paper

Cite this paper

Hasegawa, T., Shiramatsu, S. (2022). BERT-Based Tagging Method for Social Issues in Web Articles. In: Yang, XS., Sherratt, S., Dey, N., Joshi, A. (eds) Proceedings of Sixth International Congress on Information and Communication Technology. Lecture Notes in Networks and Systems, vol 235. Springer, Singapore. https://doi.org/10.1007/978-981-16-2377-6_82

DOI: https://doi.org/10.1007/978-981-16-2377-6_82
Published: 24 September 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-2376-9
Online ISBN: 978-981-16-2377-6
eBook Packages: Intelligent Technologies and Robotics (R0)