BERT-Based Tagging Method for Social Issues in Web Articles

  • Conference paper

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 235)

Abstract

In recent years, public collaboration to address social challenges has become increasingly important for the sustainable development of local and global communities. When discussing how to tackle a social challenge, it is essential to investigate relevant existing activities. This study aims to collect web articles on various social issues to support such discussions and research. For this purpose, we developed an automatic tagging system for web articles on social issues, built on the Bidirectional Encoder Representations from Transformers (BERT) model, Wikidata, and Wikipedia, and evaluated it. Specifically, our method classifies each sentence in a web article into social issue tags using a BERT model pretrained on Japanese Wikipedia. We constructed a training corpus from Wikidata and Wikipedia and extracted candidate social issue tags from the “Social issues” class of Wikidata. The structure of Wikidata also makes hierarchical tagging possible. In our evaluation, the proposed method achieved F1 scores of 0.94 for zeroth-level classifications, 0.85 for first-level classifications, and 0.85 or better for all second-level classifications.
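The sentence-level, hierarchical tagging described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `HIERARCHY` and `KEYWORDS` tables are hypothetical, and the keyword-matching `classify` function is only a toy stand-in for the fine-tuned BERT classifier, whose details the abstract does not give. The sketch shows one plausible way to walk a tag hierarchy level by level for each sentence and then aggregate the results per article.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical tag hierarchy (parent -> children). In the paper the tags
# come from the "Social issues" class of Wikidata; these are made up.
HIERARCHY: Dict[str, List[str]] = {
    "social issue": ["environment", "poverty"],
    "environment": ["climate change", "waste"],
    "poverty": ["homelessness", "hunger"],
}

# Hypothetical keywords: a toy stand-in for a fine-tuned BERT classifier.
KEYWORDS: Dict[str, List[str]] = {
    "environment": ["emission", "recycling", "landfill", "warming"],
    "poverty": ["shelter", "food bank", "income"],
    "climate change": ["emission", "warming"],
    "waste": ["recycling", "landfill"],
    "homelessness": ["shelter"],
    "hunger": ["food bank"],
}


def classify(sentence: str, candidates: List[str]) -> Optional[str]:
    """Return the candidate with the most keyword hits, or None if no hits."""
    text = sentence.lower()
    scores = {
        tag: sum(kw in text for kw in KEYWORDS.get(tag, []))
        for tag in candidates
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def tag_sentence(sentence: str, root: str = "social issue") -> List[str]:
    """Walk down the hierarchy, choosing among the children at each level."""
    path: List[str] = []
    node = root
    while node in HIERARCHY:
        choice = classify(sentence, HIERARCHY[node])
        if choice is None:
            break  # no child tag applies; stop descending
        path.append(choice)
        node = choice
    return path


def tag_article(sentences: List[str]) -> Counter:
    """Count every tag on each sentence's path to get article-level tags."""
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(tag_sentence(sentence))
    return counts


article = [
    "The city opened a new shelter this winter.",
    "Local groups promote recycling to reduce landfill use.",
    "The mayor attended a ceremony on Monday.",
]
tags = tag_article(article)
```

In this sketch a sentence that matches no tag at some level simply stops descending, so the third sentence contributes nothing. Replacing `classify` with a real model would leave `tag_sentence` and `tag_article` unchanged.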

Acknowledgements

This work was supported by NEDO (JPNP20006), JST CREST (JPMJCR15E1, JPMJCR20D1), and JSPS KAKENHI (17K00461).

Author information

Corresponding author

Correspondence to Tokutaka Hasegawa.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Hasegawa, T., Shiramatsu, S. (2022). BERT-Based Tagging Method for Social Issues in Web Articles. In: Yang, XS., Sherratt, S., Dey, N., Joshi, A. (eds) Proceedings of Sixth International Congress on Information and Communication Technology. Lecture Notes in Networks and Systems, vol 235. Springer, Singapore. https://doi.org/10.1007/978-981-16-2377-6_82
