QiBERT - Classifying Online Conversations Messages with BERT as a Feature

  • Conference paper
Technological Innovation for Connected Cyber Physical Spaces (DoCEIS 2023)

Abstract

Recent developments in online communication and their use in everyday life have caused an explosion in the volume of a new genre of text data: short text. Classifying this type of text by its content therefore has significant implications in many areas. Online debates are no exception, since they provide access to information about the opinions, positions and preferences of their users. This paper uses data obtained from online social conversations in Portuguese schools (short text) to observe behavioural trends and to see whether students remain engaged in the discussion when stimulated. The project used state-of-the-art (SoA) Machine Learning (ML) algorithms and methods, through BERT-based models, to classify whether utterances are in or out of the debate subject. Using SBERT embeddings as a feature, with supervised learning, the proposed model achieved results above 0.95 average accuracy for classifying online messages. Such improvements can help social scientists better understand human communication, behaviour, discussion and persuasion.
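The idea of using sentence embeddings as classifier features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the supervised classifier, so a simple nearest-centroid rule over cosine similarity stands in for it here. The function names (`sbert_embed`, `fit_centroids`, `classify_messages`) and the "in"/"out" labels are hypothetical, and `sbert_embed` assumes the third-party `sentence-transformers` package is installed and can download the model.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def sbert_embed(texts: Sequence[str]) -> List[Vector]:
    """Encode texts with an SBERT model (assumes sentence-transformers is installed)."""
    from sentence_transformers import SentenceTransformer  # lazy third-party import

    model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")
    return [[float(x) for x in row] for row in model.encode(list(texts))]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(vectors: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """Average the embedding vectors of each class (stand-in for the real classifier)."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify_messages(
    train_texts: Sequence[str],
    train_labels: Sequence[str],
    new_texts: Sequence[str],
    embed: Callable[[Sequence[str]], List[Vector]] = sbert_embed,
) -> List[str]:
    """Embed labelled and new messages, then assign each new one to the closest centroid."""
    centroids = fit_centroids(embed(train_texts), train_labels)
    return [
        max(centroids, key=lambda label: cosine(centroids[label], vec))
        for vec in embed(new_texts)
    ]
```

Passing a different `embed` callable makes it possible to try the pipeline without downloading a model, and the centroid step can be swapped for any supervised learner that accepts fixed-length feature vectors.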

Notes

  1. SBERT Model: paraphrase-multilingual-mpnet-base-v2. Multi-lingual model of paraphrase-mpnet-base-v2, extended to 50+ languages.

    https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2.

Acknowledgment

This research was partially funded by Fundação para a Ciência e a Tecnologia under Projects “Factors for promoting dialogue and healthy behaviours in online school communities” with reference DSAIPA/DS/0102/2019 and developed at the R&D Unit CICANT - Research Center for Applied Communication, Culture and New Technologies, UIDB/04111/2020, UIDB/50008/2020 as well as Instituto Lusófono de Investigação e Desenvolvimento (ILIND) under Project COFAC/ILIND/COPELABS/1/2022.

Author information

Corresponding author

Correspondence to Bruno D. Ferreira-Saraiva.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Ferreira-Saraiva, B.D., Marques-Pita, M., Matos-Carvalho, J.P., Pirola, Z. (2023). QiBERT - Classifying Online Conversations. In: Camarinha-Matos, L.M., Ferrada, F. (eds) Technological Innovation for Connected Cyber Physical Spaces. DoCEIS 2023. IFIP Advances in Information and Communication Technology, vol 678. Springer, Cham. https://doi.org/10.1007/978-3-031-36007-7_16

  • DOI: https://doi.org/10.1007/978-3-031-36007-7_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36006-0

  • Online ISBN: 978-3-031-36007-7

  • eBook Packages: Computer Science (R0)
