QiBERT - Classifying Online Conversations

Ferreira-Saraiva, Bruno D.; Marques-Pita, Manuel; Matos-Carvalho, João Pedro; Pirola, Zuil

doi:10.1007/978-3-031-36007-7_16

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 678)

Included in the following conference series:

Doctoral Conference on Computing, Electrical and Industrial Systems

Abstract

Recent developments in online communication and its use in everyday life have caused an explosion in the volume of a new genre of text data: short text. The need to classify this type of text by its content therefore has significant implications in many areas. Online debates are no exception, since they provide access to information about the opinions, positions and preferences of their users. This paper uses data obtained from online social conversations in Portuguese schools (short text) to observe behavioural trends and to see whether students remain engaged in the discussion when stimulated. The project used state-of-the-art (SoA) Machine Learning (ML) algorithms and methods, through BERT-based models, to classify whether utterances are in or out of the debate subject. Using SBERT embeddings as features, with supervised learning, the proposed model achieved an average accuracy above 0.95 for classifying online messages. Such improvements can help social scientists better understand human communication, behaviour, discussion and persuasion.
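The pipeline the abstract outlines (embed each utterance, then train a supervised classifier to label it as in or out of the debate subject) can be illustrated with a minimal sketch. This is not the authors' implementation. The embedding function `toy_embed` is a hypothetical bag-of-words stand-in so that the example runs without downloading a model, the nearest-centroid classifier is swapped in for whatever supervised learner the paper actually used, and the example messages and the labels "in" and "out" are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]


def toy_embed(texts: Sequence[str], vocab: Sequence[str]) -> List[Vector]:
    """Hypothetical stand-in for an SBERT encoder: word counts over a fixed vocabulary.

    With the sentence-transformers package installed, one could instead call
    SentenceTransformer("paraphrase-multilingual-mpnet-base-v2").encode(texts)
    and feed the resulting vectors to the same classifier below.
    """
    vectors = []
    for text in texts:
        counts = Counter(text.lower().split())
        vectors.append([float(counts[word]) for word in vocab])
    return vectors


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(vectors: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """Supervised step: average the embeddings of each labelled class."""
    counts = Counter(labels)
    sums: Dict[str, Vector] = {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(vectors: Sequence[Vector], centroids: Dict[str, Vector]) -> List[str]:
    """Assign each embedding the label of its most similar class centroid."""
    return [
        max(centroids, key=lambda label: cosine(vec, centroids[label]))
        for vec in vectors
    ]


# Invented toy data: "in" = on the debate subject, "out" = off-topic chatter.
train_texts = [
    "should schools ban phones in class",
    "phones distract students in class",
    "did you see the match last night",
    "lol what is for lunch today",
]
train_labels = ["in", "in", "out", "out"]

vocab = sorted({word for text in train_texts for word in text.split()})
centroids = fit_centroids(toy_embed(train_texts, vocab), train_labels)

new_messages = ["phones in class help students", "lunch was great today"]
predictions = predict(toy_embed(new_messages, vocab), centroids)
print(list(zip(new_messages, predictions)))
```

Keeping the embedding step separate from the classifier means the stand-in encoder can be replaced by real sentence embeddings without touching the supervised part, which is the main design point the sketch is meant to convey.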

Notes

1. SBERT model: paraphrase-multilingual-mpnet-base-v2, a multilingual version of paraphrase-mpnet-base-v2 extended to 50+ languages. https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2.

Acknowledgment

This research was partially funded by Fundação para a Ciência e a Tecnologia under the project “Factors for promoting dialogue and healthy behaviours in online school communities”, reference DSAIPA/DS/0102/2019, and developed at the R&D Unit CICANT - Research Center for Applied Communication, Culture and New Technologies, UIDB/04111/2020, UIDB/50008/2020, as well as by Instituto Lusófono de Investigação e Desenvolvimento (ILIND) under Project COFAC/ILIND/COPELABS/1/2022.

Author information

Authors and Affiliations

COPELABS, Universidade Lusófona, Campo Grande 376, 1749-024, Lisboa, Portugal
Bruno D. Ferreira-Saraiva, Manuel Marques-Pita, João Pedro Matos-Carvalho & Zuil Pirola
CICANT, Universidade Lusófona, Campo Grande 376, 1749-024, Lisboa, Portugal
Bruno D. Ferreira-Saraiva, Manuel Marques-Pita, João Pedro Matos-Carvalho & Zuil Pirola

Corresponding author

Correspondence to Bruno D. Ferreira-Saraiva.

Editor information

Editors and Affiliations

School of Science and Technology, NOVA University of Lisbon, Monte Caparica, Portugal
Luis M. Camarinha-Matos
School of Science and Technology, NOVA University of Lisbon, Monte Caparica, Portugal
Filipa Ferrada

Cite this paper

Ferreira-Saraiva, B.D., Marques-Pita, M., Matos-Carvalho, J.P., Pirola, Z. (2023). QiBERT - Classifying Online Conversations. In: Camarinha-Matos, L.M., Ferrada, F. (eds) Technological Innovation for Connected Cyber Physical Spaces. DoCEIS 2023. IFIP Advances in Information and Communication Technology, vol 678. Springer, Cham. https://doi.org/10.1007/978-3-031-36007-7_16

DOI: https://doi.org/10.1007/978-3-031-36007-7_16
Published: 25 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36006-0
Online ISBN: 978-3-031-36007-7
eBook Packages: Computer Science, Computer Science (R0)
