Exploiting ontology information in fuzzy SVM social media profile classification

Applied Intelligence

Abstract

Social media such as Twitter, Facebook, blogs, and LinkedIn are now among the most used sources of information, and at the same time among the most visited and most used sources of disinformation. Disinformation can have a negative impact on several areas and on our minds, and hence on our behavior. It is closely related to the profiles of the authors who publish it. The purpose of author profiling is to analyze the texts published by authors in order to determine their profile category. A wide range of statistical feature selection and machine learning methods has been studied in recent years to classify this information automatically. However, the main methods of this kind have not performed well on data from social networks. The main contribution of this article is to integrate the semantic component, which the main approaches in the literature do not take into account, as additional features that help identify relevant information. Our hypothesis is that concepts and the relationships between them correlate more consistently with relevant and irrelevant information, and can therefore increase the discriminating power of classifiers. The proposed semantic approach revolves around an ontology combined first with the linear SVM classifier and then with the fuzzy SVM classifier. We carried out an experimental study on several collections of Twitter profiles, evaluating both our approach and the main approaches from the literature, and analyzed the results obtained. The results clearly show the limits of the studied approaches and confirm the performance of our approach, as well as the benefit of integrating the semantic component into the categorization of Twitter profiles.
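To make the idea of ontology-derived features feeding a fuzzy SVM more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical toy ontology (`TOY_ONTOLOGY`) that maps terms to concepts, a class-centre distance heuristic for the fuzzy memberships, and a small linear SVM trained by subgradient descent in which each sample's hinge loss is scaled by its membership. All names, data, and parameter values are illustrative assumptions.

```python
import math

# Hypothetical toy ontology (term -> concept); an assumption, not from the paper.
TOY_ONTOLOGY = {
    "goal": "Sport", "match": "Sport", "league": "Sport",
    "vote": "Politics", "senate": "Politics", "election": "Politics",
}
CONCEPTS = sorted(set(TOY_ONTOLOGY.values()))  # ["Politics", "Sport"]


def concept_features(text):
    """Count how many tokens of `text` map to each ontology concept."""
    counts = {c: 0 for c in CONCEPTS}
    for token in text.lower().split():
        concept = TOY_ONTOLOGY.get(token)
        if concept is not None:
            counts[concept] += 1
    return [float(counts[c]) for c in CONCEPTS]


def fuzzy_memberships(X, y, delta=1e-3):
    """Membership s_i = 1 - d_i / (r + delta), with d_i the distance of
    sample i to its class centre and r the largest such distance in the class."""
    members = [0.0] * len(X)
    dim = len(X[0])
    for label in set(y):
        idx = [i for i, yi in enumerate(y) if yi == label]
        centre = [sum(X[i][k] for i in idx) / len(idx) for k in range(dim)]
        dist = {i: math.dist(X[i], centre) for i in idx}
        radius = max(dist.values())
        for i in idx:
            members[i] = 1.0 - dist[i] / (radius + delta)
    return members


def train_weighted_linear_svm(X, y, s, C=1.0, lr=0.05, epochs=500):
    """Subgradient descent on 0.5*||w||^2 + C * sum_i s_i * hinge_i."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the regulariser is w
        for xi, yi, si in zip(X, y, s):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin: add weighted hinge term
                for k in range(dim):
                    grad_w[k] -= C * si * yi * xi[k]
                grad_b -= C * si * yi
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 from the sign of the linear decision function."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1


# Toy profiles: +1 = sport-oriented, -1 = politics-oriented (made-up data).
texts = ["goal match league", "match goal", "league match goal goal",
         "vote senate election", "election vote", "senate vote election election"]
labels = [1, 1, 1, -1, -1, -1]

X = [concept_features(t) for t in texts]
s = fuzzy_memberships(X, labels)
w, b = train_weighted_linear_svm(X, labels, s)
```

In this sketch, samples far from their class centre receive a low membership and so contribute little to the penalty term, which is the general intuition behind fuzzy SVMs. A real system would combine the concept counts with the usual textual features and tune `C`, `delta`, and the membership function on held-out data.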

Notes

  1. http://nlp.uned.es/replab2014/

  2. https://pan.webis.de/clef17/pan17-web/author-identification.html

  3. https://www.kaggle.com/arminehn/disasteraccident-sources

  4. https://data.world/crowdflower/gender-classifier-data

Corresponding author

Correspondence to Olfa Mabrouk.

Cite this article

Mabrouk, O., Hlaoua, L. & Omri, M.N. Exploiting ontology information in fuzzy SVM social media profile classification. Appl Intell 51, 3757–3774 (2021). https://doi.org/10.1007/s10489-020-01939-2
