Abstract
Social media platforms such as Twitter, Facebook, blogs, and LinkedIn are now among the most widely used sources of information, and at the same time among the most visited sources of disinformation. Disinformation can have a negative impact in several areas, on our minds, and hence on our behavior. It is closely related to the profiles of the authors who publish it. The purpose of author profiling is to analyze the texts published by authors in order to determine their profile category. A wide range of statistical feature selection and machine learning methods has been studied in recent years to classify this information automatically. However, these methods have not shown strong performance on data from social networks. The main contribution of this article is to integrate the semantic component, which the main approaches in the literature do not take into account, as additional features that help identify relevant information. Our hypothesis is that concepts and the relationships between them tend to correlate more coherently with relevant and irrelevant information, and can therefore increase the discriminating power of classifiers. The proposed semantic approach revolves around an ontology combined first with a linear SVM classifier and then with a fuzzy SVM classifier. We carried out an experimental study on several collections of Twitter profiles, evaluating both our approach and the main approaches from the literature that we studied, and we analyze the results obtained. The results clearly show the limits of the studied approaches and confirm the performance of our approach, as well as the effectiveness of integrating the semantic component in the categorization of Twitter profiles.
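To make the idea of ontology-derived features combined with a fuzzy SVM more concrete, here is a minimal, self-contained sketch. It is not the authors' implementation. The toy ontology, the example texts, the labels, the class-centre membership function with its `delta` parameter, and the bias-free subgradient trainer are all assumptions made for illustration. It relies only on the usual fuzzy SVM idea of scaling each training sample's hinge penalty by a membership value between 0 and 1.

```python
import math
from collections import Counter

# Hypothetical toy ontology mapping surface terms to concepts (illustrative only).
TOY_ONTOLOGY = {
    "goal": "Sport", "match": "Sport", "league": "Sport",
    "stocks": "Finance", "market": "Finance", "shares": "Finance",
}
CONCEPTS = sorted(set(TOY_ONTOLOGY.values()))  # ["Finance", "Sport"]


def featurize(text, vocabulary):
    """Term counts over the vocabulary, followed by ontology-concept counts."""
    tokens = text.lower().split()
    terms = Counter(tokens)
    concepts = Counter(TOY_ONTOLOGY[t] for t in tokens if t in TOY_ONTOLOGY)
    return ([float(terms[t]) for t in vocabulary]
            + [float(concepts[c]) for c in CONCEPTS])


def fuzzy_memberships(X, y, delta=1.0):
    """Membership in (0, 1]: samples nearer their class centre weigh more."""
    centres, radii = {}, {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        centres[label] = [sum(col) / len(rows) for col in zip(*rows)]
        radii[label] = max(math.dist(x, centres[label]) for x in rows)
    return [1.0 - math.dist(x, centres[l]) / (radii[l] + delta)
            for x, l in zip(X, y)]


def train_fuzzy_linear_svm(X, y, s, C=1.0, lam=0.01, lr=0.05, epochs=200):
    """Subgradient descent on lam/2*||w||^2 + C*sum_i s_i*hinge_i (no bias)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label, weight in zip(X, y, s):
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [wi * (1.0 - lr * lam) for wi in w]  # regulariser step
            if margin < 1.0:  # hinge active: step is scaled by the membership
                w = [wi + lr * C * weight * label * xi for wi, xi in zip(w, x)]
    return w


def predict(text, vocabulary, w):
    """Return +1 or -1 from the sign of the linear score."""
    score = sum(wi * xi for wi, xi in zip(w, featurize(text, vocabulary)))
    return 1 if score >= 0 else -1


texts = ["great goal in the match", "match tonight was tense",
         "stocks fell as market closed", "stocks rally again"]
labels = [1, 1, -1, -1]  # +1 = sport profile, -1 = finance profile (toy labels)

vocabulary = sorted({tok for t in texts for tok in t.split()})
X = [featurize(t, vocabulary) for t in texts]
s = fuzzy_memberships(X, labels)
w = train_fuzzy_linear_svm(X, labels, s)

# "league" and "shares" never occur in training; only their concepts carry signal.
print(predict("league final", vocabulary, w))  # prints 1
print(predict("shares dip", vocabulary, w))    # prints -1
```

In this toy setting the two test texts contain no term seen during training, so a purely lexical bag-of-words model would have nothing to score. The concept features are what separate them, which is the intuition behind adding a semantic component.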
Cite this article
Mabrouk, O., Hlaoua, L. & Omri, M.N. Exploiting ontology information in fuzzy SVM social media profile classification. Appl Intell 51, 3757–3774 (2021). https://doi.org/10.1007/s10489-020-01939-2
DOI: https://doi.org/10.1007/s10489-020-01939-2