A Probabilistic Vector Representation and Neural Network for Text Classification

Bounabi, Mariem; El Moutaouakil, Karim; Satori, Khalid

doi:10.1007/978-3-319-96292-4_27

Part of the book series: Communications in Computer and Information Science (CCIS, volume 872)

Included in the following conference series:

International Conference on Big Data, Cloud and Applications

Abstract

The growth of textual databases, and their representation in high-dimensional spaces, hinders the automated processing of these large collections and the extraction of knowledge from them. Text mining methods and techniques are used to address the challenges of high dimensionality; among them, the term frequency-inverse document frequency (TF-IDF) weighting method is the most widely used approach for representing documents. Unfortunately, TF-IDF produces large descriptors (generally more than 1000 dimensions), which require highly complex models. Text classification systems based on these models suffer from overfitting and are very slow. To overcome these problems, we use attribute selection methods; however, given their deterministic nature, we risk losing a large amount of information. To recover from this loss, we propose a probabilistic vector representation of each document, based on the previously selected relevant terms. We associate with each document a set of features composed of local and global probabilistic coefficients computed from the selected terms. More precisely, the component formulas are built from the frequency of each descriptor, the length of each document, and the size of the corpus. To evaluate this treatment, we compare the TF-IDF representation with the new probabilistic representation on the classification of the BBCSPORT corpus. In the classification phase, we use several versions of Bayesian Network and Multilayer Perceptron. The results obtained are satisfactory: the multilayer perceptron neural network classifier achieves a recognition rate of 100% with the new representation and 94.69% with simple TF-IDF weighting.
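
To make the contrast in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The `tf_idf` function uses the textbook weighting tf(t, d) · log(N / df(t)). The `probabilistic_features` function is purely hypothetical: the abstract only says that the coefficients depend on descriptor frequency, document length, and corpus size, so the local ratio (term count over document length) and the global ratio (document frequency over corpus size) are assumptions chosen for illustration. The toy documents and the selected terms are also invented.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Textbook TF-IDF over tokenized documents: tf(t, d) * log(N / df(t))."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def probabilistic_features(docs, selected_terms):
    """Hypothetical local/global coefficients (ASSUMED, not the paper's formulas).

    local(t, d) = occurrences of t in d / length of d
    global(t)   = number of documents containing t / corpus size
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    features = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        row = []
        for t in selected_terms:
            row.append(counts[t] / length)  # local coefficient
            row.append(df[t] / n_docs)      # global coefficient
        features.append(row)
    return features


# Toy corpus, invented for illustration only.
docs = [
    "the striker scored a late goal".split(),
    "the bowler took five wickets".split(),
    "the striker missed the penalty".split(),
]
vocab, tfidf_vectors = tf_idf(docs)

# Terms assumed to have survived a prior attribute-selection step.
selected = ["striker", "wickets"]
prob_vectors = probabilistic_features(docs, selected)

print(len(vocab), "TF-IDF dimensions vs", len(prob_vectors[0]), "probabilistic features")
```

The TF-IDF vector has one dimension per vocabulary term, while the sketched representation has two features per selected term, which is the kind of size reduction the abstract is aiming for.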

Author information

Authors and Affiliations

Computer Sciences, Imaging and Numerical Analysis Laboratory (LIIAN), Fez, Morocco
Mariem Bounabi & Khalid Satori
Hoceima National School of Applied Sciences (ENSAH), University Mohammed First, Al-Hoceima, Morocco
Karim El Moutaouakil

Corresponding author

Correspondence to Mariem Bounabi.

Editor information

Editors and Affiliations

Abdelmalek Essaâdi University, Tétouan, Morocco
Youness Tabii
Abdelmalek Essaâdi University, Tétouan, Morocco
Mohamed Lazaar
Abdelmalek Essaâdi University, Tétouan, Morocco
Mohammed Al Achhab
Université Ibn-Tofail, Tétouan, Morocco
Nourddine Enneya

Appendix

See Tables 7 and 8 and Fig. 5.

Table 7. Results for different algorithms for learning the network structure using the new probabilistic method

Table 8. Results for different numbers of hidden nodes

About this paper

Cite this paper

Bounabi, M., El Moutaouakil, K., Satori, K. (2018). A Probabilistic Vector Representation and Neural Network for Text Classification. In: Tabii, Y., Lazaar, M., Al Achhab, M., Enneya, N. (eds) Big Data, Cloud and Applications. BDCA 2018. Communications in Computer and Information Science, vol 872. Springer, Cham. https://doi.org/10.1007/978-3-319-96292-4_27

DOI: https://doi.org/10.1007/978-3-319-96292-4_27
Published: 14 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96291-7
Online ISBN: 978-3-319-96292-4
eBook Packages: Computer Science, Computer Science (R0)
