Neural Text Categorizer for topic identification of noisy Arabic Texts | IEEE Conference Publication | IEEE Xplore

Abstract:

This paper deals with the topic identification problem, which consists of recognizing the subject a text is written about. Although several statistical and machine learning approaches address this problem, most of them assume relatively clean and long texts and fail on corrupted or short texts. Moreover, few works have been carried out on the Arabic language, which is a rich and complex language. For that reason, we conduct our investigation on topic identification of noisy Arabic texts. To address this problem, we present the design and implementation of the Neural Text Categorizer (NTC), a novel neural network that differs from existing NNs in some concepts. Furthermore, we present and discuss a proposed improvement of the NTC (called NTCT), which is based on TF-IDF weights and consists of modifying the input vector and the classification formula. The empirical evaluation of the two algorithms was carried out on an in-house corpus (called ANTSIX) containing discussion forum texts. We also compared our best findings with the state of the art. We found that the proposed NTCT maintained consistently high performance and outperformed several algorithms in topic identification of noisy Arabic texts.
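
The abstract states that NTCT builds its input vector from TF-IDF weights but does not give the formulas. As a rough illustration only, the sketch below computes generic TF-IDF vectors for tokenized texts. It is not the NTC or NTCT algorithm: the function name `tfidf_vectors`, the smoothed IDF variant, and the toy English tokens are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents.

    Hypothetical helper: a generic TF-IDF weighting, not the NTCT formula.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant), which is always positive.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        # Term frequency is normalized by document length.
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Toy placeholder texts (not taken from the ANTSIX corpus).
docs = [
    "match goal team goal".split(),
    "bank market price".split(),
    "team market".split(),
]
vocab, vectors = tfidf_vectors(docs)
```

Each vector has one entry per vocabulary term, so it could serve as the input layer of a classifier. Terms that are frequent in one text but rare across the collection receive the largest weights.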
Date of Conference: 17-20 November 2015
Date Added to IEEE Xplore: 09 July 2016
ISBN Information:
Electronic ISSN: 2161-5330
Conference Location: Marrakech, Morocco
