Abstract:
The classification of textual documents has been widely studied. Most classification approaches use supervised learning methods, which are acceptable for fairly small corpora, where experts can build representative training sets, but are not feasible for large flows of data. Unsupervised classification methods discover latent (hidden) classes automatically while minimizing human intervention. Many such methods exist, among them Kohonen self-organizing maps (SOMs), which group similar objects together without prior information. In this paper, we evaluate and compare the use of SOMs for the classification of textual documents in two situations: a conceptual representation of texts and a representation based on n-grams.
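As a rough illustration of the n-gram setting only, the sketch below builds character n-gram frequency vectors and trains a small one-dimensional SOM on them. It is not the authors' implementation: the function names (`char_ngrams`, `vectorize`, `train_som`, `best_matching_unit`), the choice of character trigrams, the 1-D map layout, and every parameter value are assumptions made for this example.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=3):
    """Relative frequencies of the character n-grams in `text` (assumed representation)."""
    s = " ".join(text.lower().split())  # lowercase and collapse whitespace
    grams = [s[i:i + n] for i in range(len(s) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values()) or 1  # avoid division by zero on very short texts
    return {g: c / total for g, c in counts.items()}


def vectorize(docs, n=3):
    """Map each document to a dense vector over the shared n-gram vocabulary."""
    profiles = [char_ngrams(d, n) for d in docs]
    vocab = sorted({g for p in profiles for g in p})
    return [[p.get(g, 0.0) for g in vocab] for p in profiles]


def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to `x` (Euclidean)."""
    return min(range(len(weights)), key=lambda k: math.dist(weights[k], x))


def train_som(data, n_units=4, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D SOM (a line of units); all hyperparameters are illustrative."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() * 0.01 for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks over time
        for x in rng.sample(data, len(data)):    # present documents in random order
            bmu = best_matching_unit(weights, x)
            for k in range(n_units):
                # Gaussian neighbourhood: units near the winner move more
                h = math.exp(-((k - bmu) ** 2) / (2 * radius ** 2))
                weights[k] = [w + lr * h * (xi - w) for w, xi in zip(weights[k], x)]
    return weights
```

After training, calling `best_matching_unit(weights, v)` for each document vector `v` assigns the document to a map unit; documents that share a unit, or sit on neighbouring units, are the ones the map treats as similar. No labels are used at any point, which is the property the abstract emphasizes.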
Date of Conference: 31 March 2008 - 04 April 2008
Date Added to IEEE Xplore: 22 April 2008