Statistical pattern recognition techniques, such as supervised and unsupervised classification, rely on the computation of similarity and distance metrics. The distances are computed in a multi-dimensional space whose axes, in principle, relate to the features inherent in the input data. Usually, such features are chosen by neural network developers, thereby introducing a possible bias. A method of automatically generating feature sets is discussed, with specific reference to the categorisation of streams of free-text news items. The feature sets were generated by a procedure that automatically selects a group of keywords based on a lexico-semantic analysis. Three different types of text streams (headlines only, news summaries, and full news items including the body of the text) have been categorised using Self-Organising Feature Maps (SOFM). A method for assessing the discrimination ability of a SOFM, based on Fisher’s Linear Discriminant Rule, suggests that maps trained on vectors related to summaries only provide fairly accurate clusters when compared with maps trained on vectors related to full text. The use of summaries as document surrogates for document categorisation is suggested.
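The pipeline summarised in the abstract (keyword-based feature vectors, distance computation, a self-organising map, and a Fisher-style separability score) can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword list is supplied by hand rather than by lexico-semantic analysis, the map is a tiny one-dimensional one, and `fisher_ratio` is the textbook two-class Fisher criterion, which may differ from the assessment method used in the paper. All function names and parameters are hypothetical.

```python
import math
import random


def keyword_vector(text, keywords):
    """Count how often each keyword occurs in the text (case-insensitive)."""
    tokens = text.lower().split()
    return [float(tokens.count(k)) for k in keywords]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(weights, vector):
    """Index of the map node whose weight vector is closest to the input."""
    return min(range(len(weights)), key=lambda i: euclidean(weights[i], vector))


def train_som(vectors, n_nodes=4, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a tiny one-dimensional self-organising map (illustrative only)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays linearly from 1 towards 0
        lr = lr0 * frac                      # learning rate
        radius = max(radius0 * frac, 1e-9)   # neighbourhood width
        for v in vectors:
            bmu = best_matching_unit(weights, v)
            for i in range(n_nodes):
                # Gaussian neighbourhood over the 1-D node index
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    weights[i][d] += lr * h * (v[d] - weights[i][d])
    return weights


def fisher_ratio(a, b):
    """Two-class Fisher criterion for scalar values:
    squared distance between class means over total within-class scatter."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    scatter = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    if scatter == 0:
        return float("inf")
    return (mean_a - mean_b) ** 2 / scatter
```

One possible use, under the same assumptions: build a `keyword_vector` for each summary and each full text, train one map per text type, and compare the `fisher_ratio` of the node positions assigned to two known categories. A larger ratio would indicate better separation of those categories on the map.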
Cite this article
Ahmad, K., Vrusias, B. & Ledford, A. Choosing Feature Sets for Training and Testing Self-Organising Maps: A Case Study. Neural Computing & Applications 10, 56–66 (2001). https://doi.org/10.1007/s005210170018