Elsevier

Neural Networks

Volume 16, Issues 5–6, June–July 2003, Pages 771-778

2003 Special issue
On the quality of ART1 text clustering

https://doi.org/10.1016/S0893-6080(03)00088-1

Abstract

There is a large and continually growing quantity of electronic text available, which contains essential human and organizational knowledge. An important research endeavor is to study and develop better ways to access this knowledge. Text clustering is a popular approach to automatically organize textual document collections by topic, helping users find the information they need. Adaptive Resonance Theory (ART) neural networks possess several interesting properties that make them appealing for text clustering. Although ART has been used as a text clustering tool in several research works, the quality of the resulting document clusters has not yet been clearly established. In this paper, we present experimental results with binary ART that address this issue by determining how close its clustering quality is to an upper bound on clustering quality.

Introduction

We consider the application of clustering to the self-organization of a textual document collection. Clustering is the operation by which similar objects are grouped together in an unsupervised manner (Jain et al., 1999, Kaufman and Rousseeuw, 1990). Hence, when clustering textual documents, one hopes to form sets of documents with similar content. Instead of exploring the whole collection of documents, a user can then browse the resulting clusters to identify and retrieve relevant documents. As such, clustering provides a summarized view of the information space by grouping documents by topic. Clustering is often the only viable way to organize large text collections into topics: its advantage is realized when a training set and class definitions are unavailable, or when creating them is either cost-prohibitive due to the collection's sheer size or unrealistic due to the rapidly changing nature of the collection.

We specifically study text clustering with Adaptive Resonance Theory (ART) (Carpenter and Grossberg, 1995, Grossberg, 1976) neural networks. ART neural networks are known for their ability to perform on-line, incremental clustering of dynamic datasets. Unlike most other types of artificial neural networks, such as the popular Back-propagation Multi-Layer Perceptron (MLP) (Rumelhart, Hinton, & Williams, 1986), ART is unsupervised and allows for plastic yet stable learning. ART detects similarities among data objects, typically data points in an N-dimensional metric space. When novelty is detected, ART adaptively and autonomously creates a new category. Another advantageous and distinguishing feature of ART is its ability to discover patterns at various levels of generality. This is achieved by setting the value of a parameter known as vigilance, denoted ρ, with ρ∈(0,1]. ART's stability and plasticity properties, as well as its ability to process dynamic data efficiently, make it an attractive candidate for clustering large, rapidly changing text collections in real-life environments. Although ART has been investigated previously as a means of clustering text data, numerous variations in ART implementations, experimental data sets and quality evaluation methodologies have left it unclear whether ART performs well in this type of application. Since ART appears to be a logical and appealing solution to the rapidly growing amount of textual electronic information processed by organizations, it is important to eliminate the confusion surrounding the quality of the text clusters it produces. In this paper, we present experimental results with a binary ART neural network (ART1) that address this issue by determining how close the clustering quality achieved with ART is to an expected upper bound on clustering quality. We will consider other versions of ART in future work.
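To make the mechanism concrete, the following is a minimal fast-learning ART1 sketch in Python. It is an illustration under common simplifying assumptions (binary inputs, fast learning by prototype intersection), not the implementation used in the paper; the function and parameter names are ours.

```python
import numpy as np

def art1_cluster(docs, rho=0.5, beta=1.0):
    """Cluster binary document vectors with a simplified fast-learning ART1.

    docs : int array of shape (n_docs, n_features), entries in {0, 1}
    rho  : vigilance in (0, 1]; higher values yield more, tighter clusters
    beta : choice parameter (> 0), biasing ties toward larger prototypes
    """
    prototypes = []    # one binary prototype (template) per category
    assignments = []
    for x in docs:
        # Rank existing categories by the choice (activation) function.
        order = sorted(
            range(len(prototypes)),
            key=lambda j: -(np.sum(x & prototypes[j]) /
                            (beta + np.sum(prototypes[j]))))
        winner = None
        for j in order:
            # Vigilance test: fraction of the input matched by the prototype.
            match = np.sum(x & prototypes[j]) / max(np.sum(x), 1)
            if match >= rho:
                # Resonance: fast learning intersects prototype with input.
                prototypes[j] = x & prototypes[j]
                winner = j
                break
            # Otherwise reset and try the next-best category.
        if winner is None:
            # Novelty detected: create a new category from the input itself.
            prototypes.append(x.copy())
            winner = len(prototypes) - 1
        assignments.append(winner)
    return assignments, prototypes
```

Higher vigilance tightens the match test and produces more, narrower categories; lower vigilance yields fewer, broader ones, which is how ART discovers patterns at various levels of generality.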

Section snippets

Related work

We consider one of the many applications of text clustering in the field of Information Retrieval (IR) (van Rijsbergen, 1979), namely clustering that aims at self-organizing textual document collections. This application of text clustering can be seen as a form of classification by topics, making it the unsupervised counterpart to Text Categorization (TC) (Sebastiani, 2002). Text self-organization has become increasingly popular due to the availability of large document collections that

Experimental settings

We selected two well-established cluster quality evaluation measures: Jaccard (JAC) (Downton & Brennan, 1980) and Fowlkes–Mallows (FM) (Fowlkes & Mallows, 1983):

JAC = a / (a + b + c)

FM = a / √((a + b)(a + c))

where

  • a is the pair-wise number of true positives, i.e. the total number of document pairs grouped together in the expected solution and that are indeed clustered together by the clustering algorithm;

  • b is the pair-wise number of false positives, i.e. the number of document pairs not expected to be grouped together but that are nevertheless clustered together by the algorithm;

  • c is the pair-wise number of false negatives, i.e. the number of document pairs grouped together in the expected solution but that the algorithm does not cluster together (a computational sketch of both measures follows this list).
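Both measures derive from the same three pair-wise counts. The following short Python sketch (ours, for illustration; the label vectors are hypothetical) computes a, b and c by enumerating document pairs and then evaluates JAC and FM:

```python
from itertools import combinations

def pairwise_counts(expected, predicted):
    """Pair-wise true positives (a), false positives (b) and
    false negatives (c) between two flat clusterings, given as
    cluster-label lists aligned by document index."""
    a = b = c = 0
    for i, j in combinations(range(len(expected)), 2):
        same_true = expected[i] == expected[j]
        same_pred = predicted[i] == predicted[j]
        if same_true and same_pred:
            a += 1          # pair correctly clustered together
        elif same_pred:
            b += 1          # clustered together but not expected to be
        elif same_true:
            c += 1          # expected together but split apart
    return a, b, c

def jaccard(a, b, c):
    return a / (a + b + c)

def fowlkes_mallows(a, b, c):
    return a / ((a + b) * (a + c)) ** 0.5

# Hypothetical example: 4 documents, 2 expected topics.
a, b, c = pairwise_counts([0, 0, 1, 1], [0, 0, 0, 1])
print(jaccard(a, b, c), fowlkes_mallows(a, b, c))
```

Enumerating all pairs is O(n²) in the number of documents, which is acceptable for evaluation-sized collections.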

Experimental results

We eliminated words that appear in 10, 20, 40 and 60 or fewer documents. In the first case, a total of 2282 term features were retained, while in the last only 466 were. Our experiments indicated that less radical feature selection not only increased the number of features, and consequently the processing time, but also resulted in lower quality clusters in some cases (Fig. 1). Best quality is achieved at a vigilance value of 0.05, with 106 clusters, a number close to the expected number of topics
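The feature-selection step described above is plain document-frequency thresholding. A minimal sketch, assuming a binary document-term matrix and an illustrative min_df cut-off (the paper's own code is not given):

```python
import numpy as np

def prune_rare_terms(doc_term, min_df=10):
    """Drop terms that appear in min_df or fewer documents.

    doc_term : binary document-term matrix of shape (n_docs, n_terms)
    min_df   : document-frequency cut-off (e.g. 10, 20, 40 or 60)
    """
    df = np.count_nonzero(doc_term, axis=0)  # docs containing each term
    keep = df > min_df                       # retain terms above the cut-off
    return doc_term[:, keep], keep
```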

Conclusions and future work

Text clustering work conducted with ART up to now has used many different forms of ART-based architectures, as well as different, non-comparable text collections and evaluation methods. This situation has resulted in confusion as to the level of clustering quality achievable with ART. As a first step towards resolving this situation, we have tested a simple ART1 network implementation and evaluated its text clustering quality on the benchmark Reuters data set and with the standard F1 measure.

References (37)

  • E. Fowlkes et al., A method for comparing two hierarchical clusterings, Journal of the American Statistical Association (1983)
  • M. Georgiopoulos et al., Convergence properties of learning in ART1, Neural Computation (1990)
  • S. Grossberg, Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors, Biological Cybernetics (1976)
  • U. Heuser, W. Rosenstiel, Automatic construction of local internet directories using hierarchical... (2000)
  • A.K. Jain et al., Data clustering: a review, ACM Computing Surveys (1999)
  • L. Kaufman et al., Finding groups in data: An introduction to cluster analysis (1990)
  • T. Kohonen, Self-organizing maps, Springer Series in Information Sciences (2001)
  • T. Kohonen et al., Self organization of a massive document collection, IEEE Transactions on Neural Networks (2000)