Unsupervised connectionist algorithms for clustering an environmental data set: A comparison

doi:10.1016/S0925-2312(98)00123-4

Neurocomputing

Volume 28, Issues 1–3, October 1999, Pages 177-189

https://doi.org/10.1016/S0925-2312(98)00123-4

Abstract

Various unsupervised algorithms for vector quantization can be found in the literature. Being based on different assumptions, they do not all yield exactly the same results on the same problem. To better understand these differences, this article presents an evaluation of some unsupervised neural networks, considered among the most useful for quantization, in the context of a real-world problem: radioelectric wave propagation. Radio wave propagation is highly dependent upon environmental characteristics (e.g. those of the city, country, mountains, etc.). Within the framework of cell network planning for radiocommunication, we are interested in determining a set of sufficiently homogeneous environmental classes, to each of which a specific prediction model of the radio electrical field can be applied. Of particular interest are techniques that allow improved analysis of results. First, Mahalanobis’ distance, which takes data correlation into account, is used to make assignments. Second, studies of class dispersion and homogeneity, using both a data structure mapping representation and statistical analysis, emphasize the importance of the global properties of each algorithm. In conclusion, we discuss the advantages and disadvantages of each method on real problems.

Introduction

Faced with the current explosion of mobile communication systems, cell network planning is a strategic stage for telecommunication operators. Choosing the location of transmitting stations and the size of their coverage zones is a key factor in optimizing the development of a mobile radio network. Cell planning depends upon the attenuation of radio electrical waves, and the laws of wave propagation change with the environment. Currently, there is no global theoretical model that explains this attenuation under all circumstances. A statistical model, on the other hand, can easily be built (for example, using supervised connectionist techniques) if homogeneous classes of environment can be defined (for example, mountains or a sparse suburb). We thus aim to define a partition of the environment that is homogeneous enough to provide a correct predictive model of radio electrical wave propagation for each class.

Such a clustering can be obtained with vector quantization algorithms, including unsupervised neural networks. A variety of such models have been presented in the literature; they use different approaches and thus give different results when determining classes and the frontiers between them. Such algorithms cannot be compared on their performance, but only on the subjective quality of their results, that is, on the properties of the classes and frontiers obtained. This is one reason why it is interesting to compare unsupervised algorithms on a real data set. The algorithms used in the vector quantization models are presented in Section 2, whereas Section 3 proposes a set of statistical tools allowing for a better analysis of the behavior of these models. A national geographic database, describing the physical geography of France, is used for this comparison. We extract a random corpus of 5000 patterns from four typical regions represented in total by 65,000 patterns. Each pattern has eight attributes: altitude and the percentage presence of seven other parameters (water, wood, field, rock and three grades of construction density) within an area of $400 \times 400\ \mathrm{m}^{2}$. Each parameter is standardized according to the mean and standard deviation calculated over the 65,000 patterns.
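The preprocessing described above (statistics computed over the full set, then applied to a random corpus) can be sketched as follows. This is a minimal illustration, not the authors' code: the toy data, the three-attribute patterns, the set sizes and the function names `fit_standardizer` and `standardize` are all assumptions, and the population standard deviation is used because the text does not say which variant was applied.

```python
import random
import statistics

def fit_standardizer(patterns):
    """Return per-attribute means and standard deviations over all patterns."""
    columns = list(zip(*patterns))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation (assumption: the source does not specify).
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds

def standardize(pattern, means, stds):
    """Z-score one pattern; a constant attribute (std == 0) maps to 0."""
    return [(x - m) / s if s > 0 else 0.0
            for x, m, s in zip(pattern, means, stds)]

# Hypothetical toy data: 650 patterns with 3 attributes
# (the text uses 65,000 patterns with 8 attributes).
rng = random.Random(0)
full_set = [[rng.uniform(0, 1000), rng.uniform(0, 100), rng.uniform(0, 100)]
            for _ in range(650)]

# Statistics come from the full set, not from the sampled corpus.
means, stds = fit_standardizer(full_set)

# Random corpus drawn without replacement (50 here, 5000 in the text).
corpus = rng.sample(full_set, 50)
standardized_corpus = [standardize(p, means, stds) for p in corpus]
```

Computing the statistics on the full set rather than on the sample means the corpus itself will not have exactly zero mean and unit variance, but every pattern is expressed on the same scale as the complete database.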

Section snippets

Models

Unsupervised learning is very useful for defining classes and the frontiers between them. This technique can cluster data without heuristics or prior knowledge. The wide range of methods available shows that no single algorithm produces consistently good results on all problems. The question is rather what the specific advantages of each method are. The methods presented in this paper have their roots in biological observations, mathematics or statistical physics. Some of them preserve
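As a point of reference for what an unsupervised vector quantizer does, here is a minimal sketch of plain winner-take-all competitive learning. It is a generic textbook scheme, not one of the specific models compared in the article; the function names, learning rate, epoch count and toy data are assumptions chosen for illustration.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(prototypes, pattern):
    """Index of the prototype closest to the pattern (the 'winner')."""
    return min(range(len(prototypes)),
               key=lambda k: squared_distance(prototypes[k], pattern))

def competitive_learning(patterns, n_prototypes, epochs=20, rate=0.1, seed=0):
    """Online winner-take-all update: only the winner moves toward the pattern."""
    rng = random.Random(seed)
    # Prototypes start on randomly chosen patterns (one common choice).
    prototypes = [list(p) for p in rng.sample(patterns, n_prototypes)]
    for _ in range(epochs):
        order = patterns[:]
        rng.shuffle(order)
        for pattern in order:
            winner = prototypes[nearest(prototypes, pattern)]
            for i, x in enumerate(pattern):
                winner[i] += rate * (x - winner[i])
    return prototypes

# Hypothetical toy data: two blobs around (0, 0) and (10, 10).
data_rng = random.Random(1)
patterns = [[cx + data_rng.uniform(-1, 1), cy + data_rng.uniform(-1, 1)]
            for cx, cy in [(0, 0), (10, 10)] for _ in range(50)]
prototypes = competitive_learning(patterns, n_prototypes=2)
labels = [nearest(prototypes, p) for p in patterns]
```

Each prototype defines a class (the patterns it wins), and the frontiers between classes are the boundaries between neighboring prototypes. The result depends on initialization and on the order of presentation, which is one reason different algorithms and runs can disagree on the same data.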

Evaluation

To better understand the functioning and the specific qualities of these unsupervised algorithms, we applied each of them to the 5000 patterns mentioned above. Their evaluation was conducted with two statistical tools that we introduce now.

Conclusion

Our work evaluates some of the most useful unsupervised neural networks on a partition problem. All the vector quantization methods we have tested use a distance measure between patterns. Mahalanobis’ distance was applied to decorrelate parameters: a pattern is assigned to a class using the covariance matrix computed over all patterns. For visual display, patterns and prototypes were projected onto a two-dimensional space with Sammon's nonlinear mapping algorithm. It can be concluded that the
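The assignment rule mentioned above, with a single covariance matrix shared by all patterns, can be sketched as follows. This is an illustrative reading of the text under stated assumptions, not the authors' implementation: the sample covariance (division by n - 1), the helper names and the two-dimensional toy data are all hypothetical. The squared distance used is $d^2(x, c) = (x - c)^{T} S^{-1} (x - c)$, where $S$ is the covariance matrix and $c$ a class prototype.

```python
def covariance_matrix(patterns):
    """Sample covariance matrix (n - 1 denominator) over all patterns."""
    n, d = len(patterns), len(patterns[0])
    means = [sum(p[i] for p in patterns) / n for i in range(d)]
    return [[sum((p[i] - means[i]) * (p[j] - means[j]) for p in patterns) / (n - 1)
             for j in range(d)] for i in range(d)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    d = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(d)]
           for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]

def mahalanobis_sq(x, center, inv_cov):
    """Squared Mahalanobis distance between x and center."""
    diff = [a - b for a, b in zip(x, center)]
    d = len(diff)
    return sum(diff[i] * inv_cov[i][j] * diff[j]
               for i in range(d) for j in range(d))

def assign(pattern, prototypes, inv_cov):
    """Index of the prototype with the smallest Mahalanobis distance."""
    return min(range(len(prototypes)),
               key=lambda k: mahalanobis_sq(pattern, prototypes[k], inv_cov))

# Hypothetical toy data: two strongly correlated attributes.
patterns = [(t + e, t - e) for t in (-4, -2, 0, 2, 4) for e in (-0.5, 0.5)]
inv_cov = invert(covariance_matrix(patterns))
prototypes = [(1.0, 1.0), (4.0, 2.0)]
x = (2.0, 0.0)
chosen = assign(x, prototypes, inv_cov)  # index 1 for this toy data
```

In this toy case the first prototype is closer to `x` in Euclidean terms, but the offset runs across the direction of correlation, so the Mahalanobis distance prefers the second prototype. That is the effect of taking data correlation into account when making assignments.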

Laurent Bougrain is a Ph.D. student in computer science in the CORTEX team at LORIA/INRIA-Lorraine and a CNRS research engineer. He holds a post-graduate diploma in artificial intelligence from the University of Paris, Pierre and Marie Curie. His research interests are hybrid systems, unsupervised classifiers and contextual phenomena. He is also working on spatial representations and neural networks for his master's degree in psychology.


Frédéric Alexandre is a Senior Research Scientist of INRIA at INRIA-Lorraine/ LORIA-CNRS. He holds a B.S. in Computer Science from the Institut National Polytechnique de Lorraine and a Ph.D. in Computer Science from the University Henri Poincaré, Nancy, where he defended his dissertation entitled “A functional modelization of the cortex: the cortical column”. He also has a Master's degree in Psychology and another in Mechanics. His current research interests concern artificial neural networks for signal and speech recognition and biologically inspired architectures and learning rules for neural net design. As the head of the CORTEX team which concentrates connectionist activities in INRIA-Lorraine/LORIA-CNRS, he has developed and refined a number of connectionist models and has participated in industrial applications involving the use of connectionist and symbolic tools. He was also the leader of the European ESPRIT project MIX on neurosymbolic integration.

^☆: This research was supported by the Centre National d'Etudes des Télécommunications through Contract no. 97 1B008.

