A directed batch growing approach to enhance the topology preservation of self-organizing map
Introduction
High-dimensional data can be difficult to visualize and interpret, particularly when the data of interest lie on a nonlinear manifold. Unsupervised learning methods can be used to map data from a high-dimensional to a lower-dimensional space and ease interpretation and visualization [1], [2]. Among the various techniques, Kohonen’s Self-Organizing Map (SOM) [3] has received particular attention for its ability to map data nonlinearly from a high-dimensional space onto a (usually) two-dimensional feature map while preserving the topological order of the data and capturing the embedded manifold [4], [5], [6]. The SOM is typically used as a tool for knowledge extraction: it visualizes data and reveals clusters and relationships between data points. If the SOM fails to provide a correct topological representation of the data, the interpretation of the resulting map and clusters will also be incorrect. A proper representation of the dataset on the feature map is therefore essential for data visualization and accurate interpretation [7]. The SOM algorithm proceeds in two steps: competition among the neurons to find the winner, and adaptation of the weight vectors of the winner neuron and its topological neighbors. A predefined structure and learning parameters must be assigned before training begins, which is one of the major limitations of conventional SOM networks. In practice, this requires a time-consuming procedure of generating numerous feature maps with predetermined sizes and evaluating the produced maps against subjective criteria [8], [9]. This issue also makes the SOM unsuitable for online learning and for handling non-stationary datasets. To work around this limitation, a dynamic variant of the SOM, the growing SOM (GSOM), was proposed [10], [11] and has since been applied in many different fields [7], [12], [13], [14], [15], [16].
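The two-step SOM update described above (competition, then neighborhood adaptation) can be sketched in a few lines of NumPy. The function name and parameter values below are illustrative, not taken from the paper:

```python
import numpy as np

def som_train_step(weights, grid, x, lr=0.5, sigma=1.0):
    """One online SOM step: competition, then neighborhood adaptation.

    weights: (n_neurons, dim) codebook vectors, updated in place
    grid:    (n_neurons, 2) neuron coordinates on the 2-D map
    x:       (dim,) input sample
    """
    # Competition: the best-matching unit (BMU) is the neuron whose
    # weight vector is closest to the input in feature space.
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
    # Adaptation: move the BMU and its grid neighbors toward the input,
    # scaled by a Gaussian neighborhood function of grid distance.
    grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
    h = np.exp(-grid_dist2 / (2 * sigma ** 2))
    weights += lr * h[:, None] * (x - weights)
    return bmu
```

In a full training loop, the learning rate `lr` and neighborhood width `sigma` would decay over epochs; they are held fixed here for brevity.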
Instead of being confined to a predetermined number of neurons, the GSOM offers a flexible structure and requires fewer epochs than the original SOM, which enables it to learn nonlinear manifolds in a high-dimensional feature space [17], [18], [19]. Training starts from a small number of initial neurons and grows the map by adding new neurons to the network.
Despite the dynamic nature of the GSOM algorithm, careful inspection of the GSOM grid reveals some negative effects, including wrapping and twisting of the map. In a topologically ordered SOM, the distance between two neurons on the grid (link distance) reflects the similarity of the input vectors mapped onto them. Inappropriate positioning of neighboring neurons' weight vectors in the feature space can cause similar input vectors to be mapped onto distant neurons, and vice versa. As shown in Fig. 1b, in a topologically ordered map the dissimilar input data points 1 and 3 are mapped onto the winner neurons A and C, which are distant on the grid. In a twisted map, however, the same data points are mapped onto the neighboring neurons A and D (Fig. 1c). These effects can be attributed to improper weight initialization and neuron addition during the growing phase; they distort the map in ways that may not be correctable at later stages [7]. Similar input vectors are then mapped onto distant grid neurons, which can result in a misconfigured map [20], [21]. Unexpected growth of the network is another problem: it produces dead neurons and imposes additional learning time. Several incremental growing strategies have been proposed [10], [17], [22], [23] that assign the position and initial weight vector of new neurons based on interpolation of the weight vectors of adjacent nodes.
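One common way to quantify the twisting described above is the topographic error: the fraction of samples whose first and second best-matching neurons are not adjacent on the grid. The sketch below is a generic implementation of that measure for illustration, not necessarily the exact metric used in this paper:

```python
import numpy as np

def topographic_error(weights, grid, X):
    """Fraction of samples whose two best-matching neurons are not
    adjacent on the grid -- a common proxy for topology violations
    such as map twisting.

    weights: (n_neurons, dim) codebook vectors
    grid:    (n_neurons, 2) integer-valued neuron coordinates
    X:       (n_samples, dim) input data
    """
    errors = 0
    for x in X:
        # Rank neurons by feature-space distance to the sample.
        order = np.argsort(np.linalg.norm(weights - x, axis=1))
        b1, b2 = order[0], order[1]
        # Adjacent neurons differ by one step in grid coordinates.
        if np.abs(grid[b1] - grid[b2]).sum() > 1:
            errors += 1
    return errors / len(X)
```

A perfectly ordered map yields an error of 0; a twisted map, where similar inputs land on grid-distant neurons, yields a positive value.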
Most previously presented growing strategies use the accumulated error only to find candidate boundary neurons. A boundary neuron is defined as any neuron that has at least one adjacent free position on the grid. In the next step, all free positions around the candidate neuron are filled, and the weight vectors of the new neurons are initialized without considering the accumulated error of the surrounding neurons. This paper proposes a modified batch growing approach for the SOM, called the directed batch growing self-organizing map (DBGSOM), which uses the accumulated error of the neurons on the grid to direct the growing phase in terms of both the position and the weight initialization of new neurons. The contribution of this paper lies in its predefined growing rules, which control and evolve the structure of the network and enhance the topology preservation of the map. The approach starts with a small number of neurons in a rectangular grid; as training cycles pass, new neurons are added whenever needed. Rules are defined and enforced to steer the lattice growth in a proper direction, enhancing topology preservation and avoiding twisting and misconfiguration of the map. New neurons are added at the boundaries by filling one of the adjacent free positions and assigning a proper weight vector, improving the topographic quality of the map and helping it learn the manifold of the data in the high-dimensional feature space.
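The directed growing idea can be illustrated with a simplified sketch: select the boundary neuron with the highest accumulated error, fill one adjacent free grid position chosen with the help of the surrounding neurons' errors, and initialize the new weight vector by extrapolation. The rules below are a hypothetical simplification for illustration, not the paper's exact DBGSOM rules:

```python
import numpy as np

def grow_one_neuron(positions, weights, errors):
    """Simplified directed-growing sketch (illustrative only).

    positions: list of (row, col) grid coordinates
    weights:   list of np.ndarray codebook vectors, same order
    errors:    list of accumulated quantization errors, same order
    """
    occupied = {p: i for i, p in enumerate(positions)}
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def free_slots(p):
        return [(p[0] + dr, p[1] + dc) for dr, dc in moves
                if (p[0] + dr, p[1] + dc) not in occupied]

    # Candidate boundary neuron: highest accumulated error among
    # neurons with at least one adjacent free position.
    boundary = [i for i, p in enumerate(positions) if free_slots(p)]
    cand = max(boundary, key=lambda i: errors[i])
    p = positions[cand]

    # Direct the growth: prefer the free slot whose opposite occupied
    # neighbor carries the highest accumulated error, so the map
    # extends away from the high-error region.
    def score(slot):
        opposite = (2 * p[0] - slot[0], 2 * p[1] - slot[1])
        return errors[occupied[opposite]] if opposite in occupied else 0.0

    slot = max(free_slots(p), key=score)
    opposite = (2 * p[0] - slot[0], 2 * p[1] - slot[1])
    # Weight initialization: extrapolate across the candidate neuron,
    # falling back to a copy of the candidate's weight vector.
    if opposite in occupied:
        w_new = 2 * weights[cand] - weights[occupied[opposite]]
    else:
        w_new = weights[cand].copy()
    positions.append(slot)
    weights.append(w_new)
    errors.append(0.0)
    return slot, w_new
```

Unlike the earlier strategies that fill every free position around the candidate, this sketch adds exactly one neuron per growth event, at a position and with a weight vector steered by the surrounding errors.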
The remaining parts of the paper are organized as follows. Section 2 reviews related work on GSOMs. Details of the GSOM algorithm and the proposed directed batch growing strategy (DBGSOM) are presented in Section 3. Section 4 presents experimental results for DBGSOM compared with GSOM and the conventional SOM, along with a general discussion of the results and a comparison of map quality in terms of topology preservation. Conclusions and possible future directions are drawn in Section 5.
Section snippets
Related works
Various methods have been proposed to resolve the deficiencies of classical SOMs, such as the fixed size and topology of the map. Defining a learning scheme together with a dynamic structure for SOMs is one way to overcome the limitations of static maps, and several types of dynamic SOM that grow during the learning process have been proposed. Alternative GSOM models differ mainly in three aspects: deciding when to add new neurons, where to add them on the
Growing Self Organizing Map
The growing self-organizing map is an unsupervised neural network [10], [29] that can dynamically grow, adapting its size and shape to represent the input data structure with a controllable spread. The GSOM network usually starts with four neurons in a rectangular lattice (Fig. 1), which ensures that all neurons have similar lattice conditions at the initialization phase. The learning algorithm in a typical GSOM consists of the following phases:
At the initialization phase, the weight vectors of the
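A minimal sketch of this initialization, assuming random weight vectors drawn from the data range and the growth threshold GT = -D * ln(SF) of the classic GSOM [10] (D: input dimension, SF: spread factor in (0, 1)):

```python
import numpy as np

def gsom_init(X, spread_factor=0.5, rng=None):
    """Initialize a GSOM: four neurons on a 2x2 rectangular lattice.

    X: (n_samples, dim) training data, used only for its value range.
    Returns grid positions, random initial weights, and the growth
    threshold that accumulated error must exceed to trigger growth.
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Four neurons on a 2x2 lattice: all start with the same
    # lattice condition (each has exactly two neighbors).
    positions = [(0, 0), (0, 1), (1, 0), (1, 1)]
    # Weights drawn uniformly from the per-variable data range.
    weights = rng.uniform(lo, hi, size=(4, d))
    growth_threshold = -d * np.log(spread_factor)
    return positions, weights, growth_threshold
```

A smaller spread factor yields a larger growth threshold and hence a more compact map; a spread factor close to 1 lets the map grow more freely.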
Results and discussion
In this section, we evaluate the topology preservation performance of DBGSOM through several experiments on one computer-generated and six real-world benchmark datasets with different characteristics (number of samples and variables). Data characteristics and literature references are collected in Table 2. To show the potential of the proposed method on large datasets, we used the breast cancer microarray dataset [37], [38], which includes the expression level of 47293
Conclusion
In this paper, we presented a new growing approach for GSOM in batch learning mode, called DBGSOM. Compared with other approaches, the growing process in DBGSOM is based on the accumulated error around the candidate boundary neuron, which is used to direct the addition of new neurons to proper grid positions. In DBGSOM, new neurons are added at the boundaries by filling one of the adjacent available positions with a proper weight vector assignment. Some rules were defined and enforced to manage
Acknowledgments
We thankfully acknowledge the insightful comments and suggestions of the anonymous reviewers. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
References (45)
- et al., Enhancement of topology preservation and hierarchical dynamic self-organising maps for data visualisation, Int. J. Approximate Reasoning (2003)
- et al., Genetic algorithms for architecture optimisation of counter-propagation artificial neural networks, Chemom. Intell. Lab. Syst. (2011)
- et al., Topology representing networks, Neural Netw. (1994)
- et al., Growing Self-Organizing Map with cross insert for mixed-type data clustering, Appl. Soft Comput. (2012)
- et al., Mining massive document collections by the WEBSOM method, Inf. Sci. (2004)
- Self-organizing map algorithm and distortion measure, Neural Netw. (2006)
- et al., Information Visualization in Data Mining and Knowledge Discovery (2017)
- et al., A global geometric framework for nonlinear dimensionality reduction, Science (2000)
- Self-Organizing Maps, Vol. 30 of Springer Series in Information Sciences (2001)
- et al., Bibliography of self-organizing map (SOM) papers: 1998–2001 addendum, Neural Comput. Surv. (2003)
- Topology-based hierarchical clustering of self-organizing maps, IEEE Trans. Neural Netw.
- Learning nonlinear principal manifolds by self-organising maps, in: Principal Manifolds for Data Visualization and Dimension Reduction
- Automated Knowledge Acquisition
- Dynamic self-organizing maps with controlled growth for knowledge discovery, IEEE Trans. Neural Netw.
- Dynamic self-organising maps: theory, methods and applications
- Dynamic self organizing maps for discovery and sharing of knowledge in multi agent systems, Web Intell. Agent Syst. Int. J.
- An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data, Bioinformatics
- Growing self-organizing maps for surface reconstruction from unstructured point clouds
- Integration of growing self-organizing map and continuous genetic algorithm for grading lithium-ion battery cells, Appl. Soft Comput.
- Cluster identification and separation in the growing self-organizing map: application in protein sequence classification, Neural Comput. Appl.
- HDGSOM: a modified growing self-organizing map for high dimensional data clustering
- The growing hierarchical self-organizing map: exploratory analysis of high-dimensional data, IEEE Trans. Neural Netw.