Elsevier

Applied Soft Computing

Volume 55, June 2017, Pages 424-435

A directed batch growing approach to enhance the topology preservation of self-organizing map

https://doi.org/10.1016/j.asoc.2017.02.015

Highlights

  • A potent batch growing approach for GSOM was proposed.

  • Neuron insertion rules were defined to manage the map growth in proper directions.

  • DBGSOM performs much better than GSOM and SOM in terms of topology preservation.

Abstract

The growing self-organizing map (GSOM) can generate feature maps and visualize high-dimensional data without its size being determined in advance. Most growing SOM algorithms use an incremental learning strategy. The conventional growing approach of GSOM fills all available positions around the candidate neuron, which can degrade the topology preservation of the map: unexpected network growth and improper neuron addition and weight initialization can misconfigure and twist the map. To overcome this problem, we introduce a batch learning strategy for growing self-organizing maps, called DBGSOM, which directs the growing process based on the accumulated error around the candidate boundary neuron. In the proposed growing approach, only one new neuron is added around each candidate boundary neuron. DBGSOM offers suitable mechanisms to find proper growing positions and to allocate initial weight vectors for the new neurons.

The potential of DBGSOM was investigated on one synthetic dataset and six real-world benchmark datasets in terms of topology preservation and mapping quality. Experimental results showed that the proposed growing strategy yields a map with better topology preservation and a lower susceptibility to twisting than GSOM. Furthermore, the proposed method has better clustering ability than GSOM and SOM. Because DBGSOM generates fewer neurons, it also needs less time than GSOM to learn the manifold of the data points.

Introduction

High-dimensional data can be difficult to visualize and interpret, particularly when the data of interest lie on a nonlinear manifold. Unsupervised learning methods can map data from a high-dimensional space to a lower-dimensional one and thereby ease interpretation and visualization [1], [2]. Among the various techniques, Kohonen’s Self-Organizing Map (SOM) [3] has received particular attention owing to its ability to map data nonlinearly from a high-dimensional space to a (usually) two-dimensional feature map while preserving the topological order of the data and capturing the embedded manifold [4], [5], [6]. SOM is commonly used as a knowledge-extraction tool that visualizes data and reveals clusters and relationships between data points. If the SOM fails to provide a correct topological representation of the data points, the interpretation of the resulting map and clusters will also be incorrect. A proper representation of the dataset on the feature map is therefore essential for data visualization and accurate interpretation [7]. The SOM algorithm performs two steps: competition among the neurons to find the winner, and adaptation of the weight vectors of the winner neuron and its topological neighbors. A predefined structure and learning parameters must be assigned before training starts, which is one of the major limitations of conventional SOM networks. In practice, one must resort to the time-consuming procedure of generating numerous feature maps with pre-determined sizes and evaluating the resulting maps against subjective criteria [8], [9]. This issue makes SOM unsuitable for online learning and for handling non-stationary datasets. To work around this limitation, a dynamic variant of the SOM, the growing SOM (GSOM), was proposed [10], [11] and has been applied in many different fields [7], [12], [13], [14], [15], [16].
Instead of being confined to a predetermined number of neurons, GSOM offers a flexible structure and requires fewer epochs than the original SOM, enabling it to learn nonlinear manifolds in high-dimensional feature spaces [17], [18], [19]. The training phase of GSOM starts from a small number of initial neurons and grows a larger map by adding new neurons to the network.
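The two SOM steps mentioned above, competition and neighborhood adaptation, can be sketched in a few lines. This is a minimal illustrative implementation, not the paper's code; the learning rate, neighborhood width, and sequential (online) update scheme are assumptions for the sake of the example.

```python
import numpy as np

def som_epoch(weights, grid, data, lr=0.5, sigma=1.0):
    """One epoch of the two SOM steps: competition (find the
    best-matching unit, BMU) and adaptation (pull the winner and its
    grid neighbors toward the input).
    weights: (n_neurons, dim) weight vectors in feature space.
    grid:    (n_neurons, 2) fixed lattice coordinates of the neurons."""
    for x in data:
        # Competition: winner = neuron whose weight vector is closest to x.
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Adaptation: Gaussian neighborhood measured on the *grid*,
        # not in feature space -- this is what preserves topology.
        grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights
```

Each update is a convex combination of the old weight and the input, so the weights stay inside the convex hull spanned by their initial values and the data.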

Despite the dynamic nature of the GSOM algorithm, careful inspection of the GSOM grid reveals some negative effects, including wrapping and twisting of the map. In a topologically ordered SOM, the distance between two neurons on the grid (link distance) reflects the similarity of the input vectors mapped onto those neurons. Inappropriate positioning of neighboring neurons (their weight vectors) in the feature space can cause similar input vectors to be mapped onto distant neurons, and vice versa. As shown in Fig. 1b, in a topologically ordered map the dissimilar input data points 1 and 3 have the winner neurons A and C respectively, which are distant on the grid; in a twisted map, however, these data points are mapped onto the neighboring neurons A and D (Fig. 1c). These effects can be a consequence of improper weight initialization and neuron addition during the growing phase, and they distort the map in ways that may not be corrected at later stages [7]. Similar input vectors are then mapped onto distant grid neurons, which can result in misconfiguration of the map [20], [21]. Unexpected growth of the network is another problem; it produces dead neurons and imposes additional learning time. Several incremental growing strategies have been proposed [10], [17], [22], [23] whose methodologies include assigning the position and initial weight vector of new neurons based on interpolation of the weight vectors of adjacent nodes.
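The twisting described above can be quantified with the topographic error, a common topology-preservation measure: the fraction of samples whose best and second-best matching units are not adjacent on the grid. The sketch below assumes a rectangular lattice with 4-connectivity; it is an illustration of the metric, not the paper's exact evaluation code.

```python
import numpy as np

def topographic_error(weights, grid, data):
    """Fraction of samples whose two best-matching units are not
    adjacent on the grid. A high value signals twisting: inputs that
    are close in feature space land on neurons far apart on the map.
    weights: (n_neurons, dim); grid: (n_neurons, 2) lattice coords."""
    errors = 0
    for x in data:
        d = np.linalg.norm(weights - x, axis=1)
        bmu1, bmu2 = np.argsort(d)[:2]
        # Neurons are adjacent iff their lattice coordinates differ
        # by exactly one step (Manhattan distance 1).
        if np.abs(grid[bmu1] - grid[bmu2]).sum() > 1:
            errors += 1
    return errors / len(data)
```

On a well-ordered map the metric is near 0; swapping the weight vectors of two non-adjacent neurons (a toy "twist") immediately raises it.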

Most previously presented growing strategies use the accumulated error only to find candidate boundary neurons. A boundary neuron is any neuron that has at least one adjacent free position on the grid. In the next step, all free positions around the candidate neuron are filled, and the weight vectors of the new neurons are initialized without considering the accumulated error of the surrounding neurons. This paper proposes a modified batch growing approach for SOM, the directed batch growing self-organizing map (DBGSOM), which uses the accumulated error of the neurons on the grid to direct the growing phase in terms of both the position and the weight initialization of new neurons. The contribution of this paper lies in its predefined growing rules, which control the evolution of the network structure and enhance the topology preservation of the map. The approach starts with a small number of neurons in a rectangular grid; as training cycles pass, new neurons are added whenever needed. Rules are defined and enforced to manage the lattice growth in proper directions, enhancing topology preservation and avoiding twisting and misconfiguration of the map. New neurons are added at the boundaries by filling one of the adjacent free positions and assigning a proper weight vector, which improves the topographic quality of the map and helps it learn the manifold of the data in the high-dimensional feature space.
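The flavor of such a directed growing step can be sketched as follows. The exact DBGSOM insertion rules are defined in Section 3 of the paper; the direction heuristic (grow away from the highest-error occupied neighbor) and the extrapolation formula for the new weight vector used here are simplified assumptions for illustration only.

```python
import numpy as np

def grow_one_neuron(neurons, errors, candidate):
    """Add a single neuron next to `candidate` (illustrative sketch,
    not the authors' exact rules). The free adjacent grid position is
    chosen so that growth points away from the occupied neighbor with
    the highest accumulated error.
    neurons: dict {(i, j): weight vector}; errors: dict {(i, j): float}."""
    ci, cj = candidate
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    free = [(ci + di, cj + dj) for di, dj in steps
            if (ci + di, cj + dj) not in neurons]
    if not free:
        return None  # candidate is not a boundary neuron
    # Score each free position by the accumulated error of the occupied
    # neuron on the opposite side of the candidate (assumed heuristic).
    def score(pos):
        opposite = (2 * ci - pos[0], 2 * cj - pos[1])
        return errors.get(opposite, 0.0)
    new_pos = max(free, key=score)
    opposite = (2 * ci - new_pos[0], 2 * cj - new_pos[1])
    if opposite in neurons:
        # Extrapolate past the candidate: w_new = 2*w_cand - w_opp.
        w_new = 2 * neurons[candidate] - neurons[opposite]
    else:
        w_new = neurons[candidate].copy()
    neurons[new_pos] = w_new
    return new_pos
```

Note that, in line with the paper's key idea, exactly one neuron is added per candidate, rather than filling every free position around it.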

The remaining parts of the paper are organized as follows. Section 2 reviews related work on GSOMs. Details of the GSOM algorithm and the proposed directed batch growing strategy (DBGSOM) are presented in Section 3. Section 4 presents experimental results for DBGSOM compared with GSOM and the conventional SOM, together with a general discussion of the results and a comparison of map quality in terms of topology preservation. Conclusions and possible future directions are drawn in Section 5.

Section snippets

Related works

Various methods have been proposed to resolve deficiencies of the classical SOM, such as its fixed size and topology. Defining a learning scheme together with a dynamic structure is therefore a way to overcome the limitations of maps with a static structure. Several types of dynamic SOM that grow during the learning process have been proposed in the past. Alternative GSOM models differ mainly in three aspects: deciding when to add new neurons, where to add them on the

Growing Self-Organizing Map

The growing self-organizing map, an unsupervised neural network [10], [29], can dynamically grow and adapt its size and shape to represent the input data structure with a controllable spread. The GSOM network usually starts with four neurons in a rectangular lattice (Fig. 1), which ensures that all neurons have similar lattice conditions at the initialization phase. The learning algorithm of a typical GSOM consists of the following phases:

At the initialization phase, the weight vectors of the
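For context, the standard GSOM growth criterion (Alahakoon et al. [10]) compares a neuron's accumulated quantization error against a growth threshold GT = -D * ln(SF), where D is the data dimensionality and SF the user-chosen spread factor. The sketch below illustrates that criterion only; DBGSOM changes where and how new neurons are placed, not this threshold test.

```python
import numpy as np

def growth_threshold(dim, spread_factor):
    """GT = -dim * ln(SF). A lower spread factor yields a higher
    threshold and hence a smaller, more compact map."""
    return -dim * np.log(spread_factor)

def should_grow(accumulated_error, dim, spread_factor):
    """A boundary neuron becomes a growth candidate once its
    accumulated quantization error exceeds the threshold."""
    return accumulated_error > growth_threshold(dim, spread_factor)
```

The spread factor thus gives the user a single dimension-independent knob for controlling map size.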

Results and discussion

In this section, to evaluate the topology preservation performance of DBGSOM, we present several experiments conducted with one computer-generated dataset and six real-world benchmark datasets with different characteristics (numbers of samples and variables). Data characteristics and literature references are collected in Table 2. To show the potential of the proposed method on large datasets, we used the breast cancer microarray dataset [37], [38], which includes the expression level of 47293

Conclusion

In this paper, we presented a new growing approach for GSOM in batch learning mode, called DBGSOM. In contrast to other approaches, the growing process in DBGSOM is based on the accumulated error around the candidate boundary neuron, which is used to direct the addition of new neurons to proper grid positions. In DBGSOM, new neurons are added at the boundaries by filling one of the adjacent available positions with a proper weight vector assignment. Rules were defined and enforced to manage

Acknowledgments

We thankfully acknowledge the insightful comments and suggestions of the anonymous reviewers. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References (45)

  • K. Tasdemir et al.

    Topology-based hierarchical clustering of self-organizing maps

    IEEE Trans. Neural Netw.

    (2011)
  • H. Yin

    Learning nonlinear principal manifolds by self-organising maps

    Principal Manifolds for Data Visualization and Dimension Reduction

    (2008)
  • S. Sestito et al.

    Automated Knowledge Acquisition

    (1994)
  • D. Alahakoon et al.

    Dynamic self-organizing maps with controlled growth for knowledge discovery

    IEEE Trans. Neural Netw.

    (2000)
  • A.L. Hsu et al.

    Dynamic self-organising maps: theory, methods and applications

    (2017)
  • L.K. Wickramasinghe et al.

    Dynamic self organizing maps for discovery and sharing of knowledge in multi agent systems

    Web Intell. Agent Syst. Int. J.

    (2005)
  • A.L. Hsu et al.

    An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data

    Bioinformatics

    (2003)
  • R.L. do Rêgo et al.

    Growing self-organizing maps for surface reconstruction from unstructured point clouds

  • R. Kuo et al.

    Integration of growing self-organizing map and continuous genetic algorithm for grading lithium-ion battery cells

    Appl. Soft Comput.

    (2012)
  • N. Ahmad et al.

    Cluster identification and separation in the growing self-organizing map: application in protein sequence classification

    Neural Comput. Appl.

    (2010)
  • R. Amarasiri et al.

    HDGSOM: a modified growing self-organizing map for high dimensional data clustering

  • A. Rauber et al.

    The growing hierarchical self-organizing map: exploratory analysis of high-dimensional data

    IEEE Trans. Neural Netw.

    (2002)