Spherical self-organizing map using efficient indexed geodesic data structure

doi:10.1016/j.neunet.2006.05.021

Neural Networks

Volume 19, Issues 6–7, July–August 2006, Pages 900-910

https://doi.org/10.1016/j.neunet.2006.05.021

Abstract

The two-dimensional (2D) Self-Organizing Map (SOM) has a well-known “border effect”. Several spherical SOMs which use lattices of the tessellated icosahedron have been proposed to solve this problem. However, existing data structures for such SOMs are either not space efficient or are time consuming when searching the neighborhood. We introduce a 2D rectangular grid data structure to store the icosahedron-based geodesic dome. Vertex relationships are maintained by the vertices’ positions in the data structure rather than by immediate neighbor pointers or an adjacency list. Increasing the number of neurons can be done efficiently because the overhead caused by pointer updates is reduced. Experiments show that the spherical SOM using our data structure, called a GeoSOM, runs at a speed comparable to that of the conventional 2D SOM. The GeoSOM also reduces data distortion because the boundaries are removed. Furthermore, we developed an interface to project the GeoSOM onto the 2D plane using a cartographic approach, which gives users a global view of the spherical data map. Users can change the center of the 2D data map interactively. Finally, we compare the GeoSOM to the other spherical SOMs in terms of space complexity and time complexity.

Introduction

The Self-Organizing Map introduced by Kohonen (2001) is a popular tool for high-dimensional data analysis and visualization. Its applications can be found in various research areas such as social science (Lin, White, & Buzydlowski, 2003), web-document mining (Lagus, Kaski, & Kohonen, 2004) and robotics (Ritter, Martinetz, & Schulten, 1992). In a conventional SOM, the neighborhood relationship is defined by a two-dimensional rectangular or hexagonal lattice. During the training phase, all neurons compete with each other for the input signals. The winner and its neighbors update their connection weights. Ideally, at the end of the training, all samples are equally represented by neurons and the neighboring units in the grid tend to model similar regions of the data space. However, the grid units at the boundary of the SOM have fewer neighbors than the units inside the map, so they have fewer chances of being updated. This often leads to the “border effect” — weight vectors of these units “collapse to the center of the input space” (Sarle, 2002). Several approaches have been suggested to solve the problem, such as the heuristic weighting rule method by Kohonen (2001) and local-linear smoothing by Wand and Jones (1995). Aside from these mathematical solutions, a straightforward method is to remove the boundaries from the grid by implementing the SOM on a torus (Ito et al., 2000, Li et al., 1993) or a sphere (Boudjemaï et al., 2003, Hirokazu et al., 2005, Nakatsuka and Oyabu, 2003, Sangole and Knopf, 2003).
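The competition and update steps described above, and the reason border units are updated less often, can be sketched as follows. This is a minimal illustration, not the authors' implementation or SOM_PAK's: the function names `train_step` and `neighbor_count`, the square neighborhood, and the constant learning rate `lr` are all assumptions made for brevity (practical SOMs usually shrink the neighborhood and the learning rate over time).

```python
import numpy as np

def train_step(weights, x, lr=0.5, radius=1):
    """One competitive-learning step on a rectangular SOM grid (illustrative).

    weights: float array of shape (rows, cols, dim); x: input vector of shape (dim,).
    Updates `weights` in place and returns the grid position of the winner.
    """
    rows, cols, _ = weights.shape
    # Competition: the winner is the neuron whose weight vector is closest to x.
    dists = np.linalg.norm(weights - x, axis=2)
    wr, wc = np.unravel_index(np.argmin(dists), dists.shape)
    # Update: the winner and its grid neighbors move toward x.
    # The neighborhood is clipped at the grid boundary (assumed square window).
    for r in range(max(0, wr - radius), min(rows, wr + radius + 1)):
        for c in range(max(0, wc - radius), min(cols, wc + radius + 1)):
            weights[r, c] += lr * (x - weights[r, c])
    return int(wr), int(wc)

def neighbor_count(rows, cols, r, c, radius=1):
    """Number of grid units within `radius` of (r, c), excluding (r, c) itself."""
    r_span = min(rows, r + radius + 1) - max(0, r - radius)
    c_span = min(cols, c + radius + 1) - max(0, c - radius)
    return r_span * c_span - 1
```

On a 5 x 5 grid with this square window, an interior unit has 8 neighbors, an edge unit 5, and a corner unit 3, so boundary units take part in fewer neighborhoods and receive fewer updates.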

Compared to a toroidal SOM, the spherical SOM is more visually effective. Firstly, the area associated with each neuron varies significantly on the surface of a torus: larger around the outer circle and compressed near the inner circle. For a SOM, it is desirable to have all neurons receive equal geometrical treatment. Secondly, the toroidal SOM fails to provide an intuitively readable map. People are usually more familiar with maps generated from a sphere, such as the world maps, than they are with maps based on a torus.
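The uneven area on a torus can be quantified with a short calculation. The sketch below assumes neurons are placed on a regular grid in the two torus angles, with major radius `R` and minor radius `r` (both names are assumptions for illustration). The local area scale of a torus at tube angle `theta` is `r * (R + r*cos(theta))`, so a grid cell on the outer circle is larger than one on the inner circle by the factor `(R + r) / (R - r)`.

```python
import math

def torus_area_element(R, r, theta):
    """Local area scale r * (R + r*cos(theta)) of a torus at tube angle theta.

    theta = 0 is the outer circle, theta = pi is the inner circle.
    """
    return r * (R + r * math.cos(theta))

def outer_to_inner_ratio(R, r):
    """Ratio of cell area on the outer circle to cell area on the inner circle."""
    return torus_area_element(R, r, 0.0) / torus_area_element(R, r, math.pi)
```

For example, with `R = 2` and `r = 1` the ratio is 3, whereas on a sphere tessellated as a geodesic dome the cells are close to uniform.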

Several spherical SOMs have been implemented and applied to different types of data sets. Using a tessellated platonic polyhedron as a lattice was first suggested by Ritter (1999). He also pointed out that the spherical SOM would be very suitable for data with underlying directional structures. Sangole and Knopf (2003) employed a spherical SOM in three-dimensional (3D) immersive virtual reality environments for interactive data analysis. Boudjemaï et al. (2003) applied the spherical SOM in 3D object modeling. While these applications have been successful, existing data structures for geodesic domes are either not space efficient or are time consuming when finding the neighbors. In this paper, we present a 2D data structure to store the spherical lattice. It reduces the overhead in maintaining the spherical grid structure. Finding the immediate neighbors of a vertex (neuron) is efficient because it reduces to indexing into a 2D array. It also supports fast dome tessellation (increasing the number of neurons). We call a spherical SOM using this data structure a GeoSOM.
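The difference between position-based neighbor lookup and an explicit adjacency list can be illustrated on a flat hexagonal patch. This is a generic sketch, not the GeoSOM layout: the excerpt does not give the actual mapping of the dome onto the grid, so the axial-coordinate offsets, the function names, and the absence of any wrap-around at the seams are all assumptions.

```python
# Six neighbor offsets of a hexagonal lattice stored in a 2D array
# using axial coordinates (an assumed, generic layout).
AXIAL_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def neighbors_by_index(rows, cols, r, c):
    """Neighbors computed from the position alone; nothing is stored per vertex."""
    return [
        (r + dr, c + dc)
        for dr, dc in AXIAL_OFFSETS
        if 0 <= r + dr < rows and 0 <= c + dc < cols
    ]

def build_adjacency_list(rows, cols):
    """The alternative: store every neighbor list explicitly, up to six entries per vertex."""
    return {
        (r, c): neighbors_by_index(rows, cols, r, c)
        for r in range(rows)
        for c in range(cols)
    }
```

With index arithmetic the lookup costs a handful of additions and needs no per-vertex storage, while the adjacency list must be allocated and rebuilt whenever the lattice is refined.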

The remainder of this paper is organized as follows. Section 2 reviews existing techniques for spherical SOMs using lattices of geodesic domes. Details of our data structure are presented in Section 3. Section 4 describes the interface that we developed to visualize the GeoSOM. In Section 5, we test our data structure on two data sets. Results are compared with the corresponding 2D SOMs. In Section 6, GeoSOM is compared analytically with the existing spherical SOMs. Conclusions and future work are presented in Section 7.

Section snippets

A brief discussion of geodesic domes

For a 2D SOM, a hexagonal lattice is preferable to a rectangular one as it is more uniform: every grid unit has the same number of immediate neighbors and the distances between a unit and its immediate neighbors are the same. However, such uniformity cannot be achieved on the sphere except for the five platonic polyhedra — tetrahedron, cube, octahedron, icosahedron and dodecahedron (Pugh, 1976). These polyhedra can be further tessellated into different frequencies — the geodesic domes. The
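The size of an icosahedron-based geodesic dome follows directly from its frequency. The sketch below assumes the common convention that frequency `f` means each icosahedron edge is divided into `f` segments and each face into `f * f` small triangles; the excerpt does not state the convention the paper uses, and the function name is hypothetical.

```python
def geodesic_dome_counts(f):
    """Vertex, edge and face counts of an icosahedron tessellated at frequency f.

    Assumes each of the 20 faces is split into f*f triangles.
    """
    if f < 1:
        raise ValueError("frequency must be a positive integer")
    faces = 20 * f * f
    edges = 30 * f * f
    vertices = 10 * f * f + 2
    return vertices, edges, faces
```

Under this convention, frequency 1 gives the icosahedron itself (12 vertices) and frequency 2 gives 42 vertices. The 12 original vertices always keep five immediate neighbors while every other vertex has six, which is why the lattice is only nearly uniform.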

Opening the icosahedron-based geodesic dome

An icosahedron-based geodesic dome can be opened onto the 2D plane. Fig. 3 illustrates the process. Note that, for clarity, we did not project the vertices onto the surface of the sphere. Thicker lines are the original edges of the icosahedron. The 12 vertices of an icosahedron can be grouped into six pairs. Vertices in each pair are opposite to each other on the sphere and can be considered as two poles, such as vertices A and C. The edges marked with colors indicate where the dome is cut
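The grouping of the 12 icosahedron vertices into six opposite pairs can be checked numerically. The sketch below uses the standard golden-ratio coordinates of a regular icosahedron; the function names are hypothetical, and the vertex order does not correspond to the labels A and C used in the figure.

```python
import itertools
import math

def icosahedron_vertices():
    """The 12 vertices of a regular icosahedron: cyclic permutations of (0, +-1, +-phi)."""
    phi = (1 + math.sqrt(5)) / 2
    verts = []
    for a, b in itertools.product((-1.0, 1.0), repeat=2):
        verts.append((0.0, a, b * phi))
        verts.append((a, b * phi, 0.0))
        verts.append((b * phi, 0.0, a))
    return verts

def antipodal_pairs(verts):
    """Index pairs (i, j) of vertices that are opposite each other through the center."""
    pairs = []
    for i, v in enumerate(verts):
        for j in range(i + 1, len(verts)):
            w = verts[j]
            if all(math.isclose(p, -q, abs_tol=1e-12) for p, q in zip(v, w)):
                pairs.append((i, j))
    return pairs
```

Calling `antipodal_pairs(icosahedron_vertices())` yields six pairs, and every vertex appears in exactly one of them, so any pair can serve as the two poles.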

Projecting the GeoSOM onto 2D plane

Usually, a spherical SOM is visualized as a 3D spherical object (see Fig. 6). The surface of the sphere is deformed to reflect the variation in weight vectors. Since our flat screens cannot show the front and back parts of a spherical object at the same time, interfaces such as rotation by mouse are provided so that the users can view every part of the SOM. However, it is still difficult for the user to maintain an entire picture of the map. We implemented an interface to project the spherical
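Changing the center of a flat map of the sphere amounts to rotating the sphere before projecting it. The sketch below is one possible way to do this and is not taken from the paper: the excerpt does not say which cartographic projection the interface uses, so a plain equirectangular (longitude, latitude) map is assumed purely for illustration, and the function name and parameters are hypothetical. Angles are in radians.

```python
import math

def project_recentered(lon, lat, center_lon, center_lat):
    """Map a point on the unit sphere to 2D so that the chosen center lands at (0, 0).

    Rotates the sphere to bring (center_lon, center_lat) to longitude 0, latitude 0,
    then returns the rotated (longitude, latitude) as equirectangular map coordinates.
    """
    # Point on the unit sphere.
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    # Rotate about the z-axis so the center has longitude 0.
    x1 = x * math.cos(center_lon) + y * math.sin(center_lon)
    y1 = -x * math.sin(center_lon) + y * math.cos(center_lon)
    # Rotate about the y-axis so the center has latitude 0.
    x2 = x1 * math.cos(center_lat) + z * math.sin(center_lat)
    z2 = -x1 * math.sin(center_lat) + z * math.cos(center_lat)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.atan2(y1, x2), math.asin(max(-1.0, min(1.0, z2)))
```

Applying the function to every neuron position with a new center redraws the whole map around that neuron, which is the kind of interaction the text describes.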

Experiments

In this section, we conduct two sets of experiments on GeoSOMs using a synthetic data set and a breast cancer data set. The experiments are carried out on a Pentium IV 3.0 GHz workstation with 1 GB of memory. Results are compared with the corresponding 2D hexagonal SOMs generated by SOM_PAK (Kohonen, Hynninen, Kangas, & Laaksonen, 1996). Parameters in training the two types of SOMs are set to be the same:

• Sizes of the GeoSOM and the 2D SOM are chosen to be as close as possible.
• Before the

Comparing GeoSOM to other spherical SOMs

From the above experiments, we can see that the GeoSOM reduces data distortion by up to two thirds. This reduction is achieved because we use geodesic domes to remove the SOM’s border effect. As mentioned earlier, several researchers have also implemented spherical SOMs using icosahedron-based geodesic domes (Boudjemaï et al., 2003, Nakatsuka and Oyabu, 2003, Sangole and Knopf, 2003). It is expected that these SOMs will have the same data distortion as the GeoSOM.

The difference between the

Conclusion and future work

It has been demonstrated that a spherical SOM can effectively remove the border effect of the 2D SOM and reveal more information about the high-dimensional data. We introduced a method to organize the neurons of such an SOM using a 2D rectangular grid structure, which reduces the overhead required to maintain the neuron’s relationships. Experimental results show that the speed of a GeoSOM is comparable to the speed of a 2D SOM generated by SOM_PAK. Because of the grid topology, GeoSOM reduces

References (22)

T. Kohonen
Self-organizing maps: Optimization approaches
K. Lagus et al.
Mining massive document collections by the WEBSOM method
Information Sciences
(2004)
X. Lin et al.
Real-time author co-citation mapping for online searching
Information Processing and Management: an International Journal
(2003)
A. Sangole et al.
Visualization of randomly ordered numeric data sets using spherical self-organizing feature maps
Computers & Graphics
(2003)
Boudjemaï, F., Enberg, P. B., & Postaire, J. -G. (2003). Self organizing spherical map architecture for 3D object...
F. Canters et al.
The world in perspective: A directory of world map projections
(1989)
S.K. Card et al.
Readings in information visualization: Using vision to think
(1999)
R.B. Fuller
Explorations in the geometry of thinking
(1975)
Hirokazu, N., Altaf-Ul, A. M., Ken, K., Kotaro, M., & Shigehiko, K. (2005). Spherical SOM with arbitrary number of...
Hoffmann, G. (2002). Sphere tessellation by icosahedron...
M. Ito et al.
The characteristics of the torus self organizing map


2006 Special Issue