
A Scalable Parallel Algorithm for Self-Organizing Maps with Applications to Sparse Data Mining Problems

Data Mining and Knowledge Discovery

Abstract

We describe a scalable parallel implementation of the self-organizing map (SOM) suitable for data-mining applications involving clustering or segmentation of large data sets, such as those encountered in the analysis of customer spending patterns. The parallel algorithm is based on the batch SOM formulation, in which the neural weights are updated at the end of each pass over the training data. The underlying serial algorithm is enhanced to take advantage of the sparseness often encountered in these data sets. Analysis of a realistic test problem shows that the batch SOM algorithm captures key features observed using the conventional on-line algorithm, with comparable convergence rates.
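The batch update described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `batch_som_epoch`, the Gaussian neighborhood, the dictionary representation of sparse records, and the use of the expansion ||x - w||^2 = ||x||^2 - 2 x·w + ||w||^2 to exploit sparseness are all assumptions made for the example.

```python
import math

def batch_som_epoch(weights, grid, records, sigma):
    """One batch SOM pass (illustrative sketch).

    weights: list of dense weight vectors, one per map node
    grid:    list of (row, col) map coordinates, one per node
    records: sparse records as dicts {feature_index: value}
    sigma:   width of the assumed Gaussian neighborhood
    """
    n_nodes, dim = len(weights), len(weights[0])
    # ||w_k||^2 is computed once per pass, so each record-to-node distance
    # only touches the nonzero entries of the record.
    w_norm2 = [sum(v * v for v in w) for w in weights]
    numer = [[0.0] * dim for _ in range(n_nodes)]
    denom = [0.0] * n_nodes

    def score(k, x):
        # ||x||^2 is identical for every node, so it is dropped from the argmin.
        return w_norm2[k] - 2.0 * sum(weights[k][j] * v for j, v in x.items())

    for x in records:
        best = min(range(n_nodes), key=lambda k: score(k, x))
        for k in range(n_nodes):
            d2 = (grid[k][0] - grid[best][0]) ** 2 + (grid[k][1] - grid[best][1]) ** 2
            h = math.exp(-d2 / (2.0 * sigma * sigma))
            denom[k] += h
            for j, v in x.items():  # only nonzero features contribute
                numer[k][j] += h * v

    # The weights change only here, after the full pass over the data.
    return [
        [numer[k][j] / denom[k] for j in range(dim)] if denom[k] > 0 else list(weights[k])
        for k in range(n_nodes)
    ]
```

Because the winning node for every record is found using the weights from the start of the pass, the result does not depend on the order in which records are visited, unlike the on-line algorithm.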

Performance measurements on an SP2 parallel computer are given for two retail data sets and a publicly available set of census data. These results demonstrate essentially linear speedup for the parallel batch SOM algorithm, using both a memory-contained sparse formulation and a separate implementation in which the mining data is accessed directly from a parallel file system. We also present visualizations of the census data to illustrate the value of the clustering information obtained via the parallel SOM method.
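One way to see why a batch formulation can parallelize well is that the per-node sums are additive over records. The sketch below assumes a data-partitioned scheme in which each worker accumulates partial sums over its own slice of the records and the partial sums are then added together. The abstract does not spell out the decomposition, so this scheme and the names `accumulate`, `reduce_sum`, `update`, and `parallel_epoch` are hypothetical. The workers are simulated sequentially here, and on a message-passing machine the `reduce_sum` step would correspond to a collective such as `MPI_Allreduce`.

```python
import math

def accumulate(weights, grid, records, sigma):
    """Partial batch-SOM sums over one worker's slice of (dense) records."""
    n, dim = len(weights), len(weights[0])
    numer = [[0.0] * dim for _ in range(n)]
    denom = [0.0] * n
    for x in records:
        best = min(range(n),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(x, weights[k])))
        for k in range(n):
            d2 = sum((p - q) ** 2 for p, q in zip(grid[k], grid[best]))
            h = math.exp(-d2 / (2.0 * sigma ** 2))  # assumed Gaussian neighborhood
            denom[k] += h
            for j in range(dim):
                numer[k][j] += h * x[j]
    return numer, denom

def reduce_sum(partials):
    """Element-wise sum of per-worker accumulators (stand-in for an all-reduce)."""
    numer = [[sum(col) for col in zip(*rows)]
             for rows in zip(*(p[0] for p in partials))]
    denom = [sum(col) for col in zip(*(p[1] for p in partials))]
    return numer, denom

def update(weights, numer, denom):
    """Apply the batch update; nodes that received no weight keep their old values."""
    return [[v / d for v in row] if d > 0 else list(w)
            for w, row, d in zip(weights, numer, denom)]

def parallel_epoch(weights, grid, records, sigma, n_workers):
    # Each slice would live on a different processor; here they run in turn.
    slices = [records[r::n_workers] for r in range(n_workers)]
    partials = [accumulate(weights, grid, s, sigma) for s in slices]
    return update(weights, *reduce_sum(partials))
```

In this scheme the data exchanged per pass has a fixed size (one accumulator per map node and feature), independent of the number of records. Up to floating-point rounding, `parallel_epoch` returns the same weights for any value of `n_workers`.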




Cite this article

Lawrence, R., Almasi, G. & Rushmeier, H. A Scalable Parallel Algorithm for Self-Organizing Maps with Applications to Sparse Data Mining Problems. Data Mining and Knowledge Discovery 3, 171–195 (1999). https://doi.org/10.1023/A:1009817804059


  • DOI: https://doi.org/10.1023/A:1009817804059
