Adapting Self-Organizing Map Algorithm to Sparse Data

  • Conference paper
Computational Intelligence (IJCCI 2017)

Part of the book series: Studies in Computational Intelligence (SCI, volume 829)

Abstract

Machine learning techniques applied to data mining face demanding time and memory requirements, and should therefore take full advantage of the increased computing power of recent multi-core processors. For sparse data, it is sometimes also necessary to reformulate the algorithms appropriately, keeping in mind that memory load was, and still is, an issue. In [1], we presented a mathematical reformulation of the standard and batch versions of the Self-Organizing Map algorithm for sparse data, proposed a parallel implementation of the batch version, and carried out initial performance evaluation tests. Here we reproduce and extend our experiments on a more powerful hardware architecture and compare the results with our previous ones. A thorough quantitative and qualitative analysis confirms our earlier results.

Notes

  1. Using the squared distance here is equivalent to using the Euclidean distance, and avoids computing the square root.

  2. The complexity of Eq. (2) does not depend on the vector size: it is only \(\mathcal{O}(TM)\).

  3. We used version 1.7.4. We have since improved Somoclu, and later versions use a more efficient sparse computation (see: https://github.com/peterwittek/somoclu/commit/d5ffcf250db77aa103a9de96968ef0e27dc14d15).
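
Note 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the dictionary representation of a sparse vector, and the use of the standard expansion \(\Vert x - w\Vert^2 = \Vert x\Vert^2 - 2\,x \cdot w + \Vert w\Vert^2\) are assumptions made here for illustration. Because the square root is monotonic, the unit that minimizes the squared distance also minimizes the Euclidean distance.

```python
import math


def squared_distance_sparse(x, w, w_sq_norm):
    """Squared Euclidean distance between a sparse x and a dense w.

    x is a dict {index: value} holding only non-zero entries (hypothetical
    representation); w is a list of floats; w_sq_norm is ||w||^2, passed in
    so it can be computed once per unit. Only the non-zero entries of x
    are visited.
    """
    x_sq_norm = sum(v * v for v in x.values())
    dot = sum(v * w[i] for i, v in x.items())
    return x_sq_norm - 2.0 * dot + w_sq_norm


def find_bmu(x, codebook):
    """Index of the best matching unit, using squared distances only."""
    w_sq_norms = [sum(c * c for c in w) for w in codebook]
    dists = [squared_distance_sparse(x, w, n)
             for w, n in zip(codebook, w_sq_norms)]
    return min(range(len(codebook)), key=dists.__getitem__)


def find_bmu_euclidean(x_dense, codebook):
    """Dense reference version with the square root, for comparison."""
    dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(x_dense, w)))
             for w in codebook]
    return min(range(len(codebook)), key=dists.__getitem__)
```

Calling `find_bmu` on a sparse sample and `find_bmu_euclidean` on the same sample stored densely should return the same unit index, since both rank the units in the same order.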

References

  1. Melka, J., Mariage, J.: Efficient implementation of self-organizing map for sparse input data. In: Proceedings of the 9th International Joint Conference on Computational Intelligence, IJCCI 2017, pp. 54–63, Funchal, Madeira, Portugal (2017)

  2. Ultsch, A.: Data mining and knowledge discovery with emergent self-organizing feature maps for multivariate time series. Kohonen Maps 46, 33–46 (1999)

  3. Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern. 43, 59–69 (1982)

  4. Honkela, T., Kaski, S., Lagus, K., Kohonen, T.: Newsgroup exploration with WEBSOM method and browsing interface. Technical Report A32, Helsinki University of Technology (1996)

  5. Kaski, S., Honkela, T., Lagus, K., Kohonen, T.: WEBSOM-self-organizing maps of document collections. Neurocomputing 21, 101–117 (1998)

  6. Ultsch, A., Mörchen, F.: ESOM-Maps: tools for clustering, visualization, and classification with Emergent SOM. Technical Report 46, Department of Mathematics and Computer Science, University of Marburg, Germany (2005)

  7. Polzlbauer, G., Dittenbach, M., Rauber, A.: A visualization technique for self-organizing maps with vector fields to obtain the cluster structure at desired levels of detail. In: Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, 2005, IJCNN’05, vol. 3, pp. 1558–1563. IEEE (2005)

  8. Vesanto, J., Ahola, J.: Hunting for correlations in data using the self-organizing map. In: Proceeding of the International ICSC Congress on Computational Intelligence Methods and Applications (CIMA’99), pp. 279–285. ICSC Academic Press (1999)

  9. Carpenter, G.A., Grossberg, S.: ART 2: self-organization of stable category recognition codes for analog input patterns. Appl. Opt. 26, 4919–4930 (1987)

  10. He, J., Tan, A.-H., Tan, C.-L.: Modified ART 2A growing network capable of generating a fixed number of nodes. IEEE Trans. Neural Netw. 15, 728–737 (2004)

  11. Carpenter, G.A., Grossberg, S., Rosen, D.B.: ART 2-A: an adaptive resonance algorithm for rapid category learning and recognition. Neural Netw. 4, 493–504 (1991)

  12. Wittek, P., Gao, S.C., Lim, I.S., Zhao, L.: Somoclu: an efficient parallel library for self-organizing maps. J. Stat. Softw. 78, 1–21 (2017)

  13. Liao, G., Chen, P., Du, L., Su, L., Liu, Z., Tang, Z., Shi, T.: Using SOM neural network for X-ray inspection of missing-bump defects in three-dimensional integration. Microelectron. Reliab. 55, 2826–2832 (2015)

  14. Kohonen, T.: Self-Organizing Maps. 2nd edn. Springer Series in Information Sciences, vol. 30. Springer, Berlin (1997)

  15. Kohonen, T.: Things you haven’t heard about the self-organizing map. In: 1993 IEEE International Conference on Neural Networks, pp. 1147–1156 (1993)

  16. Mulier, F., Cherkassky, V.: Self-organization as an iterative Kernel smoothing process. Neural Comput. 7, 1165–1177 (1995)

  17. Cheng, Y.: Convergence and ordering of Kohonen’s batch map. Neural Comput. 9, 1667–1676 (1997)

  18. Ienne, P., Thiran, P., Vassilas, N.: Modified self-organizing feature map algorithms for efficient digital hardware implementation. IEEE Trans. Neural Netw. 8, 315–330 (1997)

  19. Lawrence, R.D., Almasi, G.S., Rushmeier, H.E.: A scalable parallel algorithm for self-organizing maps with applications to sparse data mining problems. Data Min. Knowl. Discov. 3, 171–195 (1999)

  20. Maiorana, F.: Performance improvements of a Kohonen self organizing classification algorithm on sparse data sets. In: Proceedings of the 10th WSEAS International Conference on Mathematical Methods, Computational Techniques and Intelligent Systems, MAMECTIS’08, pp. 347–352. World Scientific and Engineering Academy and Society (WSEAS) (2008)

  21. Natarajan, R.: Exploratory data analysis in large, sparse datasets. Technical Report, IBM Thomas J. Watson Research Division (1997)

  22. Roussinov, D.G., Chen, H.: A scalable self-organizing map algorithm for textual classification: a neural network approach to thesaurus generation. Commun. Cogn. Artif. Intell. J. (1998)

  23. Kohonen, T.: Essentials of the self-organizing map. Neural Netw. 37, 52–65 (2013)

  24. Olteanu, M., Villa-Vialaneix, N.: Sparse online self-organizing maps for large relational data. In: Advances in Self-Organizing Maps and Learning Vector Quantization (Proceedings of WSOM 2016). Advances in Intelligent Systems and Computing, vol. 428, pp. 27–37. Springer, Houston, Texas, USA (2016)

  25. Wu, C.H., Hodges, R.E., Wang, C.J.: Parallelizing the self-organizing feature map on multiprocessor systems. Parallel Comput. 17, 821–832 (1991)

  26. Seiffert, U., Michaelis, B.: Multi-dimensional self-organizing maps on massively parallel hardware. In: Advances in Self-Organising Maps, pp. 160–166. Springer, Berlin (2001)

  27. Guan, H., Li, C.K., Cheung, T.Y., Yu, S.: Parallel design and implementation of SOM neural computing model in PVM environment of a distributed system. In: Proceedings of the Advances in Parallel and Distributed Computing, pp. 26–31. IEEE (1997)

  28. Bandeira, N., Lobo, V., Moura-Pires, F.: Training a Self-Organizing Map distributed on a PVM network. In: 1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence, vol. 1, pp. 457–461 (1998)

  29. Tomsich, P., Rauber, A., Merkl, D.: Optimizing the parSOM neural network implementation for data mining with distributed memory systems and cluster computing. In: Proceedings 11th International Workshop on Database and Expert Systems Applications, pp. 661–665. IEEE (2000)

  30. Labonté, G., Quintin, M.: Network parallel computing for SOM neural networks. In: High Performance Computing Systems and Applications, pp. 575–586. Springer, Berlin (2002)

  31. Hämäläinen, T.D.: Parallel implementations of self-organizing maps. In: Seiffert, U., Jain, L.C. (eds.) Self-Organizing Neural Networks, pp. 245–278. Springer, New York (2002)

  32. Campbell, A., Berglund, E., Streit, A.: Graphics hardware implementation of the parameter-less self-organising map. In: International Conference on Intelligent Data Engineering and Automated Learning, pp. 343–350. Springer, Berlin (2005)

  33. Moraes, F.C., Botelho, S.C., Duarte Filho, N., Gaya, J.F.O.: Parallel high dimensional self organizing maps using CUDA. In: Robotics Symposium and Latin American Robotics Symposium (SBR-LARS), pp. 302–306. IEEE, Brazilian (2012)

  34. Richardson, T., Winer, E.: Extending parallelization of the self-organizing map by combining data and network partitioned methods. Adv. Eng. Softw. 88, 1–7 (2015)

  35. Daneshpajouh, H., Delisle, P., Boisson, J.C., Krajecki, M., Zakaria, N.: Parallel batch self-organizing map on graphics processing unit using CUDA. In: Latin American High Performance Computing Conference, pp. 87–100. Springer, Berlin (2017)

  36. Wittek, P., Darányi, S.: Accelerating text mining workloads in a MapReduce-based distributed GPU environment. J. Parallel Distrib. Comput. 73, 198–206 (2013)

  37. Kohonen, T., Kaski, S., Lagus, K., Salojarvi, J., Honkela, J., Paatero, V., Saarela, A.: Self organization of a massive document collection. IEEE Trans. Neural Netw. 11, 574–585 (2000)

  38. Lagus, K., Kaski, S., Kohonen, T.: Mining massive document collections by the WEBSOM method. Inf. Sci. 163, 135–156 (2004)

  39. Takatsuka, M., Bui, M.: Parallel batch training of the self-organizing map using OpenCL. In: Neural Information Processing: Models and Applications, pp. 470–476. Springer, Berlin (2010)

  40. Nordström, T.: Designing parallel computers for self organizing maps. In: Proceedings of the 4th Swedish Workshop on Computer System Architecture (DSA-92), pp. 13–15 (1992)

  41. Dagum, L., Menon, R.: OpenMP: an industry standard API for shared-memory programming. IEEE Comput. Sci. Eng. 5, 46–55 (1998)

  42. Yang, M.H., Ahuja, N.: A data partition method for parallel self-organizing map. In: International Joint Conference on Neural Networks, IJCNN’99, vol. 3, pp. 1929–1933. IEEE (1999)

  43. Silva, B., Marques, N.: A hybrid parallel SOM algorithm for large maps in data-mining. New Trends in Artificial Intelligence (2007)

  44. Chang, C.C., Lin, C.J.: LIBSVM data: classification (Multi Class). https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html (2006)

  45. Ma, J., Saul, L.K., Savage, S., Voelker, G.M.: Identifying suspicious urls: an application of large-scale online learning. In: Proceedings of the 26th annual international conference on machine learning, pp. 681–688. ACM (2009)

  46. Stamper, J., Niculescu-Mizil, A., Ritter, S., Gordon, G., Koedinger, K.: Bridge to Algebra 2008–2009, Challenge data set from KDD Cup 2010 Educational Data Mining Challenge (2010)

  47. Juan, Y., Zhuang, Y., Chin, W.S., Lin, C.J.: Field-aware factorization machines for CTR prediction. In: Proceedings of the 10th ACM Conference on Recommender Systems, pp. 43–50. ACM (2016)

  48. Lewis, D.D., Yang, Y., Rose, T.G., Li, F.: RCV1: a new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361–397 (2004)

  49. Lang, K.: Newsweeder: Learning to filter netnews. In: Proceedings of the 12th International Conference on Machine Learning, pp. 331–339 (1995)

  50. McCallum, A., Nigam, K.: A Comparison of event models for Naive Bayes text classification. In: AAAI/ICML-98 Workshop on Learning for Text Categorization. Technical Report WS-98-05, pp. 41–48 (1998)

  51. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998)

  52. Hull, J.J.: A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 16, 550–554 (1994)

  53. Wang, J.Y.: Application of support vector machines in bioinformatics. Ph.D. Thesis, National Taiwan University (2002)

    Google Scholar 

  54. Noordewier, M.O., Towell, G.G., Shavlik, J.W.: Training knowledge-based neural networks to recognize genes in DNA sequences. Adv. Neural Inf. Process. Syst. 3, 530–536 (1991)

  55. King, R.D., Feng, C., Sutherland, A.: StatLog: comparison of classification algorithms on large real-world problems. Appl. Artif. Intell. Int. J. 9, 289–333 (1995)

  56. Frey, P.W., Slate, D.J.: Letter recognition using Holland-style adaptive classifiers. Mach. Learn. 6, 161–182 (1991)

  57. Fort, J.C., Letremy, P., Cottrell, M.: Advantages and drawbacks of the Batch Kohonen algorithm. ESANN 2, 223–230 (2002)

  58. Nöcker, M., Mörchen, F., Ultsch, A.: An algorithm for fast and reliable ESOM learning. In: ESANN, 14th European Symposium on Artificial Neural Networks, pp. 131–136 (2006)

  59. Kohonen, T., Hynninen, J., Kangas, J., Laaksonen, J.: SOM_PAK: The self-organizing map program package. Report A31, Helsinki University of Technology, Laboratory of Computer and Information Science (1996)

  60. Kiviluoto, K.: Topology preservation in self-organizing maps. In: IEEE International Conference on Neural Networks, vol. 1, pp. 294–299 (1996)

Author information

Corresponding author

Correspondence to Josué Melka.

Appendix

1.1 Perf Analysis of Serial Runs

Note: sparse-bsom-v2 is a variation of the Sparse-BSom algorithm with outer loop on data and inner loop on codebook in BMU search.

[Figure e: perf analysis of serial runs; image not reproduced]
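
The loop ordering mentioned in the note above can be sketched in Python as follows. This is an illustrative assumption, not the authors' code: the function names and the dictionary representation of sparse samples are hypothetical, and the codebook-outer variant is only a guess at what the alternative ordering looks like. The term \(\Vert x\Vert^2\) is dropped because it is constant for a given sample and does not change which unit wins.

```python
def bmus_data_outer(data, codebook):
    """BMU search with the outer loop on data and the inner loop on codebook."""
    w_sq = [sum(c * c for c in w) for w in codebook]
    bmus = []
    for x in data:                      # outer loop: samples
        best, best_d = -1, float("inf")
        for k, w in enumerate(codebook):  # inner loop: units
            d = w_sq[k] - 2.0 * sum(v * w[i] for i, v in x.items())
            if d < best_d:
                best, best_d = k, d
        bmus.append(best)
    return bmus


def bmus_codebook_outer(data, codebook):
    """Same search with the loops swapped: units outside, samples inside."""
    best = [-1] * len(data)
    best_d = [float("inf")] * len(data)
    for k, w in enumerate(codebook):    # outer loop: units
        w_sq = sum(c * c for c in w)
        for n, x in enumerate(data):    # inner loop: samples
            d = w_sq - 2.0 * sum(v * w[i] for i, v in x.items())
            if d < best_d[n]:
                best[n], best_d[n] = k, d
    return best
```

Both orderings return the same BMU indices; they differ only in memory access pattern, which is the kind of difference a profiling run would be expected to expose.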

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Melka, J., Mariage, JJ. (2019). Adapting Self-Organizing Map Algorithm to Sparse Data. In: Sabourin, C., Merelo, J.J., Madani, K., Warwick, K. (eds) Computational Intelligence. IJCCI 2017. Studies in Computational Intelligence, vol 829. Springer, Cham. https://doi.org/10.1007/978-3-030-16469-0_8
