Abstract
An active learning algorithm is devised for training Self-Organizing Feature Maps on large data sets. Active learning algorithms recognize that not all exemplars are created equal. Thus, the concepts of exemplar age and difficulty are used to filter the original data set such that training epochs are conducted over only a small subset of it. The ensuing Hierarchical Dynamic Subset Selection algorithm introduces a definition of exemplar difficulty suited to an unsupervised learning context, along with corresponding stopping criteria for the Self-Organizing Map (SOM). The algorithm is benchmarked on several real-world data sets with training set exemplar counts in the region of 30–500 thousand. Cluster accuracy is demonstrated to be at least as good as that of the original SOM algorithm while requiring a fraction of the computational overhead.
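The abstract's core idea, biasing each epoch toward exemplars that are hard (high quantization error) or have not been visited recently, can be illustrated with a minimal sketch. This is not the paper's actual Hierarchical Dynamic Subset Selection procedure: the 4x4 grid, the linear mix of normalized difficulty and age, and all constants (`SUBSET`, `LR`, `SIGMA`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 1000 exemplars in 2-D (a stand-in for the paper's large sets).
X = rng.normal(size=(1000, 2))

# Small SOM: a 4x4 grid of units, each with a 2-D weight vector.
grid = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
W = rng.normal(size=(16, 2))

difficulty = np.ones(len(X))   # quantization error at last visit
age = np.ones(len(X))          # epochs since the exemplar was last selected

SUBSET, EPOCHS, LR, SIGMA = 100, 20, 0.5, 1.0  # assumed constants

for epoch in range(EPOCHS):
    # Selection weight: a simple (assumed) linear mix of difficulty and age.
    score = difficulty / difficulty.sum() + age / age.sum()
    idx = rng.choice(len(X), size=SUBSET, replace=False, p=score / score.sum())
    age += 1       # every exemplar ages ...
    age[idx] = 0   # ... except those just selected for this epoch
    for i in idx:
        d = np.linalg.norm(W - X[i], axis=1)
        bmu = int(np.argmin(d))
        difficulty[i] = d[bmu]  # unsupervised "difficulty" = quantization error
        # Standard neighbourhood-weighted SOM update around the BMU.
        h = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2 * SIGMA**2))
        W += LR * h[:, None] * (X[i] - W)
```

Each epoch touches only `SUBSET` of the 1000 exemplars, which is where the claimed computational saving comes from; the age term guarantees every exemplar is eventually revisited, so easy regions of the data are not starved entirely.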
Cite this article
Wetmore, L., Heywood, M.I. & Zincir-Heywood, A.N. Speeding up the Self-Organizing Feature Map Using Dynamic Subset Selection. Neural Process Lett 22, 17–32 (2005). https://doi.org/10.1007/s11063-004-7775-6