Cluster identification and separation in the growing self-organizing map: application in protein sequence classification

Ahmad, Norashikin; Alahakoon, Damminda; Chau, Rowena

doi:10.1007/s00521-009-0300-0

Original Article
Published: 04 September 2009

Volume 19, pages 531–542, (2010)

Neural Computing and Applications

Norashikin Ahmad¹,
Damminda Alahakoon¹ &
Rowena Chau¹

Abstract

The growing self-organizing map (GSOM) was introduced as an improvement on the self-organizing map (SOM) algorithm for clustering and knowledge discovery. Unlike the traditional SOM, the GSOM has a dynamic structure in which new nodes are grown as learning progresses, so that the map reflects the knowledge discovered from the input data. The spread factor (SF) parameter of the GSOM controls the spread of the map, giving an analyst the flexibility to examine clusters at different granularities. Although the GSOM has been applied in various areas and has proven effective in knowledge discovery tasks, no comprehensive study has been done on the effect of the spread factor value on cluster formation and separation. The aim of this paper is therefore to investigate the effect of the spread factor value on cluster separation in the GSOM. We use the simple k-means algorithm to identify clusters in the GSOM. Using the Davies–Bouldin index, we obtain the clusters formed at different spread factor values and analyze the resulting clusters. We show that clusters can become more separated when the spread factor value is increased. Hierarchical clusters can then be constructed by mapping the GSOM clusters obtained at different spread factor values.
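The cluster identification step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the node weight vectors in `node_weights` are made-up values standing in for a trained GSOM, the farthest-first seeding is a simplification chosen to keep the example deterministic, and the helper names are hypothetical. The sketch runs k-means for several values of k and keeps the k with the lowest Davies–Bouldin index (lower means more compact, better separated clusters).

```python
import math

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first point, then repeatedly add the point farthest from the seeds."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return [list(s) for s in seeds]

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append([sum(axis) / len(members) for axis in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

def davies_bouldin(points, centroids, labels):
    """Davies-Bouldin index with Euclidean distances.
    Assumes every cluster is non-empty and all centroids are distinct."""
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(math.dist(p, centroids[j]) for p in members) / len(members))
    # For each cluster, take its worst (largest) similarity ratio to any other.
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

# Hypothetical 2-D weight vectors of GSOM nodes (illustrative values only).
node_weights = [(0, 0), (0, 1), (1, 0), (1, 1),
                (10, 10), (10, 11), (11, 10), (11, 11),
                (20, 0), (20, 1), (21, 0), (21, 1)]

scores = {}
for k in range(2, 6):
    centroids, labels = kmeans(node_weights, k)
    scores[k] = davies_bouldin(node_weights, centroids, labels)

best_k = min(scores, key=scores.get)
print(best_k, scores)
```

To compare spread factor values in the spirit of the abstract, the same selection would be repeated on the node weights of maps trained with different SF settings, and the resulting partitions compared.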

Author information

Authors and Affiliations

Clayton School of Information Technology, Monash University, Clayton, VIC, 3800, Australia
Norashikin Ahmad, Damminda Alahakoon & Rowena Chau

Corresponding author

Correspondence to Norashikin Ahmad.

About this article

Cite this article

Ahmad, N., Alahakoon, D. & Chau, R. Cluster identification and separation in the growing self-organizing map: application in protein sequence classification. Neural Comput & Applic 19, 531–542 (2010). https://doi.org/10.1007/s00521-009-0300-0

Received: 09 January 2009
Accepted: 19 August 2009
Published: 04 September 2009
Issue Date: June 2010
DOI: https://doi.org/10.1007/s00521-009-0300-0
