ABSTRACT
Gene sequence modeling and clustering are among the most important problems in bioinformatics. Hidden Markov Models (HMMs) have been widely used to find similarity between long sequences of varying length. This paper introduces a novel gene sequence clustering method based on HMMs optimized by the Particle Swarm Optimization (PSO) algorithm. In this approach, each gene sequence is described by its own HMM, and the probability of generating each individual sequence is then evaluated under every model. A hierarchical clustering algorithm, based on a newly defined distance measure, is then applied to find the best clusters. Experiments on a dataset of lung cancer-related genes show that the proposed approach can be successfully used for gene clustering.
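The pipeline outlined in the abstract (one HMM per sequence, cross-evaluation of every sequence under every model, then hierarchical clustering on a distance matrix) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the paper's implementation: `crude_hmm` is a hypothetical stand-in for PSO-based training, the symmetric log-likelihood distance in `distance_matrix` is a common choice and not the new measure the paper defines, and the average-linkage rule and toy sequences are likewise assumptions.

```python
import math

ALPHABET = "ACGT"

def crude_hmm(seq, stay=0.9):
    """Hypothetical placeholder for PSO training: a 2-state HMM whose state k
    emits with the add-one-smoothed base frequencies of the k-th half of seq."""
    half = max(1, len(seq) // 2)
    B = []
    for part in (seq[:half], seq[half:]):
        counts = [part.count(c) + 1.0 for c in ALPHABET]
        total = sum(counts)
        B.append([c / total for c in counts])
    A = [[stay, 1.0 - stay], [1.0 - stay, stay]]  # transition matrix
    pi = [0.5, 0.5]                               # initial state distribution
    return pi, A, B

def log_likelihood(seq, model):
    """Scaled forward algorithm: returns log P(seq | model)."""
    pi, A, B = model
    n = len(pi)
    obs = [ALPHABET.index(c) for c in seq]
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)            # rescale to avoid numerical underflow
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

def distance_matrix(seqs, models):
    """L[i][j] is the per-symbol log-likelihood of sequence j under model i.
    The distance averages how much worse each sequence is explained by the
    other sequence's model than by its own (an assumed, generic measure)."""
    n = len(seqs)
    L = [[log_likelihood(seqs[j], models[i]) / len(seqs[j]) for j in range(n)]
         for i in range(n)]
    return [[0.5 * ((L[i][i] - L[j][i]) + (L[j][j] - L[i][j]))
             for j in range(n)] for i in range(n)]

def agglomerative(D, n_clusters):
    """Average-linkage hierarchical clustering, stopped at n_clusters."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(D[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy data (not from the paper): two AT-rich and two GC-rich sequences.
seqs = ["ATATTTAATATAATTA", "TTATAATATTAAATAT",
        "GCGGCCGCGCCGGCGC", "CCGCGGCGCGGCCGCG"]
models = [crude_hmm(s) for s in seqs]
D = distance_matrix(seqs, models)
clusters = agglomerative(D, 2)
print(clusters)  # [[0, 1], [2, 3]]
```

Dividing each log-likelihood by the sequence length keeps sequences of different lengths comparable, which matters when the inputs vary in length as the abstract describes. In a fuller version, `crude_hmm` would be replaced by a trained model, for example one whose parameters are searched by PSO.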