Abstract
Studies of biological evolution have generally focused on the nucleotide or amino acid sequences of certain genes related to specific enzymes. Most phylogenetic trees are constructed from amino acid sequences and are used to infer evolutionary relationships. Phylogenetic analysis is usually based on a multiple sequence alignment of a gene from different organisms, including fungi. A number of programs have been introduced for gene clustering and phylogenetic analysis; for example, the most popular web-based program, Clustal Omega, is commonly used by biologists. As the number of uploaded sequences increases, however, this program not only runs slowly but also produces a final cladogram that is confusing and incorrect from an evolutionary point of view. In the present study, we used fungal hexosaminidases (FH), extracellular enzymes that have many applications in biotechnology but are extremely varied and confusing in evolutionary terms. A standard taxonomy-based phylogenetic tree was constructed for 835 FH amino acid sequences retrieved from the National Center for Biotechnology Information (NCBI) on March 16, 2015. A supervised multilayer perceptron (MLP) neural network was then used to discriminate the FH sequences. Based on the relative frequency of amino acids in the FH sequences, 41 neural networks were designed for seven taxonomic levels, from phylum to family. The minimum accuracy of the neural networks was 99% at all seven discrimination levels. As a final step, the designed model was evaluated on 143 newly released FH sequences extracted on July 1, 2015. The clustering results matched fungal taxonomy well and therefore reflect evolutionary relationships.
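The abstract states that the networks take the relative frequency of amino acids as input, which is what makes the approach alignment-free. A minimal sketch of how such a feature vector could be computed is given below. The function name `relative_frequencies`, the fixed 20-letter alphabet, and the choice to skip nonstandard symbols are assumptions made for illustration; this is not the authors' software.

```python
from collections import Counter
from typing import List

# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def relative_frequencies(sequence: str) -> List[float]:
    """Return the relative frequency of each standard amino acid.

    The result always has 20 entries, ordered as in AMINO_ACIDS, so
    sequences of different lengths map to vectors of the same size
    without any alignment step.
    """
    seq = sequence.upper()
    # Symbols outside the standard alphabet (e.g. X, B, Z) are ignored.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    toy_sequence = "MKLAVLGAAX"  # hypothetical toy input, not a real FH sequence
    features = relative_frequencies(toy_sequence)
    for aa, freq in zip(AMINO_ACIDS, features):
        if freq > 0:
            print(aa, round(freq, 3))
```

A vector of this kind could then serve as the input layer of a classifier trained separately for each taxonomic level, in the spirit of the per-level networks the abstract describes.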
Acknowledgements
Financial support by the vice president for research and technology, Ferdowsi University of Mashhad, is gratefully acknowledged. We thank Mr. Meisam Nazari for editing the manuscript.
Additional information
Availability and Implementation: Our suggested software and other related information are freely available on the web at the following link: https://www.dropbox.com/s/q2irc46g0wsj43k/soft%20ware.zip?dl=0
Cite this article
Mamarabadi, M., Rohani, A. Clustering of fungal hexosaminidase enzymes based on free alignment method using MLP neural network. Neural Comput & Applic 30, 2819–2829 (2018). https://doi.org/10.1007/s00521-017-2876-0
DOI: https://doi.org/10.1007/s00521-017-2876-0