Abstract
Biological databases are flooded with genomic and proteomic data that can be analyzed to generate information and knowledge useful for understanding the molecular mechanisms involved in the disease and health states of a living being. Tuberculosis is a pandemic infectious disease that causes a large number of deaths every year. In this paper, an attempt has been made to develop a model for mining amino acid association patterns in peptide sequences of the Mycobacterium tuberculosis complex (MTBC). The peptide sequences of MTBC species were taken from NCBI. Because these sequences vary in length, the degree of relationship among the amino acids present in each sequence also varies. Fuzzy sets are employed to model this uncertainty in the degree of relationship among the amino acids of the MTBC peptide sequences. Both crisp and fuzzy amino acid association rules have been generated from the MTBC peptide sequences, and a comparison shows that the fuzzy set approach is able to address the under-prediction and over-prediction of amino acid association patterns caused by uncertainty in the degree of relationship among amino acids. As an illustration, the amino acid association patterns have been used to predict secondary structure and physicochemical properties. The patterns generated can thus be useful in understanding the molecular mechanisms involved in MTBC through the prediction of physicochemical properties, structures and protein–protein interactions.
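The abstract does not state how the degree of relationship between amino acids is computed. The sketch below contrasts crisp and fuzzy support under one common formulation from the fuzzy association rule literature, which is an assumption here and not necessarily the authors' model: a residue's fuzzy membership in a sequence is its relative frequency (count divided by sequence length), and the fuzzy support of an itemset is the mean, over sequences, of the minimum membership of its residues (the min t-norm). All function names and the toy sequences are hypothetical and are not MTBC data.

```python
# Minimal sketch: crisp vs. fuzzy support/confidence for amino acid itemsets.
# ASSUMPTION: membership = relative frequency of a residue in a sequence and
# fuzzy support uses the min t-norm; this is illustrative, not the paper's model.

def crisp_support(seqs, itemset):
    """Fraction of sequences that contain every residue in itemset (0/1 counting)."""
    hits = sum(1 for s in seqs if all(a in s for a in itemset))
    return hits / len(seqs)

def membership(seq, aa):
    """Hypothetical fuzzy membership: relative frequency of residue aa in seq."""
    return seq.count(aa) / len(seq)

def fuzzy_support(seqs, itemset):
    """Mean over sequences of the minimum membership of the itemset's residues."""
    total = sum(min(membership(s, a) for a in itemset) for s in seqs)
    return total / len(seqs)

def confidence(support_fn, seqs, antecedent, consequent):
    """Confidence of antecedent -> consequent under the given support measure."""
    ante = support_fn(seqs, antecedent)
    return support_fn(seqs, antecedent + consequent) / ante if ante else 0.0

# Toy peptides (made up): the second is long, with M and K appearing only once each.
SEQS = ["MKKLA", "MAAAAAAAAK", "GGSL"]

if __name__ == "__main__":
    print(round(crisp_support(SEQS, "MK"), 3))                 # 0.667
    print(round(fuzzy_support(SEQS, "MK"), 3))                 # 0.1
    print(round(confidence(crisp_support, SEQS, "K", "M"), 3)) # 1.0
    print(round(confidence(fuzzy_support, SEQS, "K", "M"), 3)) # 0.6
```

In this toy case the crisp rule K → M reaches confidence 1.0 because the long sequence counts as a full co-occurrence, while the fuzzy version weighs that sequence by how sparsely M and K occur in it and yields 0.6. This is one way length variation can cause crisp counting to over-predict an association.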
Acknowledgments
The authors are highly grateful to the Department of Biotechnology, New Delhi and MPCST Bhopal for providing the Bioinformatics Infrastructure Facility at MANIT, Bhopal for carrying out this work.
Cite this article
Jain, A., Pardasani, K.R. Mining fuzzy amino acid associations in peptide sequences of mycobacterium tuberculosis complex (MTBC). Netw Model Anal Health Inform Bioinforma 4, 3 (2015). https://doi.org/10.1007/s13721-015-0075-4