Abstract
Essential genes are indispensable for an organism’s survival. They have been widely studied, and many researchers have proposed prediction methods that not only identify essential genes but also assist pathogen discovery and drug development. However, few studies have exploited the relationship between gene functions and gene essentiality for prediction. In this paper, we explore essential gene prediction by combining association rule mining with Gene Ontology semantic analysis. First, we propose two features, GOARC (Gene Ontology Association Rule Confidence) and GOCBA (Gene Ontology Classification Based on Association), which are used to enhance classifiers built with the features commonly used in previous studies. Second, we use an association-based classification algorithm without rule pruning to predict essential genes. Experimental evaluations and semantic analysis show that our methods not only improve the accuracy of essential gene prediction but also facilitate understanding of the semantics of essential genes in terms of gene function.
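The abstract does not define how GOARC is computed, so the following is only a minimal sketch of the general idea behind a rule-confidence feature, assuming the standard association rule definition: the confidence of a rule "GO terms → essential" is the fraction of genes annotated with those terms that are labeled essential. The function names, the toy data, and the placeholder term identifiers (GO:A, GO:B, GO:C) are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def rule_confidence(genes, antecedent, label="essential"):
    """Confidence of the rule antecedent -> label.

    Computed as (# genes having all antecedent terms and the label)
    divided by (# genes having all antecedent terms).
    Returns 0.0 when no gene carries the antecedent terms.
    """
    antecedent = frozenset(antecedent)
    covered = [g for g in genes if antecedent <= g["go_terms"]]
    if not covered:
        return 0.0
    hits = sum(1 for g in covered if g["label"] == label)
    return hits / len(covered)


def max_confidence_feature(genes, gene_terms, max_size=2):
    """Hypothetical per-gene feature: the highest confidence among rules
    whose antecedent is a subset (up to max_size terms) of the gene's terms.
    """
    best = 0.0
    terms = sorted(gene_terms)
    for k in range(1, max_size + 1):
        for subset in combinations(terms, k):
            best = max(best, rule_confidence(genes, subset))
    return best


# Toy training set with placeholder term identifiers (not real GO terms).
TOY_GENES = [
    {"go_terms": frozenset({"GO:A", "GO:B"}), "label": "essential"},
    {"go_terms": frozenset({"GO:A"}), "label": "essential"},
    {"go_terms": frozenset({"GO:A", "GO:C"}), "label": "non-essential"},
    {"go_terms": frozenset({"GO:C"}), "label": "non-essential"},
]

if __name__ == "__main__":
    print(rule_confidence(TOY_GENES, {"GO:A"}))
    print(max_confidence_feature(TOY_GENES, {"GO:A", "GO:C"}))
```

A value computed this way could be appended to a gene's existing feature vector before training a conventional classifier; whether this matches the paper's actual GOARC definition cannot be determined from the abstract alone.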
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, YC., Chiu, PI., Huang, HC., Tseng, V.S. (2011). Prediction of Essential Genes by Mining Gene Ontology Semantics. In: Chen, J., Wang, J., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2011. Lecture Notes in Computer Science, vol 6674. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21260-4_9
DOI: https://doi.org/10.1007/978-3-642-21260-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21259-8
Online ISBN: 978-3-642-21260-4
eBook Packages: Computer Science, Computer Science (R0)