Abstract
Recent research has shown that association rules are useful in gene expression data analysis. Interestingness measures play an important role in association rule mining on gene expression data, which are characterized by small sample sizes, high dimensionality, and noise. This work introduces two interestingness measures that exploit prior knowledge contained in open biological databases: Max-Pathway-Distance (MaxPD), which exploits the relatedness of genes in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and Max-Chromosomal-Distance (MaxCD), which makes use of the distance among genes on the chromosome. The properties of the proposed measures are also explored so that interesting rules can be mined efficiently. Experimental results on four real-life gene expression datasets show the effectiveness of MaxPD and MaxCD in terms of both classification accuracy and biological interpretability.
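The abstract does not give formal definitions of the two measures, so the following is only an illustrative sketch under stated assumptions. It assumes that MaxPD is the largest pairwise shortest-path distance (in hops) among a rule's genes in a pathway graph, and that MaxCD is the largest pairwise base-pair distance among a rule's genes, treated as infinite when genes lie on different chromosomes. The function names, the toy pathway graph, and the toy gene positions are all hypothetical and are not taken from the paper or from KEGG.

```python
from collections import deque
from itertools import combinations


def shortest_path_length(graph, source, target):
    """Breadth-first search hop count between two genes; None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def max_pathway_distance(graph, genes):
    """Assumed MaxPD: largest pairwise hop count among the genes of a rule."""
    worst = 0
    for a, b in combinations(genes, 2):
        d = shortest_path_length(graph, a, b)
        if d is None:
            # Assumption: unconnected genes are treated as infinitely far apart.
            return float("inf")
        worst = max(worst, d)
    return worst


def max_chromosomal_distance(positions, genes):
    """Assumed MaxCD: largest pairwise base-pair gap among the genes of a rule."""
    worst = 0
    for a, b in combinations(genes, 2):
        chrom_a, pos_a = positions[a]
        chrom_b, pos_b = positions[b]
        if chrom_a != chrom_b:
            # Assumption: genes on different chromosomes are infinitely far apart.
            return float("inf")
        worst = max(worst, abs(pos_a - pos_b))
    return worst


# Hypothetical toy data: an undirected pathway graph and gene loci.
pathway = {
    "g1": ["g2"],
    "g2": ["g1", "g3"],
    "g3": ["g2", "g4"],
    "g4": ["g3"],
}
positions = {
    "g1": ("chr1", 1000),
    "g2": ("chr1", 5000),
    "g3": ("chr1", 2500),
    "g4": ("chr2", 300),
}

rule_genes = ["g1", "g2", "g3"]
maxpd = max_pathway_distance(pathway, rule_genes)
maxcd = max_chromosomal_distance(positions, rule_genes)
```

Under these assumed definitions, adding a gene to a rule can never decrease either value, which is the kind of property that lets a miner prune candidate rules early. Whether the paper relies on exactly this property is not stated in the abstract.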
Acknowledgments
This work was financially supported by the Natural Science Foundation of China (61100148), the Natural Science Foundation of Guangdong Province (S2011040004804), the Key Technology Research and Development Programs of Guangdong Province (2010B080701070), the Opening Project of the State Key Laboratory for Novel Software Technology (KFKT2011B19), and the Foundation for Distinguished Young Talents in Higher Education of Guangdong, China (LYM11060).
Cite this article
Wang, M., Wu, S. & Cai, R. Two novel interestingness measures for gene association rule mining. Neural Comput & Applic 23, 835–841 (2013). https://doi.org/10.1007/s00521-012-1005-3
DOI: https://doi.org/10.1007/s00521-012-1005-3