Two novel interestingness measures for gene association rule mining

  • Original Article
  • Neural Computing and Applications

Abstract

Recent research has shown that association rules are useful in gene expression data analysis. Interestingness measures play an important role in association rule mining on gene expression data, which are characterized by small sample sizes, high dimensionality, and noise. This work introduces two interestingness measures that exploit prior knowledge contained in open biological databases. They are Max-Pathway-Distance (MaxPD), which exploits the relatedness of genes in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and Max-Chromosomal-Distance (MaxCD), which makes use of the distance between genes on the chromosome. The properties of the proposed interestingness measures are also explored so that interesting rules can be mined efficiently. Experimental results on four real-life gene expression datasets show the effectiveness of MaxPD and MaxCD in terms of both classification accuracy and biological interpretability.
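The abstract does not give formal definitions of the two measures, so the following is only a minimal sketch of one plausible reading suggested by their names: MaxPD is taken to be the largest pairwise shortest-path distance between the genes of a rule in a pathway graph, and MaxCD the largest pairwise base-pair distance between genes on the same chromosome. The function names, the toy graph, the gene identifiers (`g1` to `g4`), the coordinates, and the choice to treat genes on different chromosomes as infinitely far apart are all assumptions made for illustration, not the authors' method.

```python
from itertools import combinations
from math import inf


def all_pairs_shortest_paths(nodes, edges):
    """Floyd-Warshall on an undirected, unweighted graph (hop counts)."""
    dist = {u: {v: (0 if u == v else inf) for v in nodes} for u in nodes}
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def max_pathway_distance(genes, dist):
    """Assumed MaxPD: largest pairwise pathway distance among the genes."""
    return max((dist[a][b] for a, b in combinations(genes, 2)), default=0)


def max_chromosomal_distance(genes, positions):
    """Assumed MaxCD: largest pairwise distance in base pairs.

    positions maps gene -> (chromosome, position). Genes on different
    chromosomes are treated as infinitely far apart (an assumption).
    """
    worst = 0
    for a, b in combinations(genes, 2):
        (chrom_a, pos_a), (chrom_b, pos_b) = positions[a], positions[b]
        d = abs(pos_a - pos_b) if chrom_a == chrom_b else inf
        worst = max(worst, d)
    return worst


# Hypothetical toy data: a four-gene chain pathway and made-up coordinates.
toy_nodes = ["g1", "g2", "g3", "g4"]
toy_edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
toy_positions = {
    "g1": ("chr1", 100),
    "g2": ("chr1", 350),
    "g3": ("chr1", 900),
    "g4": ("chr2", 50),
}
toy_dist = all_pairs_shortest_paths(toy_nodes, toy_edges)

rule_genes = ["g1", "g2", "g3"]
print(max_pathway_distance(rule_genes, toy_dist))           # 2
print(max_chromosomal_distance(rule_genes, toy_positions))  # 800
```

Under this reading, adding a gene to a rule can never decrease either value, because a maximum over pairs only grows as pairs are added. An upper threshold on such a measure could therefore prune every superset of a rule that already exceeds it, which may be the kind of property the abstract refers to when it mentions mining interesting rules efficiently.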

Acknowledgments

This work was financially supported by the Natural Science Foundation of China (61100148), the Natural Science Foundation of Guangdong Province (S2011040004804), the Key Technology Research and Development Programs of Guangdong Province (2010B080701070), the Opening Project of the State Key Laboratory for Novel Software Technology (KFKT2011B19), and the Foundation for Distinguished Young Talents in Higher Education of Guangdong, China (LYM11060).

Author information

Corresponding author

Correspondence to Shumin Wu.

About this article

Cite this article

Wang, M., Wu, S. & Cai, R. Two novel interestingness measures for gene association rule mining. Neural Comput & Applic 23, 835–841 (2013). https://doi.org/10.1007/s00521-012-1005-3

  • DOI: https://doi.org/10.1007/s00521-012-1005-3
