Efficient mining of multilevel gene association rules from microarray and gene ontology

Information Systems Frontiers

Abstract

Recent studies have shown that association rules can reveal interactions between genes that traditional analysis methods such as clustering might miss. However, existing studies consider only association rules among individual genes. In this paper, we propose a new data mining method named MAGO for discovering multilevel gene association rules from gene microarray data and the concept hierarchy of Gene Ontology (GO). The proposed method efficiently finds relations between GO terms by analyzing gene expression data together with the GO hierarchy. For example, using the biological process ontology in GO, rules such as Process A (up) → Process B (up) can be discovered, indicating that the genes involved in Process B are likely to be up-regulated whenever those involved in Process A are up-regulated. Moreover, we propose a constrained mining method named CMAGO for discovering multilevel gene expression rules under user-specified constraints. Empirical evaluation shows that the proposed methods achieve excellent performance in discovering hidden multilevel gene association rules.
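To make the idea of a rule such as Process A (up) → Process B (up) concrete, the following is a minimal Python sketch of term-level rule mining. It is not the MAGO or CMAGO algorithm, whose details are not given in the abstract. The toy hierarchy `GO_PARENT`, the annotation map `GENE_TO_GO`, the sample data, and the convention that a term counts as "up" in a sample when any annotated gene is "up" are all hypothetical assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical toy GO hierarchy (child term -> parent term).
GO_PARENT = {"P_A1": "P_A", "P_A2": "P_A", "P_B1": "P_B"}
# Hypothetical annotation of each gene with one GO term.
GENE_TO_GO = {"g1": "P_A1", "g2": "P_A2", "g3": "P_B1"}
# Hypothetical discretized expression states, one dict per microarray sample.
SAMPLES = [
    {"g1": "up", "g3": "up"},
    {"g2": "up", "g3": "up"},
    {"g1": "up", "g2": "down"},
    {"g3": "down"},
]


def with_ancestors(term):
    """Return the term followed by all of its ancestors in the toy hierarchy."""
    terms = [term]
    while terms[-1] in GO_PARENT:
        terms.append(GO_PARENT[terms[-1]])
    return terms


def to_transaction(sample):
    """Lift gene-level states to (GO term, state) items at every hierarchy level.

    Assumption: a term takes a state if any gene annotated to it has that state.
    """
    items = set()
    for gene, state in sample.items():
        for term in with_ancestors(GENE_TO_GO[gene]):
            items.add((term, state))
    return items


def related(term_a, term_b):
    """True if the terms are identical or one is an ancestor of the other."""
    return term_a in with_ancestors(term_b) or term_b in with_ancestors(term_a)


def mine_rules(samples, min_support, min_confidence):
    """Return {(antecedent, consequent): (support, confidence)} for item pairs."""
    transactions = [to_transaction(s) for s in samples]
    n = len(transactions)
    all_items = set().union(*transactions)

    def support(*items):
        return sum(1 for t in transactions if all(i in t for i in items)) / n

    rules = {}
    for a, b in permutations(sorted(all_items), 2):
        if related(a[0], b[0]):
            continue  # skip trivial rules between a term and its own ancestor
        pair_support = support(a, b)
        if pair_support < min_support:
            continue
        confidence = pair_support / support(a)
        if confidence >= min_confidence:
            rules[(a, b)] = (pair_support, confidence)
    return rules


rules = mine_rules(SAMPLES, min_support=0.5, min_confidence=0.6)
for (lhs, rhs), (sup, conf) in rules.items():
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

With this toy data, the rule (P_A, up) → (P_B, up) holds with support 0.5 and confidence 2/3, even though neither child term of P_A is frequent enough on its own to produce a rule with P_B. This illustrates why mining at higher levels of the hierarchy can surface relations that gene-level or leaf-level mining would miss.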

Acknowledgement

This research was supported by the Landmark Project of National Cheng Kung University, Taiwan, and by the National Science Council, Taiwan, R.O.C., under grant no. NSC 96-2221-E-006-143-MY3.

Author information

Corresponding author

Correspondence to Vincent S. Tseng.

About this article

Cite this article

Tseng, V.S., Yu, HH. & Yang, SC. Efficient mining of multilevel gene association rules from microarray and gene ontology. Inf Syst Front 11, 433–447 (2009). https://doi.org/10.1007/s10796-009-9156-1

  • DOI: https://doi.org/10.1007/s10796-009-9156-1
