Combining Subgroup Discovery and Permutation Testing to Reduce Redundancy

  • Conference paper
Leveraging Applications of Formal Methods, Verification, and Validation (ISoLA 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6415)

Abstract

Scientific workflows are becoming increasingly popular in the research community because they are easy to create and use, and because experiments built from them can be repeated. In this paper we investigate the benefits of workflows in a genomics experiment that requires intensive computation as well as parallelization, and show that simple workflow parallelization can substantially optimize the reduction of rule redundancy.
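The abstract does not spell out how the permutation test or its parallelization is carried out. As a minimal sketch, assuming a rule (subgroup description) is scored by its precision, i.e. the fraction of positive examples among the examples it covers, and that its significance is estimated by repeatedly shuffling the class labels, the permutations can be split into independent chunks, each with its own random seed, and evaluated concurrently. The scoring function, the add-one p-value estimate, and the names `coverage_precision` and `permutation_p_value` are hypothetical choices for illustration, not the authors' method.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def coverage_precision(covered, labels):
    """Fraction of positive labels (1) among the examples a rule covers.

    Assumed quality measure for illustration only.
    """
    hits = [lab for c, lab in zip(covered, labels) if c]
    return sum(hits) / len(hits) if hits else 0.0


def _count_extreme(args):
    """Run one chunk of permutations; count shuffles scoring >= the observed score."""
    covered, labels, observed, n_perm, seed = args
    rng = random.Random(seed)  # independent random stream per chunk
    shuffled = list(labels)    # private copy, so chunks share no mutable state
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if coverage_precision(covered, shuffled) >= observed:
            count += 1
    return count


def permutation_p_value(covered, labels, n_perm=1000, n_chunks=4, seed=0):
    """Hypothetical label-permutation p-value for a single rule, split into chunks."""
    observed = coverage_precision(covered, labels)
    per_chunk = n_perm // n_chunks
    jobs = [(covered, labels, observed, per_chunk, seed + i) for i in range(n_chunks)]
    # Chunks are independent, so they can run as parallel branches.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        extreme = sum(pool.map(_count_extreme, jobs))
    total = per_chunk * n_chunks
    # Add-one correction: the estimate is never exactly zero.
    return (extreme + 1) / (total + 1)


if __name__ == "__main__":
    # Toy data: the rule covers exactly the ten positive examples.
    covered = [True] * 10 + [False] * 10
    labels = [1] * 10 + [0] * 10
    print(permutation_p_value(covered, labels))
```

Because the chunks share no state, the same decomposition maps naturally onto parallel workflow branches. For CPU-bound scoring in CPython, `ProcessPoolExecutor` (which offers the same `map` interface, used under the usual `if __name__ == "__main__":` guard) would be a better fit than threads.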

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

de Bruin, J.S., Kok, J.N. (2010). Combining Subgroup Discovery and Permutation Testing to Reduce Reduncancy. In: Margaria, T., Steffen, B. (eds) Leveraging Applications of Formal Methods, Verification, and Validation. ISoLA 2010. Lecture Notes in Computer Science, vol 6415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16558-0_25

  • DOI: https://doi.org/10.1007/978-3-642-16558-0_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16557-3

  • Online ISBN: 978-3-642-16558-0

  • eBook Packages: Computer Science, Computer Science (R0)