Abstract
Scientific workflows are becoming increasingly popular in the research community because they are easy to create and use and because they make experiments repeatable. In this paper, we investigate the benefits of workflows for a genomics experiment that requires both intensive computation and parallelization, and show that simple workflow parallelization can yield substantial optimizations in rule redundancy reduction.
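The abstract does not describe how the computation is parallelized. As an illustration only, the Python sketch below shows why permutation testing (named in the paper's title) parallelizes so easily. Each block of random label permutations is independent of the others, so blocks can be sent to separate workers and their counts summed. The mean-difference statistic, the function names (`count_extreme`, `permutation_p_value`), and the chunk-and-seed scheme are all hypothetical assumptions. They are not the authors' workflow.

```python
import random
from statistics import mean


def mean_difference(a, b):
    # Hypothetical test statistic: difference between the two group means.
    return mean(a) - mean(b)


def count_extreme(task):
    # One independent chunk of label permutations. Returns how many permuted
    # statistics are at least as extreme (in absolute value) as the observed one.
    pooled, n_a, observed, n_perm, seed = task
    rng = random.Random(seed)  # per-chunk RNG: results do not depend on scheduling
    data = list(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(data)
        if abs(mean_difference(data[:n_a], data[n_a:])) >= abs(observed):
            hits += 1
    return hits


def permutation_p_value(a, b, n_perm=10_000, n_chunks=4, seed=0, map_fn=map):
    # Two-sided permutation p-value with the permutations split into chunks.
    # map_fn may be the built-in map (sequential) or an executor's map (parallel).
    observed = mean_difference(a, b)
    pooled = list(a) + list(b)
    sizes = [n_perm // n_chunks] * n_chunks
    sizes[-1] += n_perm - sum(sizes)  # give any remainder to the last chunk
    tasks = [(pooled, len(a), observed, k, seed + i) for i, k in enumerate(sizes)]
    hits = sum(map_fn(count_extreme, tasks))
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0
```

To run the chunks in parallel, pass the `map` method of a `concurrent.futures.ProcessPoolExecutor` from a script guarded by `if __name__ == "__main__":`, for example `with ProcessPoolExecutor() as pool: permutation_p_value(a, b, map_fn=pool.map)`. Each chunk carries its own seed, so sequential and parallel runs return the same p-value.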
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
de Bruin, J.S., Kok, J.N. (2010). Combining Subgroup Discovery and Permutation Testing to Reduce Reduncancy. In: Margaria, T., Steffen, B. (eds) Leveraging Applications of Formal Methods, Verification, and Validation. ISoLA 2010. Lecture Notes in Computer Science, vol 6415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16558-0_25
DOI: https://doi.org/10.1007/978-3-642-16558-0_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16557-3
Online ISBN: 978-3-642-16558-0
eBook Packages: Computer Science; Computer Science (R0)