Combining Subgroup Discovery and Permutation Testing to Reduce Redundancy

de Bruin, Jeroen S.; Kok, Joost N.

doi:10.1007/978-3-642-16558-0_25


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6415)

Included in the following conference series:

International Symposium On Leveraging Applications of Formal Methods, Verification and Validation


Abstract

Scientific workflows are becoming more popular in the research community because they are easy to create and use, and because they make experiments repeatable. In this paper we investigate the benefits of workflows in a genomics experiment that requires intensive computing as well as parallelization, and show that substantial optimizations in rule redundancy reduction can be achieved by simple workflow parallelization.



Author information

Authors and Affiliations

LIACS, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
Jeroen S. de Bruin & Joost N. Kok
Department of Parasitology, LUMC, Biomolecular Mass Spectrometry unit, Einthovenweg 20, Postbus 9600, 2300 RC Leiden, The Netherlands
Jeroen S. de Bruin


Editor information

Editors and Affiliations

Institute for Informatics, University of Potsdam, August-Bebel-Str. 89, 14482, Potsdam, Germany
Tiziana Margaria
Technical University of Dortmund, Otto-Hahn-Str. 14, 44227, Dortmund, Germany
Bernhard Steffen


About this paper

Cite this paper

de Bruin, J.S., Kok, J.N. (2010). Combining Subgroup Discovery and Permutation Testing to Reduce Reduncancy. In: Margaria, T., Steffen, B. (eds) Leveraging Applications of Formal Methods, Verification, and Validation. ISoLA 2010. Lecture Notes in Computer Science, vol 6415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16558-0_25


DOI: https://doi.org/10.1007/978-3-642-16558-0_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16557-3
Online ISBN: 978-3-642-16558-0
eBook Packages: Computer Science, Computer Science (R0)
