A Dynamic Workflow Approach for the Integration of Bioinformatics Services

Cluster Computing

Abstract

Modern biological and chemical studies rely on life science databases as well as sophisticated software tools (e.g., homology search tools, modeling and visualization tools). These tools often have to be combined and integrated to support a given study. SIBIOS (System for the Integration of Bioinformatics Services) serves this purpose. The services it integrates include both life science database search services and software tools. The task engine, the core component of SIBIOS, supports the execution of dynamic workflows that incorporate multiple bioinformatics services. This paper presents the architecture of SIBIOS and its approaches to addressing the heterogeneity and interoperability of bioinformatics services, including data integration.



Author information

Corresponding author

Correspondence to Malika Mahoui.

About this article

Cite this article

Mahoui, M., Lu, L., Gao, N. et al. A Dynamic Workflow Approach for the Integration of Bioinformatics Services. Cluster Comput 8, 279–291 (2005). https://doi.org/10.1007/s10586-005-4095-1
