IntelliGEN: A Distributed Workflow System for Discovering Protein-Protein Interactions

Distributed and Parallel Databases

Abstract

A large genomics project involves a significant number of researchers and technicians performing dozens of tasks. Some tasks are manual (e.g., performing laboratory experiments), some are computer-assisted (e.g., looking for genes in the GENBANK database), and some are performed entirely automatically by the computer (e.g., sequence assembly). It has become apparent that managing such projects poses overwhelming problems and may lead to results of lower or even unacceptable quality, or to drastically increased project costs. In this paper, we present a design and an initial implementation of a distributed workflow system created to schedule and support activities in a genomics laboratory. The focus of the activities in the laboratory is the discovery of protein-protein interactions of fungi, specifically Neurospora crassa. We present our approach to developing, adapting, and applying workflow technology in the genomics lab and illustrate it using one distinct part of a larger workflow to discover protein-protein interactions. Novel features of our system include the ability to monitor the quality and timeliness of the results and, if necessary, to suggest and incorporate changes to the selected tasks and their scheduling.

Cite this article

Kochut, K., Arnold, J., Sheth, A. et al. IntelliGEN: A Distributed Workflow System for Discovering Protein-Protein Interactions. Distributed and Parallel Databases 13, 43–72 (2003). https://doi.org/10.1023/A:1021565722755

  • DOI: https://doi.org/10.1023/A:1021565722755