Composing, optimizing, and executing plans for bioinformatics web services

  • Regular Paper
  • The VLDB Journal

Abstract

The emergence of a large number of bioinformatics datasets on the Internet has created a need for flexible and efficient approaches to integrating information from multiple bioinformatics data sources and services. In this paper, we present our approach to automatically generating composition plans for web services, optimizing those plans, and executing them efficiently. While data integration techniques have been applied to the bioinformatics domain, the focus has been on answering specific user queries. In contrast, we focus on automatically generating parameterized integration plans that can be hosted as web services that respond to a range of inputs. In addition, we present two novel techniques that improve the execution time of the generated plans by reducing the number of requests to the existing data sources and by executing the generated plan more efficiently. The first optimization technique, called tuple-level filtering, analyzes the source/service descriptions in order to automatically insert filtering conditions into the composition plans, which results in fewer requests to the component web services. To ensure that the filtering conditions can be evaluated, this technique may include sensing operations in the integration plan. The savings due to filtering significantly exceed the cost of the sensing operations. The second optimization technique maps the integration plans into programs that can be executed by a dataflow-style, streaming execution engine. We use real-world bioinformatics web services to show experimentally that (1) our automatic composition techniques can efficiently generate parameterized plans that integrate data from large numbers of existing services and (2) our optimization techniques can significantly reduce the response time of the generated integration plans.
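
To make the idea of tuple-level filtering concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical detail service whose source description says it only holds human proteins, and a hypothetical sensing service that reports the organism of a protein ID. All names and data here (`ORGANISM_OF`, `HUMAN_RECORDS`, `CountingService`, `run_plan`) are invented for illustration. The plan calls the sensing service first and skips the detail request for any tuple that cannot satisfy the source's constraint.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical sensing data: the organism each protein ID belongs to.
ORGANISM_OF: Dict[str, str] = {
    "P1": "human", "P2": "mouse", "P3": "human", "P4": "rat",
}

# Hypothetical component service that, per its (assumed) source
# description, only holds records for human proteins.
HUMAN_RECORDS: Dict[str, str] = {"P1": "kinase", "P3": "transporter"}


class CountingService:
    """Wraps a lookup function and counts the requests it receives."""

    def __init__(self, fn: Callable[[str], Optional[str]]) -> None:
        self.fn = fn
        self.requests = 0

    def __call__(self, key: str) -> Optional[str]:
        self.requests += 1
        return self.fn(key)


def run_plan(
    ids: Iterable[str],
    detail_service: Callable[[str], Optional[str]],
    sense: Optional[Callable[[str], Optional[str]]] = None,
    condition: Optional[Callable[[Optional[str]], bool]] = None,
) -> Dict[str, str]:
    """Execute the plan one tuple at a time.

    When both `sense` and `condition` are given, the sensing operation
    runs first and the detail request is skipped for tuples that fail
    the filtering condition.
    """
    results: Dict[str, str] = {}
    for pid in ids:
        if sense is not None and condition is not None:
            if not condition(sense(pid)):
                continue  # filtered out: no request to the detail service
        record = detail_service(pid)
        if record is not None:
            results[pid] = record
    return results


if __name__ == "__main__":
    unfiltered = CountingService(HUMAN_RECORDS.get)
    run_plan(ORGANISM_OF, unfiltered)

    filtered = CountingService(HUMAN_RECORDS.get)
    sensing = CountingService(ORGANISM_OF.get)
    run_plan(ORGANISM_OF, filtered, sensing, lambda org: org == "human")

    print("detail requests without filtering:", unfiltered.requests)
    print("detail requests with filtering:", filtered.requests)
    print("sensing requests:", sensing.requests)
```

In this toy setting the filtered plan returns the same answers while sending fewer requests to the detail service, at the price of one sensing request per tuple. Filtering pays off only when the sensing operation is much cheaper than the request it avoids, which is an assumption of the sketch rather than something it measures.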

Author information

Corresponding author

Correspondence to Snehal Thakkar.

About this article

Cite this article

Thakkar, S., Ambite, J.L. & Knoblock, C.A. Composing, optimizing, and executing plans for bioinformatics web services. The VLDB Journal 14, 330–353 (2005). https://doi.org/10.1007/s00778-005-0158-4

  • DOI: https://doi.org/10.1007/s00778-005-0158-4