Exploiting Parallelism to Accelerate Keyword Search on Deep-Web Sources

  • Conference paper
Data Integration in the Life Sciences (DILS 2009)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5647)

Abstract

Increasingly, biological data is being shared over the deep web. Many biological queries can only be answered by successively searching a number of distinct web-sites. This paper introduces a system that exploits parallelization for accelerating search over multiple deep web data sources. An interactive, two-stage multi-threading system is developed to achieve task parallelization, thread parallelization, and pipelined parallelization. We show the effectiveness of our system by considering a number of queries involving SNP datasets. We show that most of the queries can be accelerated significantly by exploiting these three forms of parallelism.
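The abstract names three forms of parallelism without defining them. The sketch below illustrates one plausible reading, and it is not the authors' system. It assumes that task parallelism means querying independent sources concurrently, that thread parallelism means issuing several requests to one source on separate threads, and that pipelined parallelism means forwarding each result of one source to a dependent source as soon as it arrives. The functions `query_source_a`, `query_source_b`, `query_source_c`, and `search` are hypothetical stand-ins that simulate network latency with `time.sleep`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def query_source_a(keyword, pages=3):
    """Hypothetical source A: yields one identifier per simulated result page."""
    for page in range(pages):
        time.sleep(0.05)  # simulated network latency per page
        yield f"{keyword}-id{page}"


def query_source_b(identifier):
    """Hypothetical source B: depends on an identifier produced by source A."""
    time.sleep(0.05)  # simulated network latency
    return f"annotation({identifier})"


def query_source_c(keyword):
    """Hypothetical source C: independent of sources A and B."""
    time.sleep(0.05)  # simulated network latency
    return f"literature({keyword})"


def search(keyword, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Task parallelism (assumed meaning): source C does not depend on
        # A or B, so it runs concurrently with them.
        literature = pool.submit(query_source_c, keyword)

        pending = []
        # Pipelined parallelism (assumed meaning): each identifier is handed
        # to source B as soon as source A yields it, while A keeps fetching.
        for identifier in query_source_a(keyword):
            # Thread parallelism (assumed meaning): every request to source B
            # runs on its own worker thread.
            pending.append(pool.submit(query_source_b, identifier))

        # Collect in submission order so the output is deterministic.
        annotations = [future.result() for future in pending]
        return {"annotations": annotations, "literature": literature.result()}


if __name__ == "__main__":
    print(search("rs123"))
```

In this toy setup, the requests to source B overlap with the remaining page fetches from source A, and source C overlaps with both. That overlap is the kind of saving the abstract attributes to combining the three forms of parallelism.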

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, T., Wang, F., Agrawal, G. (2009). Exploiting Parallelism to Accelerate Keyword Search on Deep-Web Sources. In: Paton, N.W., Missier, P., Hedeler, C. (eds) Data Integration in the Life Sciences. DILS 2009. Lecture Notes in Computer Science, vol 5647. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02879-3_12

  • DOI: https://doi.org/10.1007/978-3-642-02879-3_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02878-6

  • Online ISBN: 978-3-642-02879-3

  • eBook Packages: Computer Science, Computer Science (R0)
