Approximate Content Summary for Database Selection in Deep Web Data Integration

  • Conference paper
Web-Age Information Management (WAIM 2010)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6185))

Abstract

In Deep Web data integration, a metaquerier provides a unified query interface for each domain and dispatches each user query to the most relevant Web databases. Traditional database selection algorithms are usually based on content summaries. However, many Web-accessible databases are uncooperative: they do not export their content statistics, and the only way to access their contents is by querying them. In this paper, we propose an approximate content summary approach for database selection. Furthermore, real-life databases are not always static, so the statistical content summary needs to be updated periodically to reflect changes in database content. We therefore also propose a survival function approach that derives an appropriate schedule for regenerating the approximate content summary. We conduct extensive experiments to illustrate the accuracy and efficiency of our techniques.
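The two ideas in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, whose details are not given here. It assumes that records sampled through probe queries are plain strings, that the summary stores relative document frequencies, and that summary validity follows an exponential survival function S(t) = exp(-change_rate * t). The names `approximate_summary`, `survival`, `refresh_interval`, `change_rate`, and `min_survival` are hypothetical.

```python
import math
from collections import Counter


def approximate_summary(sampled_records):
    """Build a summary from records retrieved by probe queries.

    Assumption: each record is a string; the summary maps each word to the
    fraction of sampled records that contain it.
    """
    n = len(sampled_records)
    if n == 0:
        return {}
    doc_freq = Counter()
    for record in sampled_records:
        # Count each word at most once per record (document frequency).
        doc_freq.update(set(record.lower().split()))
    return {word: count / n for word, count in doc_freq.items()}


def survival(t, change_rate):
    """Assumed exponential survival function: probability that the summary
    is still valid t time units after it was generated."""
    return math.exp(-change_rate * t)


def refresh_interval(change_rate, min_survival):
    """Longest wait t such that survival(t, change_rate) >= min_survival."""
    return -math.log(min_survival) / change_rate


summary = approximate_summary(["red car", "blue car"])
interval = refresh_interval(change_rate=0.1, min_survival=0.5)
```

Under these assumptions, a database whose content changes faster (larger `change_rate`) gets a shorter regeneration interval, which is the intuition behind scheduling summary updates with a survival function.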

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiang, F., Li, Y., Zhao, J., Yang, N. (2010). Approximate Content Summary for Database Selection in Deep Web Data Integration. In: Shen, H.T., et al. Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16720-1_22

  • DOI: https://doi.org/10.1007/978-3-642-16720-1_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16719-5

  • Online ISBN: 978-3-642-16720-1

  • eBook Packages: Computer Science, Computer Science (R0)
