Organizing Structured Deep Web by Clustering Query Interfaces Link Graph

  • Conference paper
Advanced Data Mining and Applications (ADMA 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5139)

Abstract

Many pages on the Internet are generated dynamically from back-end databases and cannot be reached by traditional search engines; these pages are collectively called the Deep Web. Structured Deep Web sources expose structured query interfaces and return structured results. Organizing these sources by domain lets users browse these valuable resources and is a critical step toward large-scale Deep Web information integration. We propose a new strategy that automatically and accurately classifies Deep Web sources based on the form link graph, which can be easily constructed from web forms, and applies a fuzzy partition technique, which proves better suited to the characteristics of Deep Web sources. Experiments on real Deep Web data show that our approach provides an effective and scalable solution for organizing Deep Web sources.
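
The abstract does not say which fuzzy partition algorithm is used or which features are taken from the form link graph. As a rough illustration of fuzzy partitioning in general, the sketch below applies standard fuzzy c-means to small numeric vectors. The function name `fuzzy_c_means`, the toy `sources` vectors, and the parameter defaults are all assumptions made for this example and are not taken from the paper.

```python
import math

def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6):
    """Standard fuzzy c-means (a stand-in, not the paper's exact method).

    points: list of equal-length numeric vectors; m > 1 is the fuzzifier.
    Returns (centers, memberships), where memberships[i][k] is the degree
    to which point i belongs to cluster k and each row sums to 1.
    """
    n, dim = len(points), len(points[0])
    # Deterministic start: evenly spaced data points serve as initial centers.
    centers = [list(points[(k * n) // n_clusters]) for k in range(n_clusters)]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership update from relative distances to every center.
        memberships = []
        for p in points:
            # Clamp distances so a point sitting on a center does not divide by zero.
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            memberships.append([
                1.0 / sum((dists[k] / dists[j]) ** power for j in range(n_clusters))
                for k in range(n_clusters)
            ])
        # Center update: mean of the points weighted by membership ** m.
        new_centers = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            new_centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships

# Hypothetical 2-D feature vectors, one per Deep Web source (illustrative only).
sources = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # sources resembling one domain
    (1.0, 1.0), (0.9, 1.0), (1.0, 0.9),   # sources resembling another domain
    (0.5, 0.5),                            # a source sitting between the two
]
centers, memberships = fuzzy_c_means(sources, n_clusters=2)
for vec, row in zip(sources, memberships):
    print(vec, [round(v, 3) for v in row])
```

Unlike a hard partition, each source receives a membership degree in every cluster, so a source that plausibly belongs to more than one domain (the last vector here) is not forced into a single group.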


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhao, P., Huang, L., Fang, W., Cui, Z. (2008). Organizing Structured Deep Web by Clustering Query Interfaces Link Graph. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_72

  • DOI: https://doi.org/10.1007/978-3-540-88192-6_72

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88191-9

  • Online ISBN: 978-3-540-88192-6

  • eBook Packages: Computer Science, Computer Science (R0)
