Automatic integration of Web search interfaces with WISE-Integrator

The VLDB Journal

Abstract.

An increasing number of databases are becoming Web accessible through form-based search interfaces, and many of these sources are database-driven e-commerce sites. It is a daunting task for users to visit numerous Web sites individually to get the desired information. Hence, providing unified access to multiple e-commerce search engines selling similar products is of great importance in allowing users to search and compare products from multiple sites with ease. One key task in providing such a capability is to integrate the Web search interfaces of these e-commerce search engines so that user queries can be submitted against the integrated interface. Currently, such search interfaces are integrated either manually or semiautomatically, which is inefficient and difficult to maintain. In this paper, we present WISE-Integrator, a tool that performs automatic integration of Web Interfaces of Search Engines. WISE-Integrator exploits a rich set of special metainformation that exists in Web search interfaces and uses it to identify matching attributes from different search interfaces for integration. It also resolves domain differences of matching attributes. We also discuss how to automatically extract the information from search interfaces that WISE-Integrator needs to perform automatic interface integration. Our experimental results, based on 143 real-world search interfaces in four different domains, indicate that WISE-Integrator can achieve high attribute matching accuracy and can produce high-quality integrated search interfaces without human interaction.

Author information

Corresponding author

Correspondence to Hai He.

Additional information

Received: 2 January 2004, Accepted: 25 March 2004, Published online: 12 August 2004

Edited by: M. Carey

About this article

Cite this article

He, H., Meng, W., Yu, C. et al. Automatic integration of Web search interfaces with WISE-Integrator. VLDB 13, 256–273 (2004). https://doi.org/10.1007/s00778-004-0126-4

  • DOI: https://doi.org/10.1007/s00778-004-0126-4
