Abstract.
An increasing number of databases are becoming Web accessible through form-based search interfaces, and many of these sources are database-driven e-commerce sites. It is a daunting task for users to access numerous Web sites individually to get the desired information. Hence, providing unified access to multiple e-commerce search engines selling similar products is of great importance in allowing users to search and compare products from multiple sites with ease. One key task in providing such a capability is to integrate the Web search interfaces of these e-commerce search engines so that user queries can be submitted against the integrated interface. Currently, such search interfaces are integrated either manually or semiautomatically, which is inefficient and difficult to maintain. In this paper, we present WISE-Integrator, a tool that performs automatic integration of Web Interfaces of Search Engines. WISE-Integrator explores a rich set of special metainformation that exists in Web search interfaces and uses this information to identify matching attributes from different search interfaces for integration. It also resolves domain differences between matching attributes. We also discuss how to automatically extract the information from search interfaces that WISE-Integrator needs to perform automatic interface integration. Our experimental results, based on 143 real-world search interfaces in four different domains, indicate that WISE-Integrator can achieve high attribute matching accuracy and can produce high-quality integrated search interfaces without human interaction.
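To make the two tasks named above more concrete (identifying matching attributes and reconciling their value domains), the following is a minimal illustrative sketch. It is not WISE-Integrator's algorithm, which the abstract does not specify: the `Attribute` class, the Jaccard label similarity, the greedy pairing, the stop-word list, and the 0.5 threshold are all assumptions chosen only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical stop words; a real system would use a richer list.
STOP_WORDS = {"of", "the", "a", "by", "in"}


@dataclass
class Attribute:
    """One field of a search interface: its label and any predefined values."""
    label: str
    values: List[str] = field(default_factory=list)


def tokens(label: str) -> set:
    """Lowercase the label and split it into content words."""
    words = re.findall(r"[a-z0-9]+", label.lower())
    return {w for w in words if w not in STOP_WORDS}


def label_similarity(a: Attribute, b: Attribute) -> float:
    """Jaccard overlap of label tokens (an assumed similarity measure)."""
    ta, tb = tokens(a.label), tokens(b.label)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_attributes(iface_a: List[Attribute], iface_b: List[Attribute],
                     threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Greedily pair attributes across two interfaces, best score first."""
    scored = [(label_similarity(a, b), i, j)
              for i, a in enumerate(iface_a)
              for j, b in enumerate(iface_b)]
    scored = [s for s in scored if s[0] >= threshold]
    scored.sort(key=lambda s: -s[0])  # stable sort keeps input order on ties
    used_a, used_b, pairs = set(), set(), []
    for _, i, j in scored:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((iface_a[i].label, iface_b[j].label))
    return pairs


def merge_domains(values_a: List[str], values_b: List[str]) -> List[str]:
    """Union two value lists case-insensitively, keeping first-seen spelling."""
    seen, merged = set(), []
    for v in values_a + values_b:
        key = v.strip().lower()
        if key not in seen:
            seen.add(key)
            merged.append(v)
    return merged
```

For example, under these assumptions an interface with the labels "Title" and "Author name" would be paired with one labeled "Book title" and "Author", while "Format" and "Binding" would stay unmatched because their labels share no words. Closing that kind of gap is where richer metainformation than label text would be needed.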
Additional information
Received: 2 January 2004, Accepted: 25 March 2004, Published online: 12 August 2004
Edited by: M. Carey
Cite this article
He, H., Meng, W., Yu, C. et al. Automatic integration of Web search interfaces with WISE-Integrator. The VLDB Journal 13, 256–273 (2004). https://doi.org/10.1007/s00778-004-0126-4