Abstract
Matching query interfaces is a crucial step in data integration across multiple Web databases. Different types of information about query interface schemas have been used to match attributes between schemas. Relying on a single type of information is not sufficient, and the results of individual matchers are often inaccurate and uncertain. Evidence theory is the state-of-the-art approach for combining multiple sources of uncertain information. However, traditional evidence theory treats all individual matchers equally regardless of the matching task, which reduces matching performance. This paper proposes a novel query interface matching approach for the Deep Web based on extended evidence theory. Our approach first introduces a procedure that dynamically predicts the credibility of each matcher. It then extends traditional evidence theory with these credibilities and uses exponentially weighted evidence theory to combine the results of multiple matchers. Finally, it makes the matching decision using a set of heuristics to obtain the final matches. Our approach overcomes this shortcoming of the traditional method and can adapt to different matching tasks. Experimental results demonstrate the feasibility and effectiveness of the proposed approach.
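The combination step can be illustrated with a minimal sketch. It is not the formulation used in the paper, which the abstract does not spell out. It assumes that each matcher reports a mass function over sets of candidate attributes, that "exponential weighting" means raising each mass to the power of the matcher's credibility and renormalising, and that the weighted mass functions are merged with Dempster's rule of combination. All names (`weight_mass`, `dempster_combine`, `combine_matchers`, `best_match`), the example masses, the credibilities, and the decision threshold are hypothetical.

```python
from itertools import product


def weight_mass(mass, credibility):
    """Raise each mass to the power `credibility`, then renormalise.

    Assumed weighting scheme (hypothetical, not taken from the paper):
    credibility 1.0 leaves the mass function unchanged, and credibility 0.0
    spreads the mass evenly over the focal elements.
    """
    powered = {focal: value ** credibility
               for focal, value in mass.items() if value > 0}
    total = sum(powered.values())
    return {focal: value / total for focal, value in powered.items()}


def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions."""
    combined = {}
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        common = a & b
        if common:  # empty intersections are conflict and are dropped
            combined[common] = combined.get(common, 0.0) + x * y
    if not combined:
        raise ValueError("matchers are in total conflict")
    # Equals 1 - conflict when both inputs sum to 1.
    agreement = sum(combined.values())
    return {focal: value / agreement for focal, value in combined.items()}


def combine_matchers(weighted_masses):
    """Weight each matcher's mass function, then fold with Dempster's rule."""
    masses = [weight_mass(m, c) for m, c in weighted_masses]
    result = masses[0]
    for mass in masses[1:]:
        result = dempster_combine(result, mass)
    return result


def best_match(mass, threshold=0.5):
    """Toy decision heuristic: the top singleton, if its mass is high enough."""
    singletons = {next(iter(f)): v for f, v in mass.items() if len(f) == 1}
    if not singletons:
        return None
    candidate = max(singletons, key=singletons.get)
    return candidate if singletons[candidate] >= threshold else None


# Hypothetical example: which target attribute matches one source attribute?
THETA = frozenset({"author", "title"})  # frame of discernment
name_matcher = {frozenset({"author"}): 0.6, frozenset({"title"}): 0.1, THETA: 0.3}
type_matcher = {frozenset({"author"}): 0.3, frozenset({"title"}): 0.4, THETA: 0.3}

# The name matcher is assumed to be more credible for this task.
result = combine_matchers([(name_matcher, 0.9), (type_matcher, 0.4)])
print(best_match(result))
```

Under this assumed scheme, a low credibility flattens a matcher's mass function before combination, so an unreliable matcher has less influence on the combined belief than it would under the unweighted rule.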
Additional information
Supported by the National Natural Science Foundation of China under Grant No. 90818001 and the Natural Science Foundation of Shandong Province of China under Grant No. Y2007G24.
About this article
Cite this article
Dong, YQ., Li, QZ., Ding, YH. et al. A Query Interface Matching Approach Based on Extended Evidence Theory for Deep Web. J. Comput. Sci. Technol. 25, 537–547 (2010). https://doi.org/10.1007/s11390-010-9343-z
DOI: https://doi.org/10.1007/s11390-010-9343-z