ABSTRACT
In contrast to classical databases and IR systems, real-world information systems increasingly have to deal with vague and diverse structures for information management and storage, which current systems cannot yet handle adequately. While current object-relational database systems require clear and unified data schemas, IR systems usually ignore structured information completely. Recently introduced malleable schemas provide a novel way to deal with vagueness, ambiguity, and diversity by incorporating imprecise and overlapping definitions of data structures. In this paper, we propose a novel query relaxation scheme that exploits malleable schemas to query vaguely structured information effectively, enabling users to find the best-matching information. Our scheme uses duplicates in differently described data sets to discover the correlations within a malleable schema, and then uses these correlations to relax users' queries appropriately. In addition, it ranks the results of the relaxed query according to their probability of satisfying the original query's intent. We have implemented the scheme and conducted extensive experiments with real-world data to confirm its performance and practicality.
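The pipeline outlined in the abstract (duplicates, then attribute correlations, then relaxation, then ranking) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's method: the function names, the use of exact value agreement between duplicate records as the correlation estimate, the `threshold` parameter, and the use of the correlation score itself as the ranking probability are all hypothetical, since the abstract does not specify these details.

```python
from collections import defaultdict


def attribute_correlations(duplicate_pairs):
    """Estimate attribute correlations from duplicate record pairs.

    Assumption: the correlation of (a, b) is the fraction of duplicate
    pairs in which attribute a of one record and attribute b of its
    duplicate carry exactly the same value.
    """
    agree = defaultdict(int)
    seen = defaultdict(int)
    for left, right in duplicate_pairs:
        for a, va in left.items():
            for b, vb in right.items():
                seen[(a, b)] += 1
                if va == vb:
                    agree[(a, b)] += 1
    # Keep only attribute pairs that agreed at least once.
    return {pair: agree[pair] / n for pair, n in seen.items() if agree[pair]}


def relax(attribute, correlations, threshold=0.5):
    """Map the queried attribute to alternative attributes with scores."""
    alternatives = {attribute: 1.0}  # the original attribute always counts fully
    for (a, b), score in correlations.items():
        if a == attribute and b != attribute and score >= threshold:
            alternatives[b] = max(alternatives.get(b, 0.0), score)
    return alternatives


def relaxed_query(records, attribute, value, correlations, threshold=0.5):
    """Answer `attribute = value` over the relaxed attribute set, best first."""
    alternatives = relax(attribute, correlations, threshold)
    scored = []
    for record in records:
        # Score a record by its best-matching alternative attribute.
        best = max(
            (s for attr, s in alternatives.items() if record.get(attr) == value),
            default=0.0,
        )
        if best > 0:
            scored.append((best, record))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Hypothetical data: two sources describe the same items with different schemas.
duplicate_pairs = [
    ({"author": "Knuth", "title": "TAOCP"}, {"writer": "Knuth", "name": "TAOCP"}),
    ({"author": "Codd", "title": "Relational Model"},
     {"writer": "Codd", "name": "A Relational Model"}),
]
records = [
    {"writer": "Knuth", "name": "TAOCP"},
    {"author": "Knuth", "title": "TAOCP"},
]
correlations = attribute_correlations(duplicate_pairs)
for score, record in relaxed_query(records, "title", "TAOCP", correlations):
    print(score, record)
```

In this toy data, `title` and `name` agree in one of the two duplicate pairs, so a query on `title` is relaxed to also match `name`, and records matched only through `name` rank below exact matches on `title`.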
Index Terms
Query relaxation using malleable schemas