Article

Query relaxation using malleable schemas

Authors:

Wolf-Tilo Balke,

Wolfgang Nejdl

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data

Pages 545 - 556

https://doi.org/10.1145/1247480.1247541

Published: 11 June 2007

Abstract

In contrast to classical databases and IR systems, real-world information systems have to deal increasingly with very vague and diverse structures for information management and storage that cannot be adequately handled yet. While current object-relational database systems require clear and unified data schemas, IR systems usually ignore the structured information completely. Malleable schemas, as recently introduced, provide a novel way to deal with vagueness, ambiguity and diversity by incorporating imprecise and overlapping definitions of data structures. In this paper, we propose a novel query relaxation scheme that enables users to find best matching information by exploiting malleable schemas to effectively query vaguely structured information. Our scheme utilizes duplicates in differently described data sets to discover the correlations within a malleable schema, and then uses these correlations to appropriately relax the users' queries. In addition, it ranks results of the relaxed query according to their respective probability of satisfying the original query's intent. We have implemented the scheme and conducted extensive experiments with real-world data to confirm its performance and practicality.
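The pipeline the abstract describes (learn attribute correlations from duplicates, relax the query along those correlations, rank the results) can be illustrated with a toy sketch. This is not the authors' algorithm: the exact-equality value comparison, the simple conditional-frequency estimate, the `threshold` cutoff, and every function and attribute name below are assumptions made only for illustration.

```python
from collections import defaultdict


def attribute_correlations(duplicate_pairs):
    """For each attribute a on the left description, estimate how often
    attribute b on the right description of the same entity holds the
    same value (a crude, assumed stand-in for a learned correlation)."""
    agree = defaultdict(int)  # (a, b) -> pairs in which the values agree
    seen = defaultdict(int)   # a -> pairs in which a occurs on the left
    for left, right in duplicate_pairs:
        for a, value in left.items():
            seen[a] += 1
            for b, other in right.items():
                if value == other:
                    agree[(a, b)] += 1
    return {(a, b): n / seen[a] for (a, b), n in agree.items()}


def relaxed_query(records, attribute, value, correlations, threshold=0.5):
    """Match `value` against `attribute` and against every attribute whose
    correlation with it reaches `threshold`; rank hits by that score."""
    candidates = {attribute: 1.0}  # the original attribute matches fully
    for (a, b), score in correlations.items():
        if a == attribute and b != attribute and score >= threshold:
            candidates[b] = max(candidates.get(b, 0.0), score)
    hits = []
    for record in records:
        score = max((s for attr, s in candidates.items()
                     if record.get(attr) == value), default=0.0)
        if score > 0.0:
            hits.append((score, record))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


# Hypothetical duplicates: the same entity described under two vocabularies.
duplicate_pairs = [
    ({"author": "Knuth", "title": "TAOCP"},
     {"creator": "Knuth", "name": "TAOCP"}),
    ({"author": "Codd", "title": "Relational Model"},
     {"creator": "Codd", "name": "Relational Model"}),
    ({"author": "Gray", "title": "Transactions"},
     {"writer": "Gray", "name": "Transactions"}),
]
records = [
    {"author": "Knuth", "title": "A"},
    {"creator": "Knuth", "name": "B"},
    {"writer": "Knuth", "name": "C"},
    {"author": "Codd", "title": "D"},
]

correlations = attribute_correlations(duplicate_pairs)
hits = relaxed_query(records, "author", "Knuth", correlations)
for score, record in hits:
    print(f"{score:.2f}", record)
```

In this toy data a query on `author` also reaches records that use `creator`, ranked below the exact-attribute match, while `writer` falls under the default threshold and is left out. Lowering `threshold` trades precision for recall.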

Index Terms

Query relaxation using malleable schemas
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Published In

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data

June 2007

1210 pages

ISBN:9781595936868

DOI:10.1145/1247480

General Chairs: Lizhu Zhou (Tsinghua University, China), Tok Wang Ling (National University of Singapore, Singapore)

Program Chair: Beng Chin Ooi (National University of Singapore, Singapore)

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SIGMOD/PODS07

Sponsor:

SIGMOD/PODS07: International Conference on Management of Data

June 11 - 14, 2007

Beijing, China

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

Total Citations: 41
Total Downloads: 39
Downloads (Last 12 months): 6
Downloads (Last 6 weeks): 1

Reflects downloads up to 16 Feb 2025

Cited By

Yun J, Tak B, Han W (2024). ReCG: Bottom-up JSON Schema Discovery Using a Repetitive Cluster-and-Generalize Framework. Proceedings of the VLDB Endowment 17:11 (3538-3550). DOI: 10.14778/3681954.3682019. Online publication date: 30-Aug-2024
https://doi.org/10.14778/3681954.3682019
Frosini R, Poulovassilis A, Wood P, Calí A (2022). Optimisation Techniques for Flexible SPARQL Queries. ACM Transactions on the Web 16:4 (1-44). DOI: 10.1145/3532855. Online publication date: 16-Nov-2022
https://dl.acm.org/doi/10.1145/3532855
Wang M, Liu J, Wei B, Yao S, Zeng H, Shi L (2019). Answering why-not questions on SPARQL queries. Knowledge and Information Systems 58:1 (169-208). DOI: 10.1007/s10115-018-1155-4. Online publication date: 1-Jan-2019
https://dl.acm.org/doi/10.1007/s10115-018-1155-4
Naseriparsa M, Islam M, Liu C, Moser I (2018). No-but-semantic-match. World Wide Web 21:5 (1223-1257). DOI: 10.1007/s11280-017-0503-8. Online publication date: 1-Sep-2018
https://dl.acm.org/doi/10.1007/s11280-017-0503-8
Poulovassilis A (2018). Applications of Flexible Querying to Graph Data. Graph Data Management (97-142). DOI: 10.1007/978-3-319-96193-4_4. Online publication date: 1-Nov-2018
https://doi.org/10.1007/978-3-319-96193-4_4
Catania B, Guerrini G (2017). 25+ Years of Query Processing - From a Single, Stored Data Set to Big Data (and Beyond). A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years (77-91). DOI: 10.1007/978-3-319-61893-7_5. Online publication date: 31-May-2017
https://doi.org/10.1007/978-3-319-61893-7_5
Shi R, Wang H, Wang T, Hou Y, Tang Y, Li J, Gao H (2017). Similarity Search Combining Query Relaxation and Diversification. Database Systems for Advanced Applications (65-84). DOI: 10.1007/978-3-319-55699-4_5. Online publication date: 22-Mar-2017
https://doi.org/10.1007/978-3-319-55699-4_5
Yahya M, Barbosa D, Berberich K, Wang Q, Weikum G, Bennett P, Josifovski V, Neville J, Radlinski F (2016). Relationship Queries on Extended Knowledge Graphs. Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (605-614). DOI: 10.1145/2835776.2835795. Online publication date: 8-Feb-2016
https://dl.acm.org/doi/10.1145/2835776.2835795
(2016). Approximation and relaxation of semantic web path queries. Web Semantics: Science, Services and Agents on the World Wide Web 40:C (1-21). DOI: 10.1016/j.websem.2016.08.001. Online publication date: 1-Oct-2016
https://dl.acm.org/doi/10.1016/j.websem.2016.08.001
Catania B, Guerrini G (2015). Adaptively Approximate Techniques in Distributed Architectures. SOFSEM 2015: Theory and Practice of Computer Science (65-77). DOI: 10.1007/978-3-662-46078-8_7. Online publication date: 2015
https://doi.org/10.1007/978-3-662-46078-8_7
