DOI: 10.1145/1247480.1247541

Query relaxation using malleable schemas

Published: 11 June 2007

Abstract

In contrast to classical databases and IR systems, real-world information systems have to deal increasingly with very vague and diverse structures for information management and storage that cannot be adequately handled yet. While current object-relational database systems require clear and unified data schemas, IR systems usually ignore the structured information completely. Malleable schemas, as recently introduced, provide a novel way to deal with vagueness, ambiguity and diversity by incorporating imprecise and overlapping definitions of data structures. In this paper, we propose a novel query relaxation scheme that enables users to find best matching information by exploiting malleable schemas to effectively query vaguely structured information. Our scheme utilizes duplicates in differently described data sets to discover the correlations within a malleable schema, and then uses these correlations to appropriately relax the users' queries. In addition, it ranks results of the relaxed query according to their respective probability of satisfying the original query's intent. We have implemented the scheme and conducted extensive experiments with real-world data to confirm its performance and practicality.
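
The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea it describes: duplicates found across differently described data sets hint at which attributes correlate, a query is relaxed to those correlated attributes, and results are ranked by an estimated score. The function names, the equality-based agreement measure, the threshold, and the sample records are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter, defaultdict


def attribute_correlations(duplicate_pairs):
    """Estimate how often attribute b carries the same value as attribute a.

    duplicate_pairs: list of (record_a, record_b) dicts that are assumed to
    describe the same real-world entity under different schemas.
    """
    agree = Counter()  # (attr_a, attr_b) -> number of pairs with equal values
    seen = Counter()   # attr_a -> number of pairs in which attr_a appears
    for rec_a, rec_b in duplicate_pairs:
        for attr_a, val_a in rec_a.items():
            seen[attr_a] += 1
            for attr_b, val_b in rec_b.items():
                if val_a == val_b:
                    agree[(attr_a, attr_b)] += 1
    corr = defaultdict(dict)
    for (attr_a, attr_b), n in agree.items():
        corr[attr_a][attr_b] = n / seen[attr_a]
    return corr


def relaxed_query(records, attr, value, corr, threshold=0.5):
    """Match `value` on `attr` or on any correlated attribute, best score first."""
    candidates = {attr: 1.0}  # the original attribute always counts fully
    for other, p in corr.get(attr, {}).items():
        if p >= threshold:
            candidates[other] = max(candidates.get(other, 0.0), p)
    results = []
    for rec in records:
        score = max(
            (p for a, p in candidates.items() if rec.get(a) == value),
            default=0.0,
        )
        if score > 0:
            results.append((score, rec))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


# Hypothetical duplicates: the same books described with two vocabularies.
duplicate_pairs = [
    ({"author": "Knuth", "title": "TAOCP"},
     {"writer": "Knuth", "name": "TAOCP"}),
    ({"author": "Codd", "title": "Relational Model"},
     {"writer": "Codd", "name": "Relational Model"}),
    ({"author": "Gray", "title": "Transaction Processing"},
     {"writer": "J. Gray", "name": "Transaction Processing"}),
]
corr = attribute_correlations(duplicate_pairs)

records = [
    {"writer": "Knuth", "name": "TAOCP"},
    {"author": "Knuth", "title": "Literate Programming"},
    {"writer": "Codd", "name": "Relational Model"},
]
hits = relaxed_query(records, "author", "Knuth", corr)
```

In this sketch a query on `author` also reaches records that only have a `writer` field, and the exact-attribute match is ranked above the relaxed one because the `author`/`writer` agreement in the sample duplicates is below 1.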

    Published In

    SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
    June 2007
    1210 pages
    ISBN: 9781595936868
    DOI: 10.1145/1247480
    General Chairs: Lizhu Zhou, Tok Wang Ling
    Program Chair: Beng Chin Ooi
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. malleable schema
    2. query relaxation

    Qualifiers

    • Article

    Conference

    SIGMOD/PODS07

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Cited By

    • (2024) ReCG: Bottom-up JSON Schema Discovery Using a Repetitive Cluster-and-Generalize Framework. Proceedings of the VLDB Endowment 17:11 (3538-3550). DOI: 10.14778/3681954.3682019. Online publication date: 30-Aug-2024
    • (2022) Optimisation Techniques for Flexible SPARQL Queries. ACM Transactions on the Web 16:4 (1-44). DOI: 10.1145/3532855. Online publication date: 16-Nov-2022
    • (2019) Answering why-not questions on SPARQL queries. Knowledge and Information Systems 58:1 (169-208). DOI: 10.1007/s10115-018-1155-4. Online publication date: 1-Jan-2019
    • (2018) No-but-semantic-match. World Wide Web 21:5 (1223-1257). DOI: 10.1007/s11280-017-0503-8. Online publication date: 1-Sep-2018
    • (2018) Applications of Flexible Querying to Graph Data. Graph Data Management (97-142). DOI: 10.1007/978-3-319-96193-4_4. Online publication date: 1-Nov-2018
    • (2017) 25+ Years of Query Processing - From a Single, Stored Data Set to Big Data (and Beyond). A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years (77-91). DOI: 10.1007/978-3-319-61893-7_5. Online publication date: 31-May-2017
    • (2017) Similarity Search Combining Query Relaxation and Diversification. Database Systems for Advanced Applications (65-84). DOI: 10.1007/978-3-319-55699-4_5. Online publication date: 22-Mar-2017
    • (2016) Relationship Queries on Extended Knowledge Graphs. Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (605-614). DOI: 10.1145/2835776.2835795. Online publication date: 8-Feb-2016
    • (2016) Approximation and relaxation of semantic web path queries. Web Semantics: Science, Services and Agents on the World Wide Web 40:C (1-21). DOI: 10.1016/j.websem.2016.08.001. Online publication date: 1-Oct-2016
    • (2015) Adaptively Approximate Techniques in Distributed Architectures. SOFSEM 2015: Theory and Practice of Computer Science (65-77). DOI: 10.1007/978-3-662-46078-8_7. Online publication date: 2015
