DOI: 10.1145/1247480.1247541
Article

Query relaxation using malleable schemas

Published: 11 June 2007

ABSTRACT

In contrast to classical databases and IR systems, real-world information systems increasingly have to deal with vague and diverse structures for information management and storage that current systems cannot yet handle adequately. While current object-relational database systems require clear and unified data schemas, IR systems usually ignore structured information completely. Recently introduced malleable schemas provide a novel way to deal with vagueness, ambiguity and diversity by incorporating imprecise and overlapping definitions of data structures. In this paper, we propose a novel query relaxation scheme that exploits malleable schemas to query vaguely structured information effectively, enabling users to find the best-matching information. Our scheme uses duplicates in differently described data sets to discover the correlations within a malleable schema, and then uses these correlations to relax the users' queries appropriately. In addition, it ranks the results of the relaxed query according to their probability of satisfying the intent of the original query. We have implemented the scheme and conducted extensive experiments with real-world data to confirm its performance and practicality.
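
The abstract describes the scheme only at a high level. The following is a minimal Python sketch of the general idea under stated assumptions, not the authors' algorithm: it assumes duplicates are already identified and given as pairs of records (plain dicts) that describe the same entity under different attribute names. The estimator (the fraction of duplicate pairs in which two attributes carry equal values), the threshold of 0.5, and the functions `attribute_correlations`, `relax` and `ranked_answers` are all hypothetical illustrations. A single-attribute selection is relaxed to correlated attributes, and answers are ranked by the estimated correlation as a stand-in for the probability of matching the original query's intent.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]
Corr = Dict[Tuple[str, str], float]


def attribute_correlations(duplicates: List[Tuple[Record, Record]]) -> Corr:
    """Hypothetical estimator: for each attribute pair (a, b), the fraction of
    duplicate pairs with a on one side and b on the other in which both carry
    the same value. Both orientations of every duplicate pair are counted."""
    seen: Dict[Tuple[str, str], int] = defaultdict(int)
    agree: Dict[Tuple[str, str], int] = defaultdict(int)
    for left, right in duplicates:
        for x, y in ((left, right), (right, left)):
            for a, va in x.items():
                for b, vb in y.items():
                    seen[(a, b)] += 1
                    if va == vb:
                        agree[(a, b)] += 1
    return {key: agree[key] / seen[key] for key in seen}


def relax(attr: str, corr: Corr, threshold: float = 0.5) -> Dict[str, float]:
    """Map each attribute to query onto a weight: the original attribute keeps
    weight 1.0; other attributes are added if their correlation with it meets
    the (assumed) threshold."""
    weights = {attr: 1.0}
    for (a, b), score in corr.items():
        if a == attr and b != attr and score >= threshold:
            weights[b] = score
    return weights


def ranked_answers(records: List[Record], attr: str, value: str,
                   corr: Corr, threshold: float = 0.5) -> List[Tuple[float, Record]]:
    """Evaluate the relaxed selection attr = value and rank each match by the
    highest weight among the attributes through which it matched."""
    weights = relax(attr, corr, threshold)
    results = []
    for rec in records:
        score = max((w for b, w in weights.items() if rec.get(b) == value), default=0.0)
        if score > 0.0:
            results.append((score, rec))
    results.sort(key=lambda item: item[0], reverse=True)  # stable for ties
    return results


# Toy duplicates: the same books described under different attribute names.
duplicates = [
    ({"title": "Dune", "author": "Herbert"}, {"name": "Dune", "writer": "Herbert"}),
    ({"title": "Emma", "author": "Austen"}, {"name": "Emma", "creator": "Austen"}),
    ({"title": "Ulysses", "author": "Joyce"}, {"name": "Ulysses", "writer": "J. Joyce"}),
]
corr = attribute_correlations(duplicates)
records = [{"writer": "Herbert"}, {"author": "Herbert"},
           {"creator": "Herbert"}, {"editor": "Herbert"}]
for score, rec in ranked_answers(records, "author", "Herbert", corr):
    print(score, rec)
# 1.0 {'author': 'Herbert'}
# 1.0 {'creator': 'Herbert'}
# 0.5 {'writer': 'Herbert'}
```

In this toy run, `writer` gets weight 0.5 because it agreed with `author` in only one of its two duplicate pairs, and `editor` is never queried because no duplicate links it to `author`. With so few duplicates, these estimates are very noisy; a practical system would need many more duplicates and probably a smoothed estimator. The paper's own correlation and ranking model is not reproduced here.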


    Published in

      ACM Conferences
      SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
      June 2007
      1210 pages
      ISBN: 9781595936868
      DOI: 10.1145/1247480
      General Chairs: Lizhu Zhou, Tok Wang Ling
      Program Chair: Beng Chin Ooi

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall acceptance rate: 785 of 4,003 submissions (20%)
