DOI: 10.1145/2588555.2594515
demonstration

CrowdMatcher: crowd-assisted schema matching

Published: 18 June 2014

ABSTRACT

Schema matching is a central challenge for data integration systems. Because a schema cannot fully capture the semantics of the data it represents, automatic tools are inherently uncertain about the matchings they suggest. Humans, however, are good at understanding data represented in various forms, and crowdsourcing platforms are making human annotation increasingly affordable. In this demo, we therefore show how to use the crowd to find the correct matching. To do so, the tasks posted on crowdsourcing platforms must be simple enough for non-experts to perform, and the number of tasks must be kept as small as possible to reduce cost.

We demonstrate CrowdMatcher, a hybrid machine-crowd system for schema matching. Machine-generated matchings are verified through correspondence correctness queries (CCQs), each of which asks the crowd to determine whether a given correspondence is correct. CrowdMatcher has several original features: it integrates the matchings generated by different classical schema matching tools; to minimize the cost of crowdsourcing, it automatically selects the most informative set of CCQs from the possible matchings; it handles inaccurate answers provided by workers; and it uses the crowdsourced answers to improve the matching results.
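The abstract does not specify how CCQs are selected or how noisy answers are combined. The Python sketch below illustrates one plausible approach under stated assumptions; it is not CrowdMatcher's actual implementation. It assumes that the integrated output of the matching tools is a probability distribution over candidate matchings (each a set of attribute correspondences), that uncertainty is measured as the Shannon entropy of that distribution, and that every worker answers a CCQ correctly with a fixed, known probability `accuracy` strictly between 0.5 and 1. CCQs are chosen greedily, one at a time, as the correspondence whose answer minimizes the expected remaining entropy, and each answer is folded in with Bayes' rule. All names (`select_ccq`, `update`, `expected_entropy`, and so on) and the example attributes are hypothetical.

```python
import math
from typing import Dict, FrozenSet, Tuple

# Hypothetical types: a correspondence pairs a source attribute with a target attribute,
# a candidate matching is a set of correspondences, and the combined matcher output is
# ASSUMED to be a probability distribution over candidate matchings.
Correspondence = Tuple[str, str]
Matching = FrozenSet[Correspondence]
Distribution = Dict[Matching, float]


def entropy(dist: Distribution) -> float:
    """Shannon entropy (in bits) of the distribution over candidate matchings."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(dist: Distribution, c: Correspondence) -> float:
    """P(c is correct): total probability of the candidate matchings that contain c."""
    return sum(p for m, p in dist.items() if c in m)


def update(dist: Distribution, c: Correspondence, answer_yes: bool,
           accuracy: float) -> Distribution:
    """Bayes' rule after one worker answers the CCQ "is c correct?".

    Assumes the worker gives the true answer with probability `accuracy`
    (0.5 < accuracy < 1) and the wrong answer otherwise.
    """
    weighted: Distribution = {}
    for m, p in dist.items():
        consistent = (c in m) == answer_yes  # answer a correct worker would give if m were true
        weighted[m] = p * (accuracy if consistent else 1.0 - accuracy)
    total = sum(weighted.values())  # equals P(this answer)
    return {m: w / total for m, w in weighted.items()}


def expected_entropy(dist: Distribution, c: Correspondence, accuracy: float) -> float:
    """Entropy expected to remain after asking the CCQ for c, averaged over both answers."""
    pc = marginal(dist, c)
    p_yes = accuracy * pc + (1.0 - accuracy) * (1.0 - pc)
    result = 0.0
    for answer_yes, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_answer > 0:
            result += p_answer * entropy(update(dist, c, answer_yes, accuracy))
    return result


def select_ccq(dist: Distribution, accuracy: float) -> Correspondence:
    """Greedily pick the correspondence whose CCQ minimizes the expected remaining entropy."""
    candidates = sorted({c for m in dist for c in m})  # sorted for deterministic tie-breaking
    return min(candidates, key=lambda c: expected_entropy(dist, c, accuracy))


if __name__ == "__main__":
    # Hypothetical candidate matchings obtained by combining several matchers.
    beliefs: Distribution = {
        frozenset({("name", "full_name"), ("phone", "tel")}): 0.5,
        frozenset({("name", "full_name"), ("phone", "fax")}): 0.3,
        frozenset({("name", "first_name"), ("phone", "tel")}): 0.2,
    }
    question = select_ccq(beliefs, accuracy=0.8)
    print("ask the crowd:", question)
    # Simulate one worker answering "yes" and update the beliefs.
    beliefs = update(beliefs, question, answer_yes=True, accuracy=0.8)
    print("entropy after one answer:", round(entropy(beliefs), 3))
```

In this toy example, the `phone` correspondences split the candidates more evenly than the `name` ones, so a `phone` CCQ is asked first. Greedy one-question-at-a-time selection is a simplification; choosing a whole set of CCQs jointly, as the abstract describes, would need a batch selection criterion.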


Published in: SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, June 2014, 1645 pages
ISBN: 9781450323765
Proceedings DOI: 10.1145/2588555

Copyright © 2014 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates

SIGMOD '14 paper acceptance rate: 107 of 421 submissions (25%). Overall acceptance rate: 785 of 4,003 submissions (20%).
