demonstration

CrowdMatcher: crowd-assisted schema matching

Authors:
Chen Jason Zhang, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
Ziyuan Zhao, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
Lei Chen, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
H. V. Jagadish, University of Michigan, Ann Arbor, MI, USA
Chen Caleb Cao, Hong Kong University of Science and Technology, Hong Kong, Hong Kong

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, June 2014, Pages 721–724. https://doi.org/10.1145/2588555.2594515

Published: 18 June 2014

ABSTRACT

Schema matching is a central challenge for data integration systems. Because a schema cannot fully capture the semantics of the data it represents, matching is inherently uncertain, and automatic tools are often unsure about the matchings they suggest. Humans, however, are good at understanding data represented in various forms, and crowdsourcing platforms are making human annotation more affordable. In this demo, we show how to use the crowd to find the right matching. To do so, the tasks posted on crowdsourcing platforms must be simple enough for non-experts to perform, and the number of tasks must be kept as small as possible to reduce cost.

We demonstrate CrowdMatcher, a hybrid machine-crowd system for schema matching. The machine-generated matchings are verified by correspondence correctness queries (CCQs), each of which asks the crowd to determine whether a given correspondence is correct. CrowdMatcher includes several original features: it integrates the matchings generated by classical schema matching tools; to minimize the cost of crowdsourcing, it automatically selects the most informative set of CCQs from the possible matchings; it can handle inaccurate answers provided by workers; and it uses the crowdsourced answers to improve the matching results.
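The abstract does not say how the "most informative" CCQ is chosen or how inaccurate answers are handled. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes the candidate matchings carry probabilities, that informativeness is measured by expected Shannon entropy after the answer, and that every worker is correct with a fixed, known probability. The function names, the `accuracy` parameter, and the example attributes are all hypothetical.

```python
import math

def marginal(matchings, corr):
    """Probability that `corr` holds: total mass of matchings containing it."""
    return sum(p for m, p in matchings if corr in m)

def entropy(probs):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def bayes_update(matchings, corr, answer, accuracy):
    """Reweight matchings after a worker answers a CCQ about `corr`.

    `answer` is True if the worker says the correspondence is correct.
    Assumption: the worker is right with probability `accuracy`.
    """
    weighted = []
    for m, p in matchings:
        agrees = (corr in m) == answer
        weighted.append((m, p * (accuracy if agrees else 1.0 - accuracy)))
    total = sum(p for _, p in weighted)
    if total == 0:
        raise ValueError("answer is impossible under the current distribution")
    return [(m, p / total) for m, p in weighted]

def expected_entropy_after(matchings, corr, accuracy):
    """Expected entropy over matchings once the CCQ for `corr` is answered."""
    q = marginal(matchings, corr)
    p_yes = q * accuracy + (1.0 - q) * (1.0 - accuracy)
    result = 0.0
    for answer, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_answer > 0:
            posterior = bayes_update(matchings, corr, answer, accuracy)
            result += p_answer * entropy(p for _, p in posterior)
    return result

def select_ccq(matchings, accuracy=0.8):
    """Pick the correspondence whose answer is expected to leave least uncertainty."""
    candidates = sorted(set().union(*(m for m, _ in matchings)))
    return min(candidates,
               key=lambda c: expected_entropy_after(matchings, c, accuracy))

# Hypothetical candidate matchings, e.g. merged from several matching tools.
matchings = [
    (frozenset({("name", "full_name"), ("tel", "phone")}), 0.4),
    (frozenset({("name", "full_name"), ("tel", "fax")}), 0.3),
    (frozenset({("name", "last_name"), ("tel", "phone")}), 0.2),
    (frozenset({("name", "last_name"), ("tel", "mobile")}), 0.1),
]

best = select_ccq(matchings)                       # CCQ to post to the crowd
updated = bayes_update(matchings, best, True, 0.8)  # after one "yes" answer
```

In this example the sketch selects the correspondence ("tel", "phone"): its marginal probability (0.6) is the closest to 0.5, so a noisy yes/no answer about it is expected to remove the most uncertainty. A "yes" answer raises the probability of the matchings that contain it without ruling out the others, which is how a single wrong answer can later be outweighed.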

Index Terms

CrowdMatcher: crowd-assisted schema matching
1. Information systems
  1. Data management systems
    1. Database design and models
      1. Data model extensions

Published in
SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
June 2014
1645 pages
ISBN:9781450323765
DOI:10.1145/2588555
General Chairs: Curtis Dyreson (Utah State University, USA), Feifei Li (University of Utah, USA)
Program Chair: M. Tamer Özsu (University of Waterloo, Canada)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
crowdsourcing
schema matching
Qualifiers
- demonstration
Acceptance Rates
SIGMOD '14 Paper Acceptance Rate: 107 of 421 submissions, 25%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%
Article Metrics
- Total Citations: 12
- Total Downloads: 616
- Downloads (last 12 months): 10
- Downloads (last 6 weeks): 2