ABSTRACT
Schema matching is a central challenge for data integration systems. Because a schema cannot fully capture the semantics of the data it represents, automatic tools are often uncertain about the matchings they suggest. Humans, however, are good at understanding data represented in various forms, and crowdsourcing platforms are making human annotation more affordable. In this demo, we show how to use the crowd to find the right matching. To do so, the tasks posted on crowdsourcing platforms must be simple enough for non-experts to perform, and the number of tasks must be kept as small as possible to reduce cost.
We demonstrate CrowdMatcher, a hybrid machine-crowd system for schema matching. The machine-generated matchings are verified through correspondence correctness queries (CCQs), which ask the crowd to determine whether a given correspondence is correct. CrowdMatcher includes several original features: it integrates the matchings generated by classical schema matching tools; to minimize the cost of crowdsourcing, it automatically selects the most informative set of CCQs from the possible matchings; it can handle inaccurate answers provided by workers; and it uses the crowdsourced answers to improve the matching results.
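The selection and answer-handling steps described above can be illustrated with a minimal Python sketch. It is not CrowdMatcher's actual algorithm: it assumes the possible matchings come with probabilities, picks one CCQ at a time by binary entropy rather than choosing a whole set, and assumes a fixed worker accuracy of 0.8. All function names, attribute names, and numbers are hypothetical.

```python
import math

def correspondence_probability(matchings, corr):
    """Probability that `corr` is correct: total mass of matchings containing it."""
    return sum(p for m, p in matchings if corr in m)

def binary_entropy(p):
    """Uncertainty (in bits) of a yes/no question whose 'yes' probability is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_ccq(matchings):
    """Greedy choice (an assumption): ask about the most uncertain correspondence."""
    candidates = set().union(*(m for m, _ in matchings))
    return max(
        sorted(candidates),
        key=lambda c: binary_entropy(correspondence_probability(matchings, c)),
    )

def update_with_answer(matchings, corr, answer_yes, worker_accuracy=0.8):
    """Bayesian reweighting that tolerates a possibly wrong worker answer."""
    reweighted = []
    for m, p in matchings:
        consistent = (corr in m) == answer_yes
        likelihood = worker_accuracy if consistent else 1 - worker_accuracy
        reweighted.append((m, p * likelihood))
    total = sum(p for _, p in reweighted)
    return [(m, p / total) for m, p in reweighted]

# Hypothetical possible matchings between two schemas, with probabilities.
matchings = [
    (frozenset({("name", "full_name"), ("phone", "tel")}), 0.5),
    (frozenset({("name", "full_name"), ("phone", "fax")}), 0.3),
    (frozenset({("name", "nickname"), ("phone", "tel")}), 0.2),
]

ccq = select_ccq(matchings)  # a correspondence involving "phone"
updated = update_with_answer(matchings, ("phone", "tel"), answer_yes=True)
```

A "yes" answer shifts probability toward the matchings that contain the queried correspondence without eliminating the others, which is how the sketch accounts for unreliable workers.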
Index Terms
- CrowdMatcher: crowd-assisted schema matching