research-article

What is the IQ of your data transformation system?

Authors:
Giansalvatore Mecca, Università della Basilicata, Potenza, Italy
Paolo Papotti, Qatar Computing Research Institute, Doha, Qatar
Salvatore Raunich, University of Leipzig, Leipzig, Germany
Donatello Santoro, Università della Basilicata, Potenza, Italy

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management, October 2012, Pages 872–881. https://doi.org/10.1145/2396761.2396872

Published: 29 October 2012

ABSTRACT

Mapping and translating data across different representations is a crucial problem in information systems. Many formalisms and tools are currently used for this purpose, to the point that developers typically face a difficult question: "what is the right tool for my translation task?" In this paper, we introduce several techniques that contribute to answering this question. Among these are a fairly general definition of a data transformation system, a new and very efficient similarity measure to evaluate the outputs produced by such a system, and a metric to estimate user effort. Based on these techniques, we are able to compare a wide range of systems on many translation tasks, to gain interesting insights about their effectiveness and, ultimately, about their "intelligence".

Index Terms

1. Information systems
  1. Data management systems
    1. Information integration
      1. Extraction, transformation and loading
2. Software and its engineering
  1. Software organization and properties
    1. Extra-functional properties
      1. Interoperability

Published in
CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN: 9781450311564
DOI: 10.1145/2396761
General Chair: Xuewen Chen (Wayne State University, USA)
Program Chairs: Guy Lebanon (Georgia Institute of Technology), Haixun Wang (Microsoft Research Asia), Mohammed J. Zaki (Rensselaer Polytechnic Institute)
Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
ETL
benchmarks
data transformation
schema mappings