DOI: 10.1145/2396761.2396872
research-article

What is the IQ of your data transformation system?

Published: 29 October 2012

ABSTRACT

Mapping and translating data across different representations is a crucial problem in information systems. Many formalisms and tools are currently used for this purpose, to the point that developers typically face a difficult question: "What is the right tool for my translation task?" In this paper, we introduce several techniques that help answer this question. Among these are a fairly general definition of a data transformation system, a new and very efficient similarity measure to evaluate the outputs produced by such a system, and a metric to estimate user effort. Based on these techniques, we are able to compare a wide range of systems on many translation tasks and to gain interesting insights into their effectiveness and, ultimately, their "intelligence".


      • Published in

        CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
        October 2012
        2840 pages
        ISBN:9781450311564
        DOI:10.1145/2396761

        Copyright © 2012 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
