Abstract
The optimization capabilities of RDBMSs make them attractive for executing data transformations that support ETL, data cleaning, and integration activities. Although many useful data transformations can be expressed as relational queries, an important class of data transformations, those that produce several output tuples for a single input tuple, is not adequately supported by RDBMSs.
In this paper we address the issue of extending an RDBMS with the mapper operator, which expresses such one-to-many transformations. In particular, we propose an SQL-like syntax for the operator, together with several logical optimizations involving relational operators and the mapper. Finally, we experimentally compare the mapper operator with RDBMS implementations of one-to-many data transformations and validate the proposed logical optimizations.
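To make the notion of a one-to-many transformation concrete, the following is a minimal Python sketch, not the paper's mapper operator or its SQL-like syntax. The names `apply_mapper` and `split_into_installments`, the row layout, and the loan example are all hypothetical, chosen only to show a single input tuple giving rise to several output tuples.

```python
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical row representation: a plain dict of column name -> value.
Row = Dict[str, int]


def apply_mapper(rows: Iterable[Row],
                 fn: Callable[[Row], Iterable[Row]]) -> Iterator[Row]:
    """Apply a one-to-many function to every input row.

    Each input row may contribute zero, one, or many output rows.
    """
    for row in rows:
        yield from fn(row)


def split_into_installments(row: Row) -> Iterator[Row]:
    """Illustrative one-to-many function (an assumption, not from the paper):
    one loan tuple becomes one tuple per installment."""
    count = row["installments"]
    base, remainder = divmod(row["amount"], count)
    for seq in range(1, count + 1):
        # The last installment absorbs the rounding remainder.
        extra = remainder if seq == count else 0
        yield {"loan_id": row["loan_id"], "seq": seq, "amount": base + extra}


loans = [
    {"loan_id": 1, "amount": 100, "installments": 3},
    {"loan_id": 2, "amount": 50, "installments": 1},
]

payments = list(apply_mapper(loans, split_into_installments))
for payment in payments:
    print(payment)
```

Here the number of output tuples depends on the data in each input tuple (the `installments` column), which is the kind of transformation the abstract describes as poorly supported by plain relational queries.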
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Carreira, P., Galhardas, H., Pereira, J.D., Wichert, A. (2008). On Handling One-to-Many Transformations in Relational Systems. In: Filipe, J., Cordeiro, J., Cardoso, J. (eds) Enterprise Information Systems. ICEIS 2007. Lecture Notes in Business Information Processing, vol 12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88710-2_10
DOI: https://doi.org/10.1007/978-3-540-88710-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88709-6
Online ISBN: 978-3-540-88710-2
eBook Packages: Computer Science, Computer Science (R0)