Abstract
The inefficiency of integration processes—as an abstraction of workflow-based integration tasks—is often reasoned by low resource utilization and significant waiting times for external systems. With the aim to overcome these problems, we proposed the concept of process vectorization. There, instance-based integration processes are transparently executed with the pipes-and-filters execution model. Here, the term vectorization is used in the sense of processing a sequence (vector) of messages by one standing process. Although it has been shown that process vectorization achieves a significant throughput improvement, this concept has two major drawbacks. First, the theoretical performance of a vectorized integration process mainly depends on the performance of the most cost-intensive operator. Second, the practical performance strongly depends on the number of available threads. In this paper, we present an advanced optimization approach that addresses the mentioned problems. Therefore, we generalize the vectorization problem and explain how to vectorize process plans in a cost-based manner. Due to the exponential complexity, we provide a heuristic computation approach and formally analyze its optimality. In conclusion of our evaluation, the message throughput can be significantly increased compared to both the instance-based execution as well as the rule-based process vectorization.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Biornstad, B., Pautasso, C., Alonso, G.: Control the flow: How to safely compose streaming services into business processes. In: IEEE SCC (2006)
Boehm, M., Habich, D., Lehner, W., Wloka, U.: Vectorizing instance-based integration processes. In: ICEIS (2009), http://wwwdb.inf.tu-dresden.de/team/archives/2007/04/dipl_wirtinf_ma.php
Boehm, M., Habich, D., Lehner, W., Wloka, U.: An advanced transaction model for recovery processing of integration processes. In: ADBIS (2008)
Boehm, M., Habich, D., Wloka, U., Bittner, J., Lehner, W.: Towards self-optimization of message transformation processes. In: ADBIS (2007)
Boehm, M., Habich, D., Lehner, W., Wloka, U.: Dipbench toolsuite: A framework for benchmarking integration systems. In: ICDE (2008)
Dalvi, N.N., Sanghai, S.K., Roy, P., Sudarshan, S.: Pipelining in multi-query optimization. In: PODS (2001)
Hasan, W., Motwani, R.: Optimization algorithms for exploiting the parallelism-communication tradeoff in pipelined parallelism. In: VLDB (1994)
Roy, P., Seshadri, S., Sudarshan, S., Bhobe, S.: Efficient and extensible algorithms for multi query optimization. In: SIGMOD (2000)
Wilschut, A.N., van Gils, S.A.: A model for pipelined query execution. In: MASCOTS (1993)
Johnson, R., Hardavellas, N., Pandis, I., Mancheril, N., Harizopoulos, S., Sabirli, K., Ailamaki, A., Falsafi, B.: To share or not to share? In: VLDB (2007)
Harizopoulos, S., Ailamaki, A.: A case for staged database systems. In: CIDR (2003)
Gao, K., Harizopoulos, S., Pandis, I., Shkapenyuk, V., Ailamaki, A.: Simultaneous pipelining in qpipe: Exploiting work sharing opportunities across queries. In: ICDE (2006)
Harizopoulos, S., Shkapenyuk, V., Ailamaki, A.: Qpipe: A simultaneously pipelined relational query engine. In: SIGMOD (2005)
Ives, Z.G., Florescu, D., Friedman, M., Levy, A.Y., Weld, D.S.: An adaptive query execution system for data integration. In: SIGMOD (1999)
Lee, R., Zhou, M., Liao, H.: Request window: an approach to improve throughput of rdbms-based data integration system. In: VLDB (2007)
Schmidt, S., Berthold, H., Lehner, W.: Qstream: Deterministic querying of data streams. In: VLDB (2004)
Boehm, A., Marth, E., Kanne, C.C.: The demaq system: declarative development of distributed applications. In: SIGMOD (2008)
Abadi, D.J., Ahmad, Y., Balazinska, M., Çetintemel, U., Cherniack, M., Hwang, J.H., Lindner, W., Maskey, A., Rasin, A., Ryvkina, E., Tatbul, N., Xing, Y., Zdonik, S.B.: The design of the borealis stream processing engine. In: CIDR (2005)
Babcock, B., Babu, S., Datar, M., Motwani, R., Thomas, D.: Operator scheduling in data stream systems. VLDB J. 13(4) (2004)
Carney, D., Çetintemel, U., Rasin, A., Zdonik, S.B., Cherniack, M., Stonebraker, M.: Operator scheduling in a data stream manager. In: VLDB (2003)
Koch, C., Scherzinger, S., Schweikardt, N., Stegmaier, B.: Schema-based scheduling of event processors and buffer minimization for queries on structured data streams. In: VLDB (2004)
Schmidt, S., Legler, T., Schaller, D., Lehner, W.: Real-time scheduling for data stream management systems. In: ECRTS (2005)
Cammert, M., Heinz, C., Krämer, J., Seeger, B., Vaupel, S., Wolske, U.: Flexible multi-threaded scheduling for continuous queries over data streams. In: ICDE Workshops (2007)
Srivastava, U., Munagala, K., Widom, J., Motwani, R.: Query optimization over web services. In: VLDB (2006)
Gounaris, A., Yfoulis, C., Sakellariou, R., Dikaiakos, M.D.: Robust runtime optimization of data transfer in queries over web services. In: ICDE (2008)
Lemos, M., Casanova, M.A., Furtado, A.L.: Process pipeline scheduling. J. Syst. Softw. 81(3) (2008)
Simitsis, A., Vassiliadis, P., Sellis, T.: Optimizing etl processes in data warehouses. In: ICDE (2005)
Hull, R., Llirbat, F., Kumar, B., Zhou, G., Dong, G., Su, J.: Optimization techniques for data-intensive decision flows. In: ICDE (2000)
Li, H., Zhan, D.: Workflow timed critical path optimization. Nature and Science 3(2) (2005)
Vrhovnik, M., Schwarz, H., Suhre, O., Mitschang, B., Markl, V., Maier, A., Kraft, T.: An approach to optimize data processing in business processes. In: VLDB (2007)
Boehm, M., Habich, D., Lehner, W., Wloka, U.: Workload-based optimization of integration processes. In: CIKM (2008)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boehm, M., Habich, D., Preissler, S., Lehner, W., Wloka, U. (2009). Cost-Based Vectorization of Instance-Based Integration Processes. In: Grundspenkis, J., Morzy, T., Vossen, G. (eds) Advances in Databases and Information Systems. ADBIS 2009. Lecture Notes in Computer Science, vol 5739. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03973-7_19
Download citation
DOI: https://doi.org/10.1007/978-3-642-03973-7_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03972-0
Online ISBN: 978-3-642-03973-7
eBook Packages: Computer ScienceComputer Science (R0)