
Automatic optimization of stream programs via source program operator graph transformations

Distributed and Parallel Databases

Abstract

Distributed data stream processing is a data analysis paradigm in which massive amounts of data produced by various sources are analyzed online under real-time constraints. The execution performance of a stream program or query running on such middleware depends largely on the programmer's ability to fine-tune the program to match the topology of the stream processing system. However, manual fine-tuning of a stream program is a difficult, error-prone process that demands large amounts of programmer time and expertise, which are expensive to obtain. We describe an automated process for stream program performance optimization that uses semantics-preserving automatic code transformation to improve stream processing job performance. We first identify the structure of the input program and represent it as a directed acyclic graph (DAG). We then transform the graph using the concepts of Tri-OP and Bi-OP transformation. The resulting sample program space is pruned using both empirical and profiling information to obtain a ranked list of sample programs that perform better than their parent program. We implemented this methodology in a prototype stream program performance optimization mechanism called Hirundo, developed for optimizing SPADE programs that run on the System S stream processing runtime. Using five real-world applications (VWAP, CDR, Twitter, Apnoea, and Bargain), we show the effectiveness of our approach. Hirundo identified a version of the CDR application with 31.1 times higher performance within seven minutes on a cluster of 4 nodes.
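As a rough illustration of the core idea, the following minimal Python sketch models a stream program as a DAG of operators and applies a Bi-OP-style transformation that rewrites a two-operator chain into replicated operators followed by a merge. This is not Hirundo's implementation; the `Operator` class and `bi_op_split` function are hypothetical names used only for illustration.

```python
# Minimal illustrative sketch, assuming a simple in-memory DAG of operators.
# Not Hirundo's code: Operator and bi_op_split are hypothetical names.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operator:
    name: str                      # e.g. "F" for Functor, "AG" for Aggregate
    kind: str
    downstream: List["Operator"] = field(default_factory=list)


def bi_op_split(a: Operator, b: Operator, ways: int) -> Operator:
    """Rewrite the chain a -> b into a -> {b_1 .. b_ways} -> merge,
    preserving b's downstream edges behind the merge operator."""
    merge = Operator(f"{b.name}_merge", "Merge", list(b.downstream))
    replicas = [Operator(f"{b.name}_{i}", b.kind, [merge]) for i in range(ways)]
    a.downstream = replicas        # fan out from a to the replicas
    return merge


# Example: replicate an Aggregate operator 4 ways behind its upstream Functor.
sink = Operator("SI", "Sink")
aggregate = Operator("AG", "Aggregate", [sink])
functor = Operator("F", "Functor", [aggregate])
bi_op_split(functor, aggregate, ways=4)
```

Because such a rewrite only redistributes the same computation across parallel replicas, the transformed program remains semantically equivalent to its parent while exposing additional parallelism.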





Acknowledgements

This research was supported by the Japan Science and Technology Agency’s CREST project titled “Development of System Software Technologies for post-Peta Scale High Performance Computing”.

Author information


Corresponding author

Correspondence to Miyuru Dayarathna.

Additional information

Communicated by Divyakant Agrawal.

Appendices

Appendix A: Data flow graphs of the high-performance sample applications

This appendix lists the optimized data flow graphs of the input applications produced by Hirundo (Figs. 31–37).

Fig. 31

DAG of the 8SCSV_16F_AG_F_SI sample application of VWAP, which produced the highest consistent performance compared to the original VWAP application on cluster X

Fig. 32

DAG of the 6SCSV_48F_AG_F_SI sample application of VWAP, which produced the highest consistent performance compared to the original VWAP application on cluster Y

Fig. 33

DAG of the 4SCSV_4U_2F_4AG_4SI sample application of CDR, which produced the highest consistent performance compared to the original CDR application on clusters X and Y

Fig. 34

DAG of the U_2F_2F_F_AG_SI sample application of Twitter, which produced the highest consistent performance compared to the original Twitter application on cluster X

Fig. 35

DAG of the U_4F_2F_F_AG_SI sample application of Twitter, which produced the highest consistent performance compared to the original Twitter application on cluster Y

Fig. 36

DAG of the SCSV_F_AG_F_2F_2AG_4F_F_AG_F_J_J_SI sample application of Apnoea, which produced the highest consistent performance compared to the original Apnoea application on cluster X

Fig. 37

DAG of the SCSV_FOR_S_2F_2AG_2F_F_J_F_SI_FOR_E sample application of Bargain, which produced the highest consistent performance compared to the original Bargain application on cluster X

Appendix B: Some example transformations for quadri-operator transformation

An example collection of transformed operator blocks that can be produced using Quadri-Operator Transformation (Quadri-OP transformation) is shown in Fig. 38. These Quadri-OP transformations are listed only to illustrate the complexity involved in transforming four-operator blocks (A_B_C_D) at a time. We list only a few of the possible transformations, corresponding to two transformations produced by Tri-OP Transformation. The transformed operator block from Tri-OP transformation shown in Fig. 38(a) was produced by applying the 1-2-1 transformation pattern; the analogous transformed operator blocks from Quadri-OP transformation are shown in Figs. 38(a-1) to 38(a-4). Similarly, the transformed operator blocks analogous to the Tri-OP transformed operator block of the 2-2-2 transformation (shown in Fig. 38(b)) are shown in Figs. 38(b-1) to 38(b-7). These transformed operator blocks are produced for a transformation depth of only 2. Hence, even at small transformation depths, Quadri-OP transformation produces significantly larger numbers of transformed operator blocks, which makes it unsuitable for Hirundo's code transformer. Furthermore, as can be observed from Figs. 38(a) and 38(a-3), and Figs. 38(b) and 38(b-1), many of the Quadri-OP transformed operator blocks can also be produced by using Tri-OP transformation together with operator fusion. For these reasons, Hirundo's program generator applies transformations only up to Tri-OP transformation.
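To make the combinatorial growth concrete, the short Python sketch below (not Hirundo's enumerator) counts replication patterns such as 1-2-1 or 2-2-2, under the assumption that each operator in a block may be replicated by a factor drawn from the set {1, 2, 4}; the exact set of degrees used by Hirundo may differ, but any fixed set gives a pattern count that grows exponentially with the block size, so Quadri-OP blocks always yield several times as many candidates as Tri-OP blocks.

```python
# Illustrative sketch of the pattern explosion; the degree set (1, 2, 4) is an
# assumption made here, not a value taken from the paper.
from itertools import product


def transformation_patterns(block_size: int, degrees=(1, 2, 4)):
    """Enumerate replication patterns such as 1-2-1 or 2-2-2 for a block."""
    return ["-".join(map(str, p)) for p in product(degrees, repeat=block_size)]


print(len(transformation_patterns(2)))  # Bi-OP block:      9 patterns
print(len(transformation_patterns(3)))  # Tri-OP block:    27 patterns
print(len(transformation_patterns(4)))  # Quadri-OP block: 81 patterns
```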

Fig. 38

Some transformations possible using Quadri-Operator Transformation


Cite this article

Dayarathna, M., Suzumura, T. Automatic optimization of stream programs via source program operator graph transformations. Distrib Parallel Databases 31, 543–599 (2013). https://doi.org/10.1007/s10619-013-7130-x
