ABSTRACT
Pattern matching, known in SQL as MATCH_RECOGNIZE, is an expensive operator that is central to many event stream applications. Because of its sequential nature and strict latency requirements, however, current stream processing engines provide no parallel processing support for it. Hardware accelerators based on dedicated GPUs also offer only limited support because of the overhead of transferring data between GPU-local memory and main memory. Integrated GPUs (iGPUs), by contrast, can access main memory directly and therefore offer great potential for accelerating pattern matching. This paper presents the first full-fledged implementation of pattern matching that uses iGPUs and CPUs cooperatively. Results from a preliminary experimental performance comparison confirm the potential of our iGPU-based approaches for accelerating pattern matching.