Optimization of Row Pattern Matching over Sequence Data in Spark SQL

Nakabasami, Kosuke; Kitagawa, Hiroyuki; Nasu, Yuya

doi:10.1007/978-3-030-27615-7_1


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11706)

Included in the following conference series:

International Conference on Database and Expert Systems Applications


Abstract

Owing to advances in information and communications technology and sensor technology, large quantities of sequence data (time series data, log data, etc.) are generated and processed every day. Row pattern matching over sequence data stored in relational databases was standardized as SQL/RPR in 2016. Today, in addition to relational databases, there are many frameworks, such as MapReduce and Spark, for processing large amounts of data in parallel and distributed computing environments. Hive and Spark SQL let users write data analysis processes in SQL-like query languages, and row pattern matching is also beneficial in these systems. However, row pattern matching is computationally expensive, so the process needs to be made efficient. In this paper, we propose two optimization methods that reduce the computational cost of row pattern matching. We focus on Spark and present the design and implementation of the proposed methods for Spark SQL. Experiments confirm that our optimization methods reduce the processing time of Spark SQL queries that include row pattern matching.
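To make the notion of row pattern matching concrete, the following is a minimal Python sketch of the semantics behind a query with `PATTERN (STRT DOWN+ UP+)` and `DEFINE DOWN AS price < PREV(price), UP AS price > PREV(price)`, the classic V-shape example used to explain `MATCH_RECOGNIZE`. It is not the authors' Spark SQL implementation and does not model their optimization methods. Assumptions: rows are already partitioned and ordered; predicates look only at the current and preceding row; only the quantifiers `1` and `+` are supported; matching is greedy with backtracking; and after a match the scan resumes at the row following the match (as with `AFTER MATCH SKIP PAST LAST ROW`). The names `match_at`, `find_matches`, `PATTERN` and `DEFINE` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, float]
# A predicate sees the whole ordered partition and the index of the current row.
Predicate = Callable[[Sequence[Row], int], bool]


def match_at(rows: Sequence[Row], start: int,
             pattern: List[Tuple[str, str]],
             define: Dict[str, Predicate]) -> Optional[List[str]]:
    """Try to match `pattern` beginning at rows[start].

    Returns one variable label per matched row, or None if no match exists.
    Hypothetical helper; supports only the quantifiers "1" and "+".
    """
    def rec(pos: int, k: int, labels: List[str]) -> Optional[List[str]]:
        if k == len(pattern):
            return labels  # every pattern element has been matched
        var, quant = pattern[k]
        # Undefined variables (like STRT) match any row.
        pred = define.get(var, lambda r, i: True)
        # Count consecutive rows from `pos` that satisfy `pred`
        # (at most one row for a "1" element).
        end = pos
        while end < len(rows) and pred(rows, end):
            end += 1
            if quant == "1":
                break
        # Greedy: try the longest run first, then backtrack to shorter ones.
        for n in range(end - pos, 0, -1):
            result = rec(pos + n, k + 1, labels + [var] * n)
            if result is not None:
                return result
        return None

    return rec(start, 0, [])


def find_matches(rows: Sequence[Row],
                 pattern: List[Tuple[str, str]],
                 define: Dict[str, Predicate]) -> List[Tuple[int, List[str]]]:
    """Scan the partition and collect non-overlapping matches."""
    matches = []
    start = 0
    while start < len(rows):
        labels = match_at(rows, start, pattern, define)
        if labels:
            matches.append((start, labels))
            start += len(labels)  # resume after the last matched row
        else:
            start += 1  # no match here; try the next start row
    return matches


# Hypothetical V-shape pattern: a start row, falling prices, then rising prices.
PATTERN = [("STRT", "1"), ("DOWN", "+"), ("UP", "+")]
DEFINE: Dict[str, Predicate] = {
    "DOWN": lambda rows, i: i > 0 and rows[i]["price"] < rows[i - 1]["price"],
    "UP": lambda rows, i: i > 0 and rows[i]["price"] > rows[i - 1]["price"],
}

if __name__ == "__main__":
    prices = [10, 8, 6, 7, 9, 9, 5, 4, 6]
    rows = [{"price": float(p)} for p in prices]
    for start, labels in find_matches(rows, PATTERN, DEFINE):
        print(start, labels)
```

Running it prints `0 ['STRT', 'DOWN', 'DOWN', 'UP', 'UP']` and `5 ['STRT', 'DOWN', 'DOWN', 'UP']`. Because a match is attempted at each candidate start row and each `+` element may backtrack, the work grows with both the data size and the pattern complexity, which is one plausible intuition for why row pattern matching can be costly.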



Acknowledgement

This work was partly supported by Grant-in-Aid for Scientific Research (B) (#19H04114) from JSPS.

Author information

Authors and Affiliations

Railway Technical Research Institute, Hikari-cho 2-8-38, Kokubunji-shi, Tokyo, Japan
Kosuke Nakabasami
Center for Computational Sciences, University of Tsukuba, Tennodai 1-1-1, Tsukuba-shi, Ibaraki, Japan
Hiroyuki Kitagawa
Graduate School of Systems and Information Engineering, University of Tsukuba, Tennodai 1-1-1, Tsukuba-shi, Ibaraki, Japan
Kosuke Nakabasami & Yuya Nasu


Corresponding authors

Correspondence to Kosuke Nakabasami, Hiroyuki Kitagawa or Yuya Nasu.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Johannes Kepler University of Linz, Linz, Austria
Josef Küng
The University of Texas at Arlington, Arlington, TX, USA
Sharma Chakravarthy
Johannes Kepler University of Linz, Linz, Austria
Gabriele Anderst-Kotsis
Software Competence Center Hagenberg, Hagenberg im Mühlkreis, Austria
A Min Tjoa
Johannes Kepler University of Linz, Linz, Austria
Ismail Khalil


About this paper

Cite this paper

Nakabasami, K., Kitagawa, H., Nasu, Y. (2019). Optimization of Row Pattern Matching over Sequence Data in Spark SQL. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11706. Springer, Cham. https://doi.org/10.1007/978-3-030-27615-7_1


DOI: https://doi.org/10.1007/978-3-030-27615-7_1
Published: 03 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27614-0
Online ISBN: 978-3-030-27615-7
eBook Packages: Computer Science, Computer Science (R0)
