Abstract
Advances in information, communications, and sensor technology mean that large quantities of sequence data (time series data, log data, etc.) are generated and processed every day. Row pattern matching over sequence data stored in relational databases was standardized as SQL/RPR in 2016. Today, in addition to relational databases, many frameworks exist for processing large amounts of data in parallel and distributed computing environments, including MapReduce and Spark. Hive and Spark SQL allow data analysis processes to be written in SQL-like query languages, and row pattern matching is beneficial in these systems as well. However, row pattern matching is computationally expensive, so it must be made more efficient. In this paper, we propose two optimization methods that reduce the computational cost of row pattern matching. We focus on Spark and present the design and implementation of the proposed methods for Spark SQL. Experiments verify that our optimization methods reduce the processing time of Spark SQL queries that include row pattern matching.
Acknowledgement
This work was partly supported by Grant-in-Aid for Scientific Research (B) (#19H04114) from JSPS.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Nakabasami, K., Kitagawa, H., Nasu, Y. (2019). Optimization of Row Pattern Matching over Sequence Data in Spark SQL. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11706. Springer, Cham. https://doi.org/10.1007/978-3-030-27615-7_1
DOI: https://doi.org/10.1007/978-3-030-27615-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27614-0
Online ISBN: 978-3-030-27615-7
eBook Packages: Computer Science (R0)