Abstract
Interactive entity-centric analysis of log data can yield fine-grained insights into a business. In this paper, we first describe a fiber-based partitioning method for log data, which accelerates subsequent entity-centric analysis. Second, we present our fiber-based partitioner, which is used by the Spark SQL query engine. The partitioner takes the locations of data blocks into account when loading data from HDFS into RDDs and when shuffling data from upstream operators to downstream operators during joins; this avoids data exchange between nodes and speeds up query processing. Finally, we present experimental results demonstrating that the fiber-based partitioner improves the performance of entity-centric queries.
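The intuition behind avoiding data exchange during joins can be illustrated with a small sketch. This is plain Python rather than Spark, and it is not the paper's partitioner: the partition count, the CRC32 hash, the record fields, and all function names are assumptions chosen for illustration, and HDFS block locality is not modeled. It only shows that when two datasets are partitioned by the same function of the entity key, matching records land in the same partition and can be joined locally.

```python
from collections import defaultdict
from zlib import crc32

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_of(entity_id: str) -> int:
    """Map an entity ID to a partition with a stable hash (assumed scheme)."""
    return crc32(entity_id.encode("utf-8")) % NUM_PARTITIONS


def partition_by_entity(records, key):
    """Group records so that all records of one entity share a partition."""
    partitions = defaultdict(list)
    for record in records:
        partitions[partition_of(record[key])].append(record)
    return partitions


def local_join(left_parts, right_parts, key):
    """Join two co-partitioned datasets one partition at a time.

    Both sides use the same partition function, so matching records always
    share a partition index and no record has to move between partitions.
    """
    joined = []
    for pid in range(NUM_PARTITIONS):
        # Index the right side of this partition by entity key.
        right_by_key = defaultdict(list)
        for right in right_parts.get(pid, []):
            right_by_key[right[key]].append(right)
        # Probe with the left side of the same partition only.
        for left in left_parts.get(pid, []):
            for right in right_by_key.get(left[key], []):
                joined.append({**left, **right})
    return joined


# Hypothetical log records keyed by a user entity.
clicks = [
    {"user": "u1", "page": "/home"},
    {"user": "u2", "page": "/cart"},
    {"user": "u1", "page": "/pay"},
]
orders = [
    {"user": "u1", "order": 101},
    {"user": "u3", "order": 102},
]

click_parts = partition_by_entity(clicks, "user")
order_parts = partition_by_entity(orders, "user")
result = local_join(click_parts, order_parts, "user")
```

Here only the two click records of `u1` find a matching order, and both are found without looking outside the partition that `u1` hashes to. In a distributed engine, a partition would correspond to data held on one node, which is where the saving in network traffic would come from.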
Acknowledgements
This work is supported by the Science and Technology Project of the State Grid Corporation of China (SGBJDK00KJJS1500180) and the State Grid Information & Telecommunication Group CO., LTD. (SGITG-KJ-JSKF[2015]0010).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Sun, Q., Qin, X., Deng, B., Cui, W. (2017). Interactive Entity Centric Analysis of Log Data. In: Chen, L., Jensen, C., Shahabi, C., Yang, X., Lian, X. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science, vol 10367. Springer, Cham. https://doi.org/10.1007/978-3-319-63564-4_34
DOI: https://doi.org/10.1007/978-3-319-63564-4_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63563-7
Online ISBN: 978-3-319-63564-4
eBook Packages: Computer Science, Computer Science (R0)