PLQ: An Efficient Approach to Processing Pattern-Based Log Queries

  • Regular Paper
Journal of Computer Science and Technology

Abstract

As software systems grow more complex, many techniques have been proposed to analyze log data and gain insight into system status. During log analysis, however, analysts spend tedious manual effort searching huge volumes of log data for interesting or informative log patterns, a task known as pattern-based querying. Although existing log management tools and database management systems can support pattern-based queries, they do so inefficiently. To address this problem, we propose a novel approach named PLQ (Pattern-based Log Query). First, PLQ organizes logs into disjoint chunks and builds chunk-wise bitmap indexes for log types and attribute values. Then, using the bitmap indexes, PLQ finds candidate logs with a set of efficient bit-wise operations. Finally, PLQ fetches the candidate logs and validates them against the queried pattern. Extensive experiments on real-life datasets show that, compared with existing log management systems, PLQ queries log patterns more efficiently and achieves a higher pruning rate when filtering irrelevant logs. Moreover, because the ratio of index size to data size does not exceed 2.5% for log datasets of different sizes, PLQ scales well.
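
The three steps described in the abstract (chunk-wise bitmap indexing, bit-wise candidate selection, and validation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The chunk size, the `(log_type, message)` record layout, all function names, and the reading of a "pattern" as an ordered sequence of log types found within a single chunk are assumptions made for illustration. Indexes on attribute values and patterns that span chunk boundaries are left out.

```python
from collections import defaultdict

CHUNK_SIZE = 4  # hypothetical chunk size, kept small for illustration


def build_index(logs, chunk_size=CHUNK_SIZE):
    """Split logs into disjoint chunks and build one bitmap per log type per chunk."""
    chunks = [logs[i:i + chunk_size] for i in range(0, len(logs), chunk_size)]
    index = []
    for chunk in chunks:
        bitmaps = defaultdict(int)
        for pos, (log_type, _message) in enumerate(chunk):
            bitmaps[log_type] |= 1 << pos  # bit `pos` marks a log of this type
        index.append(dict(bitmaps))
    return chunks, index


def candidate_positions(bitmaps, pattern):
    """Return in-chunk positions of candidate logs, or [] if the chunk is pruned."""
    # Prune the chunk when any log type required by the pattern is absent.
    if any(bitmaps.get(t, 0) == 0 for t in pattern):
        return []
    merged = 0
    for t in set(pattern):
        merged |= bitmaps[t]  # bit-wise OR keeps only logs of queried types
    return [pos for pos in range(merged.bit_length()) if (merged >> pos) & 1]


def validate(chunk, positions, pattern):
    """Check that the candidate logs contain the pattern's types in order."""
    want, matched = 0, []
    for pos in positions:
        if chunk[pos][0] == pattern[want]:
            matched.append(pos)
            want += 1
            if want == len(pattern):
                return matched
    return None


def query(logs, pattern, chunk_size=CHUNK_SIZE):
    """Return (chunk_id, positions) for every chunk that contains the pattern."""
    chunks, index = build_index(logs, chunk_size)
    results = []
    for chunk_id, (chunk, bitmaps) in enumerate(zip(chunks, index)):
        positions = candidate_positions(bitmaps, pattern)
        match = validate(chunk, positions, pattern) if positions else None
        if match is not None:
            results.append((chunk_id, match))
    return results


# Made-up sample logs: (log_type, message)
sample_logs = [
    ("open", "a"), ("read", "b"), ("close", "c"), ("read", "d"),
    ("close", "e"), ("open", "f"), ("write", "g"), ("read", "h"),
]
print(query(sample_logs, ["open", "close"]))  # [(0, [0, 2])]
print(query(sample_logs, ["write", "read"]))  # [(1, [2, 3])]
```

In this sketch the second chunk holds both `open` and `close`, so the bitmap check cannot prune it for the first query. The validation step then rejects it because `close` occurs before `open`. This is why the abstract describes the bitmap stage as producing candidates rather than final answers.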

Author information

Corresponding author

Correspondence to Peng Wang.

About this article

Cite this article

Chen, J., Wang, P., Qiao, F. et al. PLQ: An Efficient Approach to Processing Pattern-Based Log Queries. J. Comput. Sci. Technol. 37, 1239–1254 (2022). https://doi.org/10.1007/s11390-020-0653-5

  • DOI: https://doi.org/10.1007/s11390-020-0653-5
