User-assisted code query customization and optimization

  • General
  • Special Issue: SOAP 2023
International Journal on Software Tools for Technology Transfer

Abstract

Running static analysis rules in the wild as part of a commercial service demands special consideration of time limits and scalability, given the large and diverse real-world workloads on which the rules are evaluated. Furthermore, these rules do not run in isolation, which exposes opportunities to reuse partial evaluation results across rules. In our work on Amazon CodeGuru Reviewer and its underlying rule-authoring toolkit, the Guru Query Language (GQL), we have encountered performance and scalability challenges and identified corresponding optimization opportunities, such as caching, indexing, and customization of the data-flow specification, which rule authors can take advantage of as built-in GQL constructs. Our experimental evaluation on a dataset of open-source GitHub repositories shows a 3× speedup with perfect recall for indexing-based configurations, and a 2× speedup with a 51% increase in the number of findings for caching-based optimization. Customizing the data-flow specification, for example by expanding the tracking scope, can increase the number of findings by as much as 136%, at the expense of a longer analysis time. Our evaluation emphasizes the importance of customizing the data-flow specification, particularly when users operate under time constraints: such customization helps the analysis complete within the given time frame, ultimately leading to improved recall.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liblit, B., Lyu, Y., Mukherjee, R. et al. User-assisted code query customization and optimization. Int J Softw Tools Technol Transfer 26, 607–619 (2024). https://doi.org/10.1007/s10009-024-00763-0

  • DOI: https://doi.org/10.1007/s10009-024-00763-0
