Abstract
Running static analysis rules in the wild as part of a commercial service demands special consideration of time limits and scalability, given the large and diverse real-world workloads on which the rules are evaluated. Furthermore, these rules do not run in isolation, which exposes opportunities to reuse partial evaluation results across rules. In our work on Amazon CodeGuru Reviewer and its underlying rule-authoring toolkit, the Guru Query Language (GQL), we have encountered performance and scalability challenges and identified corresponding optimization opportunities, such as caching, indexing, and customization of the data-flow specification, which rule authors can take advantage of as built-in GQL constructs. Our experimental evaluation on a dataset of open-source GitHub repositories shows a 3× speedup with perfect recall for indexing-based configurations, and a 2× speedup with a 51% increase in the number of findings for caching-based optimization. Customizing the data-flow specification, for example by expanding the tracking scope, can increase the number of findings by as much as 136%, at the expense of a longer analysis time. Our evaluation also underscores the importance of customizing the data-flow specification when users operate under time constraints: such customization helps the analysis complete within the given time frame, which ultimately improves recall.
References
Amazon Web Services: AWS SDK for Java (2022). https://github.com/aws/aws-sdk-java
Amazon Web Services: Boto3 - the AWS SDK for Python (2022). https://github.com/boto/boto3
Amazon Web Services: AWS SDK for Python (Boto3) (2022). https://aws.amazon.com/sdk-for-python/
Amazon Web Services: Missing pagination rule (2022). https://docs.aws.amazon.com/codeguru/detector-library/java/missing-pagination/
Amazon Web Services: Batch request with unchecked failures rule (2022). https://docs.aws.amazon.com/codeguru/detector-library/java/aws-unchecked-batch-failures/
Amazon Web Services: Inefficient polling of aws resource high rule (2022). https://docs.aws.amazon.com/codeguru/detector-library/java/aws-polling-instead-of-waiter/
Amazon Web Services: Check uncaught exceptions high rule (2022). https://docs.aws.amazon.com/codeguru/detector-library/java/check-uncaught-exceptions/
Amazon Web Services: Use of a deprecated method rule (2022). https://docs.aws.amazon.com/codeguru/detector-library/java/deprecated-method/
Amazon Web Services: What is Amazon CodeGuru Reviewer? (2023). https://docs.aws.amazon.com/codeguru/latest/reviewer-ug/welcome.html
Amazon Web Services: Codeguru rules (2024). https://docs.aws.amazon.com/codeguru/detector-library/
Arzt, S., Bodden, E.: Reviser: efficiently updating ide-/ifds-based data-flow analyses in response to incremental program changes. In: Jalote, P., Briand, L.C., van der Hoek, A. (eds.) 36th International Conference on Software Engineering, ICSE ’14, Hyderabad, India, May 31 - June 07, 2014, pp. 288–298. ACM (2014). https://doi.org/10.1145/2568225.2568243
Arzt, S., Rasthofer, S., Fritz, C., Bodden, E., Bartel, A., Klein, J., Le Traon, Y., Octeau, D., McDaniel, P.D.: Flowdroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps. In: O’Boyle, M.F.P., Pingali, K. (eds.) ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, Edinburgh, United Kingdom, June 09-11, 2014, pp. 259–269. ACM (2014). https://doi.org/10.1145/2594291.2594299
Calcagno, C., Distefano, D.: Infer: an automatic program verifier for memory safety of C programs. In: Gheorghiu Bobaru, M., Havelund, K., Holzmann, G.J., Joshi, R. (eds.) Proceedings, NASA Formal Methods - Third International Symposium, NFM 2011, Pasadena, CA, USA, April 18-20, 2011. Lecture Notes in Computer Science, vol. 6617, pp. 459–465. Springer (2011). https://doi.org/10.1007/978-3-642-20398-5_33
GitHub, Inc.: Codeql (2019). https://codeql.github.com
Gu, R., Zuo, Z., Jiang, X., Yin, H., Wang, Z., Wang, L., Li, X., Huang, Y.: Towards efficient large-scale interprocedural program static analysis on distributed data-parallel computation. IEEE Trans. Parallel Distrib. Syst. 32(4), 867–883 (2021). https://doi.org/10.1109/TPDS.2020.3036190
Hardekopf, B., Wiedermann, B., Churchill, B.R., Kashyap, V.: Widening for control-flow. In: McMillan, K.L., Rival, X. (eds.) Proceedings, Verification, Model Checking, and Abstract Interpretation - 15th International Conference, VMCAI 2014, San Diego, CA, USA, January 19-21, 2014. Lecture Notes in Computer Science, vol. 8318, pp. 472–491. Springer (2014). https://doi.org/10.1007/978-3-642-54013-4_26
Kashyap, V., Dewey, K., Kuefner, E.A., Wagner, J., Gibbons, K., Sarracino, J., Wiedermann, B., Hardekopf, B.: JSAI: a static analysis platform for javascript. In: Cheung, S.-C., Orso, A., Storey, M.-A.D. (eds.) Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, (FSE-22), Hong Kong, China, November 16-22, 2014, pp. 121–132. ACM (2014). https://doi.org/10.1145/2635868.2635904
Ko, Y., Lee, H., Dolby, J., Ryu, S.: Practically tunable static analysis framework for large-scale javascript applications (T). In: Cohen, M.B., Grunske, L., Whalen, M. (eds.) 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, Lincoln, NE, USA, November 9-13, 2015, pp. 541–551. IEEE Computer Society (2015). https://doi.org/10.1109/ASE.2015.28
Kulkarni, S., Mangal, R., Zhang, X., Naik, M.: Accelerating program analyses by cross-program training. In: Visser, E., Smaragdakis, Y. (eds.) Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2016, part of SPLASH 2016, Amsterdam, The Netherlands, October 30 - November 4, 2016, pp. 359–377. ACM (2016). https://doi.org/10.1145/2983990.2984023
Lyu, Y., Volokh, S., Halfond, W.G.J., Tripp, O.: SAND: a static analysis approach for detecting SQL antipatterns. In: Cadar, C., Zhang, X. (eds.) ISSTA ’21: 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, Denmark, July 11-17, 2021, pp. 270–282. ACM (2021). https://doi.org/10.1145/3460319.3464818
McPeak, S., Gros, C.-H., Ramanathan, M.K.: Scalable and incremental software bug detection. In: Meyer, B., Baresi, L., Mezini, M. (eds.) Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE’13, Saint Petersburg, Russian Federation, August 18–26, 2013, pp. 554–564. ACM (2013). https://doi.org/10.1145/2491411.2501854
Mudduluru, R., Ramanathan, M.K.: Efficient incremental static analysis using path abstraction. In: Gnesi, S., Rensink, A. (eds.) Proceedings, Fundamental Approaches to Software Engineering - 17th International Conference, FASE 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5-13, 2014. Lecture Notes in Computer Science, vol. 8411, pp. 125–139. Springer (2014). https://doi.org/10.1007/978-3-642-54804-8_9
Mukherjee, R., Tripp, O., Liblit, B., Wilson, M.: Static analysis for AWS best practices in python code. In: Ali, K., Vitek, J. (eds.) 36th European Conference on Object-Oriented Programming, ECOOP 2022, Berlin, Germany, June 6-10, 2022. LIPIcs, vol. 222, pp. 14:1–14:28. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022). https://doi.org/10.4230/LIPIcs.ECOOP.2022.14
Pollock, L.L., Soffa, M.L.: An incremental version of iterative data flow analysis. IEEE Trans. Softw. Eng. 15(12), 1537–1549 (1989). https://doi.org/10.1109/32.58766
Reps, T.W., Horwitz, S., Sagiv, S.: Precise interprocedural dataflow analysis via graph reachability. In: Cytron, R.K., Lee, P. (eds.) Conference Record of POPL’95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Francisco, California, USA, January 23-25, 1995, pp. 49–61. ACM Press (1995). https://doi.org/10.1145/199448.199462
Schubert, P.D., Leer, R., Hermann, B., Bodden, E.: Know your analysis: how instrumentation aids understanding static analysis. In: Grech, N., Lavoie, T. (eds.) Proceedings of the 8th ACM SIGPLAN International Workshop on State of the Art in Program Analysis, SOAP@PLDI 2019, Phoenix, AZ, USA, June 22, 2019, pp. 8–13. ACM (2019). https://doi.org/10.1145/3315568.3329965
Schubert, P.D., Gazzillo, P., Patterson, Z., Braha, J., Schiebel, F., Hermann, B., Wei, S., Bodden, E.: Static data-flow analysis for software product lines in C. Autom. Softw. Eng. 29(1), 35 (2022). https://doi.org/10.1007/s10515-022-00333-1
Semgrep, Inc.: Semgrep (2020). https://semgrep.dev
Souter, A.L., Pollock, L.L.: Incremental call graph reanalysis for object-oriented software maintenance. In: 2001 International Conference on Software Maintenance, ICSM 2001, Florence, Italy, November 6–10, 2001, pp. 682–691. IEEE Computer Society (2001). https://doi.org/10.1109/ICSM.2001.972787
Toman, J., Grossman, D.: Taming the static analysis beast. In: Lerner, B.S., Bodík, R., Krishnamurthi, S. (eds.) 2nd Summit on Advances in Programming Languages, SNAPL 2017, Asilomar, CA, USA, May 7-10, 2017. LIPIcs, vol. 71, pp. 18:1–18:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017). https://doi.org/10.4230/LIPIcs.SNAPL.2017.18
About this article
Cite this article
Liblit, B., Lyu, Y., Mukherjee, R. et al. User-assisted code query customization and optimization. Int J Softw Tools Technol Transfer 26, 607–619 (2024). https://doi.org/10.1007/s10009-024-00763-0
DOI: https://doi.org/10.1007/s10009-024-00763-0