User-assisted code query customization and optimization

Liblit, Ben; Lyu, Yingjun; Mukherjee, Rajdeep; Tripp, Omer; Wang, Yanjun

doi:10.1007/s10009-024-00763-0


General
Special Issue: SOAP 2023
Published: 27 August 2024

Volume 26, pages 607–619 (2024)

International Journal on Software Tools for Technology Transfer


Abstract

Running static analysis rules in the wild as part of a commercial service demands special consideration of time limits and scalability, given the large and diverse real-world workloads on which the rules are evaluated. Furthermore, these rules do not run in isolation, which exposes opportunities to reuse partial evaluation results across rules. In our work on Amazon CodeGuru Reviewer and its underlying rule-authoring toolkit, the Guru Query Language (GQL), we have encountered performance and scalability challenges and identified corresponding optimization opportunities, such as caching, indexing, and customization of the data-flow specification, which rule authors can take advantage of as built-in GQL constructs. Our experimental evaluation on a dataset of open-source GitHub repositories shows a 3× speedup with perfect recall for indexing-based configurations, and a 2× speedup with a 51% increase in the number of findings for caching-based optimization. Customizing the data-flow specification, for example by expanding the tracking scope, can increase the number of findings by as much as 136%, at the expense of a longer analysis time. Our evaluations emphasize the importance of customizing the data-flow specification, particularly when users operate under time constraints: customization helps the analysis complete within the given time frame, which ultimately improves recall.
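To make the caching and indexing ideas in the abstract concrete, the following is a minimal Python sketch, not GQL and not the authors' implementation. The toy program model, the rule functions, and every name in it (`PROGRAM`, `INDEX`, `functions_calling_prefix`, `rule_list_calls`, `rule_put_calls`) are hypothetical and introduced only for illustration. It shows an index built once over the code, and a shared sub-query whose result is cached so that a second rule reuses it instead of re-evaluating it.

```python
from functools import lru_cache

# Hypothetical toy "program": function name -> API calls it makes.
PROGRAM = {
    "upload": ["s3.list_objects", "s3.put_object"],
    "poll": ["ec2.describe_instances", "time.sleep"],
    "helper": ["json.dumps"],
}

# Indexing (assumed design): map each API name to the functions that call it,
# built once so rules can look up candidates instead of scanning every function.
INDEX = {}
for func, calls in PROGRAM.items():
    for api in calls:
        INDEX.setdefault(api, set()).add(func)

# Counter used only to show how often the shared sub-query is really evaluated.
EVALUATIONS = {"count": 0}


@lru_cache(maxsize=None)
def functions_calling_prefix(prefix):
    """Shared sub-query: all functions calling an API whose name starts with prefix."""
    EVALUATIONS["count"] += 1
    return frozenset(
        func
        for api, funcs in INDEX.items()
        if api.startswith(prefix)
        for func in funcs
    )


def rule_list_calls():
    """Hypothetical rule 1: S3-using functions that call s3.list_objects."""
    return sorted(functions_calling_prefix("s3.") & INDEX.get("s3.list_objects", set()))


def rule_put_calls():
    """Hypothetical rule 2: S3-using functions that call s3.put_object."""
    return sorted(functions_calling_prefix("s3.") & INDEX.get("s3.put_object", set()))


first = rule_list_calls()
second = rule_put_calls()  # reuses the cached result of the shared sub-query
```

Both rules depend on the same sub-query, so under this sketch it is evaluated once and served from the cache the second time. Whether a production system caches at this granularity is a design choice the abstract does not specify.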



Author information

Authors and Affiliations

Amazon, San Jose, USA
Ben Liblit, Yingjun Lyu, Rajdeep Mukherjee, Omer Tripp & Yanjun Wang


About this article

Cite this article

Liblit, B., Lyu, Y., Mukherjee, R. et al. User-assisted code query customization and optimization. Int J Softw Tools Technol Transfer 26, 607–619 (2024). https://doi.org/10.1007/s10009-024-00763-0


Accepted: 14 August 2024
Published: 27 August 2024
Issue Date: October 2024
DOI: https://doi.org/10.1007/s10009-024-00763-0
