Abstract
This paper develops a novel approach to using code change patterns in static analysis of semantic equivalence of large-scale software. In particular, we propose a way to define custom code change patterns, describing changes that do change the semantics but in a safe way, and a graph-based algorithm to efficiently detect occurrences of such patterns between two versions of software. The proposed method allows one to reduce the number of false positive results generated by static code-pattern-based analysis of semantic equivalence by specifying which patterns of changes should be considered semantically equivalent. Our experiments with the Linux kernel show that it is possible to eliminate a substantial number of detected differences with just a small number of patterns, while maintaining a very high scalability of the overall analysis. Furthermore, the proposed concept allows for a possible future combination with automatic inference of patterns, which promises significant improvements in the area of static analysis of semantic equivalence.
Notes
1. A list of functions that are guaranteed to remain stable across minor RHEL releases.
Acknowledgement
The authors were supported by the project 20-07487S of the Czech Science Foundation and the FIT BUT internal project FIT-S-20-6427.
A Patterns Used in Experiments
Here, we present details on patterns that we used for our first experiment. For each pattern, we give an example of a real usage of the pattern within the RHEL kernel. Even though our patterns are defined in LLVM IR, we give examples in C, as it is much more readable. The LLVM IR representations of the patterns can be found in the DiffKemp repository. In our experiment, we defined 5 patterns:
- P1: Use READ_ONCE for a memory read
Usage of the READ_ONCE macro prevents the compiler from merging or refetching memory reads. This pattern describes a situation when a simple memory read is replaced by a memory read through the macro. For example:
$$\begin{aligned} \mathtt{p \rightarrow cpu} \quad \rightarrow \quad \mathtt{READ\_ONCE(p \rightarrow cpu)} \end{aligned}$$
The pattern is parametrised by 3 inputs: (1) the pointer to read from, (2) the field to read, and (3) the type of the pointer.
- P2: Use WRITE_ONCE for a memory write
The WRITE_ONCE macro is analogous to READ_ONCE, except that it is intended for memory writes. This pattern describes a situation when a simple memory write is replaced by a write through the macro. For example:
$$\begin{aligned} \mathtt{p \rightarrow cpu = cpu} \quad \rightarrow \quad \mathtt{WRITE\_ONCE(p \rightarrow cpu, cpu)} \end{aligned}$$
This pattern is parametrised by 4 inputs: (1) the pointer and (2) the field to write to, (3) the type of the pointer, and (4) the value to write.
- P3: Use unlikely for a condition
Usage of the unlikely macro tells the compiler that a certain condition will evaluate to true only in a very small number of cases. The compiler can use this information to, e.g., order instructions more efficiently. This pattern reflects a situation when the unlikely macro is added to a condition. For example:
$$\begin{aligned} \mathtt{if (sched\_info\_on())} \quad \rightarrow \quad \mathtt{if (unlikely(sched\_info\_on()))} \end{aligned}$$
The boolean condition is the single input of the pattern.
- P4: Replace spin_(un)lock by raw_spin_(un)lock
The Linux kernel provides multiple functions for locking. This pattern describes a situation when usage of spin_lock is replaced by raw_spin_lock. For example:
The same situation may occur with unlocking, so we would normally need 2 patterns. Thanks to the possibility of specifying renaming rules (see Sect. 4.1), our approach allows both locking and unlocking to be handled by a single pattern.
- P5: Replace RECLAIM_DISTANCE by node_reclaim_distance
The RECLAIM_DISTANCE macro and the node_reclaim_distance global variable are two ways of setting a maximum distance between CPU nodes used for load balancing. This pattern describes a situation when the usage of the macro is replaced by the usage of the global variable. Since this is just a simple replacement of one identifier by another, we leave this pattern without an example.
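To make the first three patterns more concrete, the following C sketch places the old and the new form of each change side by side and checks that they produce the same observable result in a single-threaded run. It is only an illustration under stated assumptions: the macro definitions are simplified userspace stand-ins rather than the kernel's actual definitions, `struct task` and all function names are hypothetical, and GCC or Clang is assumed for `__typeof__` and `__builtin_expect`. Patterns P4 and P5 are omitted because they depend on kernel locking primitives and kernel-internal definitions.

```c
#include <assert.h>

/*
 * Simplified userspace stand-ins for the kernel macros. These are
 * assumptions for illustration only; the real kernel definitions are
 * more involved. They rely on GCC/Clang extensions.
 */
#define READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define unlikely(x)        __builtin_expect(!!(x), 0)

/* Hypothetical structure with a cpu field, mirroring the examples above. */
struct task {
    int cpu;
};

/* P1: a plain read (old) versus a read through READ_ONCE (new). */
int get_cpu_old(struct task *p) { return p->cpu; }
int get_cpu_new(struct task *p) { return READ_ONCE(p->cpu); }

/* P2: a plain write (old) versus a write through WRITE_ONCE (new). */
void set_cpu_old(struct task *p, int cpu) { p->cpu = cpu; }
void set_cpu_new(struct task *p, int cpu) { WRITE_ONCE(p->cpu, cpu); }

/* P3: a bare condition (old) versus one wrapped in unlikely (new). */
int is_enabled_old(int flag) { if (flag) return 1; return 0; }
int is_enabled_new(int flag) { if (unlikely(flag)) return 1; return 0; }

/* Asserts that every old/new pair agrees on a few sample inputs. */
void check_patterns(void)
{
    struct task a = { .cpu = 3 };
    struct task b = { .cpu = 3 };

    /* P1: both reads return the stored value. */
    assert(get_cpu_old(&a) == get_cpu_new(&b));

    /* P2: both writes leave the same value in the field. */
    set_cpu_old(&a, 7);
    set_cpu_new(&b, 7);
    assert(a.cpu == b.cpu);

    /* P3: the branch taken does not depend on the unlikely hint. */
    for (int flag = -1; flag <= 2; flag++)
        assert(is_enabled_old(flag) == is_enabled_new(flag));
}
```

Calling `check_patterns()` from a small `main` should pass silently. The two versions of each function still differ at the compiler level (a volatile access or a branch hint), so an analysis that compares the compiled code needs a pattern before it can treat them as equal.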
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Malík, V., Šilling, P., Vojnar, T. (2022). Applying Custom Patterns in Semantic Equality Analysis. In: Koulali, MA., Mezini, M. (eds) Networked Systems. NETYS 2022. Lecture Notes in Computer Science, vol 13464. Springer, Cham. https://doi.org/10.1007/978-3-031-17436-0_18
DOI: https://doi.org/10.1007/978-3-031-17436-0_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17435-3
Online ISBN: 978-3-031-17436-0
eBook Packages: Computer Science, Computer Science (R0)