Applying Custom Patterns in Semantic Equality Analysis

Malík, Viktor; Šilling, Petr; Vojnar, Tomáš

doi:10.1007/978-3-031-17436-0_18

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13464))

Included in the following conference series:

International Conference on Networked Systems

Abstract

This paper develops a novel approach to using code change patterns in static analysis of semantic equivalence of large-scale software. In particular, we propose a way to define custom code change patterns, describing changes that do change the semantics but in a safe way, and a graph-based algorithm to efficiently detect occurrences of such patterns between two versions of software. The proposed method allows one to reduce the number of false positive results generated by static code-pattern-based analysis of semantic equivalence by specifying which patterns of changes should be considered semantically equivalent. Our experiments with the Linux kernel show that it is possible to eliminate a substantial number of detected differences with just a small number of patterns, while maintaining a very high scalability of the overall analysis. Furthermore, the proposed concept allows for a possible future combination with automatic inference of patterns, which promises significant improvements in the area of static analysis of semantic equivalence.

Notes

1. A list of functions that are guaranteed to remain stable across minor RHEL releases.

Acknowledgement

The authors were supported by the project 20-07487S of the Czech Science Foundation and the FIT BUT internal project FIT-S-20-6427.

Author information

Authors and Affiliations

Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
Viktor Malík, Petr Šilling & Tomáš Vojnar
Red Hat Czech, Brno, Czech Republic
Viktor Malík

Corresponding author

Correspondence to Viktor Malík.

Editor information

Editors and Affiliations

Mohammed First University, Oujda, Morocco
Mohammed-Amine Koulali
Technical University of Darmstadt, Darmstadt, Germany
Mira Mezini

A Patterns Used in Experiments

Here, we present details on patterns that we used for our first experiment. For each pattern, we give an example of a real usage of the pattern within the RHEL kernel. Even though our patterns are defined in LLVM IR, we give examples in C, as it is much more readable. The LLVM IR representations of the patterns can be found in the DiffKemp repository. In our experiment, we defined 5 patterns:

P1: Use READ_ONCE for a memory read

Using the READ_ONCE macro prevents the compiler from merging or refetching memory reads. This pattern describes a situation in which a simple memory read is replaced by a read through the macro. For example:
$$\begin{aligned} \mathtt {p\! \rightarrow \!cpu} \quad \rightarrow \quad \mathtt {READ\_ONCE(p \!\rightarrow \! cpu)} \end{aligned}$$
The pattern is parametrised by 3 inputs: (1) the pointer to read from, (2) the field to read, and (3) the type of the pointer.
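The replacement captured by P1 can be sketched in userspace C. This is only an illustration under stated assumptions: the READ_ONCE below is a simplified stand-in for the kernel macro (a volatile access built with the GNU `__typeof__` extension), and `struct task` and both function names are hypothetical.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's READ_ONCE (assumption: the real
 * macro is more elaborate). The volatile access forces a single load.
 * Relies on the GNU C __typeof__ extension (GCC/Clang). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical structure standing in for a kernel task descriptor. */
struct task {
    int cpu;
};

/* Old version: plain memory read. */
int task_cpu_old(struct task *p)
{
    return p->cpu;
}

/* New version: the same read routed through READ_ONCE. */
int task_cpu_new(struct task *p)
{
    return READ_ONCE(p->cpu);
}

/* Both versions yield the same value for the same input, which is the
 * sense in which the change is safe to treat as semantically equal. */
void check_read_once_pattern(void)
{
    struct task t = { .cpu = 3 };
    assert(task_cpu_old(&t) == task_cpu_new(&t));
}
```

The three pattern inputs correspond to `p` (the pointer), `cpu` (the field), and `struct task *` (the pointer type).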

P2: Use WRITE_ONCE for a memory write

The WRITE_ONCE macro is analogous to READ_ONCE, except that it is intended for memory writes. This pattern describes a situation in which a simple memory write is replaced by a write through the macro. For example:
$$\begin{aligned} \mathtt {p\! \rightarrow \!cpu = cpu} \quad \rightarrow \quad \mathtt {WRITE\_ONCE(p\! \rightarrow \!cpu, cpu)} \end{aligned}$$
This pattern is parametrised by 4 inputs: (1) the pointer and (2) the field to write to, (3) the type of the pointer, and (4) the value to write.
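As a rough userspace sketch of P2, the two functions below perform the same store with and without the macro. The WRITE_ONCE definition is a simplified stand-in for the kernel macro (an assumption, using the GNU `__typeof__` extension), and `struct task` and the function names are hypothetical.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's WRITE_ONCE (assumption: the real
 * macro is more elaborate). The volatile access forces a single store. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical structure standing in for a kernel task descriptor. */
struct task {
    int cpu;
};

/* Old version: plain memory write. */
void set_task_cpu_old(struct task *p, int cpu)
{
    p->cpu = cpu;
}

/* New version: the same write routed through WRITE_ONCE. */
void set_task_cpu_new(struct task *p, int cpu)
{
    WRITE_ONCE(p->cpu, cpu);
}

/* Both versions leave the structure in the same state. */
void check_write_once_pattern(void)
{
    struct task a = { .cpu = 0 };
    struct task b = { .cpu = 0 };
    set_task_cpu_old(&a, 5);
    set_task_cpu_new(&b, 5);
    assert(a.cpu == b.cpu);
}
```

The four pattern inputs map to `p`, the field `cpu`, the type `struct task *`, and the written value `cpu`.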

P3: Use unlikely for a condition

Using the unlikely macro tells the compiler that a certain condition will evaluate to true only in a very small number of cases. The compiler can use this information to, e.g., perform a more efficient ordering of instructions. This pattern reflects a situation in which the unlikely macro is added to a condition. For example:
$$\begin{aligned} \mathtt {if (sched\_info\_on())} \quad \rightarrow \quad \mathtt {if (unlikely(sched\_info\_on()))} \end{aligned}$$
The boolean condition is the single input of the pattern.
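A small userspace sketch shows why adding the hint does not change the observable result. The `unlikely` definition follows the usual `__builtin_expect` form (a GCC/Clang builtin); the `sched_info_on` stub, the `sched_info_enabled` flag, and the two `account_*` functions are hypothetical.

```c
#include <assert.h>

/* Branch-prediction hint built on the GCC/Clang builtin; it returns the
 * truth value of its argument unchanged. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical flag and stub standing in for the kernel function. */
int sched_info_enabled;

int sched_info_on(void)
{
    return sched_info_enabled;
}

/* Old version: plain condition. */
int account_old(void)
{
    if (sched_info_on())
        return 1;
    return 0;
}

/* New version: the same condition wrapped in unlikely. */
int account_new(void)
{
    if (unlikely(sched_info_on()))
        return 1;
    return 0;
}

/* The branch taken is identical for both values of the condition. */
void check_unlikely_pattern(void)
{
    sched_info_enabled = 0;
    assert(account_old() == account_new());
    sched_info_enabled = 1;
    assert(account_old() == account_new());
}
```

Only the expected branch probability communicated to the compiler differs between the two versions.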

P4: Replace spin_(un)lock by raw_spin_(un)lock

The Linux kernel provides multiple functions for locking. This pattern describes a situation when usage of spin_lock is replaced by raw_spin_lock. For example:
The same situation may occur with unlocking, so we would normally need 2 patterns. Thanks to the possibility of specifying renaming rules (see Sect. 4.1), our approach can handle both locking and unlocking with a single pattern.
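A hypothetical instance of the P4 replacement might look as follows. Everything here is an illustrative assumption: the lock types and functions are trivial userspace stand-ins for the kernel's (which are far more involved), and `struct queue_old`, `struct queue_new`, and the `enqueue_*` functions are invented for the sketch.

```c
#include <assert.h>

/* Trivial stand-ins for the kernel lock types (assumption: simplified;
 * these only record whether the lock is held). */
typedef struct { int locked; } raw_spinlock_t;
typedef struct { raw_spinlock_t rlock; } spinlock_t;

void raw_spin_lock(raw_spinlock_t *l)   { l->locked = 1; }
void raw_spin_unlock(raw_spinlock_t *l) { l->locked = 0; }
void spin_lock(spinlock_t *l)           { raw_spin_lock(&l->rlock); }
void spin_unlock(spinlock_t *l)         { raw_spin_unlock(&l->rlock); }

/* Hypothetical data structure before and after the lock type change. */
struct queue_old { spinlock_t lock;     int len; };
struct queue_new { raw_spinlock_t lock; int len; };

/* Old version: protects the update with spin_lock/spin_unlock. */
void enqueue_old(struct queue_old *q)
{
    spin_lock(&q->lock);
    q->len++;
    spin_unlock(&q->lock);
}

/* New version: identical update under raw_spin_lock/raw_spin_unlock. */
void enqueue_new(struct queue_new *q)
{
    raw_spin_lock(&q->lock);
    q->len++;
    raw_spin_unlock(&q->lock);
}
```

Both the lock and the unlock call change in the same way (the `raw_` prefix is added), which is the kind of paired change a single pattern with a renaming rule is meant to cover.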

P5: Replace RECLAIM_DISTANCE by node_reclaim_distance

The RECLAIM_DISTANCE macro and the node_reclaim_distance global variable are two ways of setting a maximum distance between CPU nodes used for load balancing. This pattern describes a situation when the usage of the macro is replaced by the usage of the global variable. Since this is just a simple replacement of one identifier by another, we leave this pattern without an example.

About this paper

Cite this paper

Malík, V., Šilling, P., Vojnar, T. (2022). Applying Custom Patterns in Semantic Equality Analysis. In: Koulali, MA., Mezini, M. (eds) Networked Systems. NETYS 2022. Lecture Notes in Computer Science, vol 13464. Springer, Cham. https://doi.org/10.1007/978-3-031-17436-0_18

DOI: https://doi.org/10.1007/978-3-031-17436-0_18
Published: 28 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17435-3
Online ISBN: 978-3-031-17436-0
eBook Packages: Computer Science, Computer Science (R0)
