Applying Custom Patterns in Semantic Equality Analysis

  • Conference paper
Networked Systems (NETYS 2022)

Abstract

This paper develops a novel approach to using code change patterns in static analysis of semantic equivalence of large-scale software. In particular, we propose a way to define custom code change patterns, describing changes that do change the semantics but in a safe way, and a graph-based algorithm to efficiently detect occurrences of such patterns between two versions of software. The proposed method allows one to reduce the number of false positive results generated by static code-pattern-based analysis of semantic equivalence by specifying which patterns of changes should be considered semantically equivalent. Our experiments with the Linux kernel show that it is possible to eliminate a substantial number of detected differences with just a small number of patterns, while maintaining a very high scalability of the overall analysis. Furthermore, the proposed concept allows for a possible future combination with automatic inference of patterns, which promises significant improvements in the area of static analysis of semantic equivalence.

Notes

  1. A list of functions which are guaranteed to remain stable across minor RHEL releases.

Acknowledgement

The authors were supported by the project 20-07487S of the Czech Science Foundation and the FIT BUT internal project FIT-S-20-6427.

Author information

Corresponding author

Correspondence to Viktor Malík.

A Patterns Used in Experiments

Here, we present details on patterns that we used for our first experiment. For each pattern, we give an example of a real usage of the pattern within the RHEL kernel. Even though our patterns are defined in LLVM IR, we give examples in C, as it is much more readable. The LLVM IR representations of the patterns can be found in the DiffKemp repository. In our experiment, we defined 5 patterns:

  • P1: Use READ_ONCE for a memory read

    Using the READ_ONCE macro prevents the compiler from merging or refetching memory reads. This pattern describes a situation where a simple memory read is replaced by a memory read through the macro. For example:

    $$\begin{aligned} \mathtt {p\! \rightarrow \!cpu} \quad \rightarrow \quad \mathtt {READ\_ONCE(p \!\rightarrow \! cpu)} \end{aligned}$$

    The pattern is parametrised by 3 inputs: (1) the pointer to read from, (2) the field to read, and (3) the type of the pointer.

  • P2: Use WRITE_ONCE for a memory write

    The WRITE_ONCE macro is analogous to READ_ONCE, except that it is intended for memory writes. This pattern describes a situation where a simple memory write is replaced by a write through the macro. For example:

    $$\begin{aligned} \mathtt {p\! \rightarrow \!cpu = cpu} \quad \rightarrow \quad \mathtt {WRITE\_ONCE(p\! \rightarrow \!cpu, cpu)} \end{aligned}$$

    This pattern is parametrised by 4 inputs: (1) the pointer and (2) the field to write to, (3) the type of the pointer, and (4) the value to write.

  • P3: Use unlikely for a condition

    Using the unlikely macro tells the compiler that a certain condition will evaluate to true only in a very small number of cases. The compiler can use this information to, e.g., order instructions more efficiently. This pattern reflects a situation where the unlikely macro is added to a condition. For example:

    $$\begin{aligned} \mathtt {if (sched\_info\_on())} \quad \rightarrow \quad \mathtt {if (unlikely(sched\_info\_on()))} \end{aligned}$$

    The boolean condition is the single input of the pattern.

  • P4: Replace spin_(un)lock by raw_spin_(un)lock

    The Linux kernel provides multiple functions for locking. This pattern describes a situation when usage of spin_lock is replaced by raw_spin_lock. For example:

    [Figure d: a call to spin_lock replaced by a call to raw_spin_lock]

    The same situation may occur with unlocking, so we would normally need 2 patterns. Thanks to the possibility of specifying renaming rules (see Sect. 4.1), our approach allows both locking and unlocking to be handled by a single pattern.

  • P5: Replace RECLAIM_DISTANCE by node_reclaim_distance

    The RECLAIM_DISTANCE macro and the node_reclaim_distance global variable are two ways of setting a maximum distance between CPU nodes used for load balancing. This pattern describes a situation when the usage of the macro is replaced by the usage of the global variable. Since this is just a simple replacement of one identifier by another, we leave this pattern without an example.
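
As a rough illustration of patterns P1 to P3, the following C sketch places each original form next to its replacement. The macro definitions are simplified userspace stand-ins that assume GCC or Clang (the kernel's own definitions are more involved), and struct task, the function names, and the sched_info_enabled flag are hypothetical names introduced only for this example. It is not the LLVM IR encoding of the patterns used in DiffKemp.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel macros (assumption: GCC or Clang,
 * which provide __typeof__ and __builtin_expect). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define unlikely(x)        __builtin_expect(!!(x), 0)

/* Hypothetical structure standing in for a kernel task descriptor. */
struct task {
    int cpu;
};

/* P1: a plain read versus a read through READ_ONCE. */
int task_cpu_old(struct task *p) { return p->cpu; }
int task_cpu_new(struct task *p) { return READ_ONCE(p->cpu); }

/* P2: a plain write versus a write through WRITE_ONCE. */
void set_task_cpu_old(struct task *p, int cpu) { p->cpu = cpu; }
void set_task_cpu_new(struct task *p, int cpu) { WRITE_ONCE(p->cpu, cpu); }

/* P3: a condition without and with the unlikely hint.
 * sched_info_enabled is a hypothetical flag backing sched_info_on(). */
bool sched_info_enabled = false;
bool sched_info_on(void) { return sched_info_enabled; }

int account_old(int delta)
{
    if (sched_info_on())
        return delta;
    return 0;
}

int account_new(int delta)
{
    if (unlikely(sched_info_on()))
        return delta;
    return 0;
}

/* Runs each old/new pair on the same input and checks that the results agree. */
void check_pairs_agree(void)
{
    struct task t = { .cpu = 1 };
    assert(task_cpu_old(&t) == task_cpu_new(&t));

    set_task_cpu_old(&t, 2);
    int after_old = t.cpu;
    set_task_cpu_new(&t, 2);
    assert(t.cpu == after_old);

    assert(account_old(5) == account_new(5));
}
```

In this single-threaded setting, each old/new pair produces the same results for the same inputs. The versions differ only in what the compiler is allowed to do with the memory access or the branch, which is the kind of change a custom pattern lets the analysis accept.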

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Malík, V., Šilling, P., Vojnar, T. (2022). Applying Custom Patterns in Semantic Equality Analysis. In: Koulali, MA., Mezini, M. (eds) Networked Systems. NETYS 2022. Lecture Notes in Computer Science, vol 13464. Springer, Cham. https://doi.org/10.1007/978-3-031-17436-0_18

  • DOI: https://doi.org/10.1007/978-3-031-17436-0_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17435-3

  • Online ISBN: 978-3-031-17436-0

  • eBook Packages: Computer Science, Computer Science (R0)
