Towards assisting developers in API usage by automated recovery of complex temporal patterns
Introduction
In modern software development, achieving any meaningful task of non-trivial complexity means that developers must reuse software from multiple sources in the form of APIs, libraries, and services. Such libraries usually require that client applications obey assumed constraints and usage patterns. These constraints can be a barrier to adoption, as learning them is time-consuming and tedious and depends heavily on the quality of documentation. To make matters worse, such directives are generally not well documented [23], [32], [38]. For example, they may touch many methods or classes, whereas documentation such as Javadoc tends to be unit based (per class, per method).
Specification mining is one way to address these problems. Mined specification patterns can be used to complement the documentation of libraries. This can be done by, for example, providing developers with typical usage scenarios or by integrating the mined patterns into IDEs to provide on the fly recommendations.
Recently, much research effort has been dedicated to the identification of sequential API usage patterns (ordered sets of co-used methods) [37] based on development activities [2] and execution traces [11], [22], [31], [33]. Other research contributions targeted unordered API usage patterns [9], [26], [27], [28], [29], [30]. Given a set of client programs that use a library of interest, existing techniques identify API usage patterns that are recurrent in them. However, leveraging mined patterns to help developers ensure that the software remains correct while features are added or removed remains a challenge as the usage patterns inferred with such techniques tend to be simple, numerous, and with high degrees of redundancy.
However, mining temporal aspects of such constraints (i.e., latent temporal properties of APIs) is still an open question. Some work has been done on mining temporal specifications from libraries in the form of automata [17] or rules [9], [18]. Others have attempted to uncover latent behaviour in UML models [10]. On the one hand, a mined automaton expresses a global picture of library specification, but the graph may be very complex and not practical. On the other hand, mined rules generally consist of two events, i.e., two method calls, which limits their ability to express complex temporal properties. This is due to the fact that mining approaches generally rely on predetermined templates, such as the property patterns of Dwyer et al. [6]. Thus, only specific classes of constraints can be identified, instead of all the possibilities that a person could build, require, or understand. Further, to the best of our knowledge, the published literature does not address the mining of APIs in particular.
In this paper, we propose to generalize existing approaches for learning temporal API constraints without using predetermined templates. To handle a wider spectrum of constraints, we define a probabilistic approach centered around the use of atomic constraint sub-expressions as “building blocks” for a search-based exploration of the space of possible API constraints. To this end, we propose a genetic-programming technique that gradually builds Linear Temporal Logic (LTL) formulas, representing candidate usage patterns, by combining API method calls with logical and temporal operators. The search-space exploration is guided by the conformance of candidate patterns with execution traces of client programs using the targeted API.
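As a rough illustration of what conformance of a candidate pattern with execution traces can look like, the following Python sketch represents an LTL formula as a small expression tree and evaluates it over finite traces of method names. The class names, the finite-trace semantics (including a strong next operator), the `conformance` score, and the `open`/`read`/`close` methods are all assumptions made for illustration; they are not taken from the approach itself.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class Call:  # atomic proposition: method `name` is called at this position
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Next:  # X f
    arg: "Formula"


@dataclass(frozen=True)
class Globally:  # G f
    arg: "Formula"


@dataclass(frozen=True)
class Finally:  # F f
    arg: "Formula"


@dataclass(frozen=True)
class Until:  # f U g
    left: "Formula"
    right: "Formula"


Formula = Union[Call, Not, And, Or, Next, Globally, Finally, Until]


def holds(f: Formula, trace: Sequence[str], i: int = 0) -> bool:
    """Evaluate f at position i of a finite trace (assumed finite-trace semantics)."""
    n = len(trace)
    if isinstance(f, Call):
        return i < n and trace[i] == f.name
    if isinstance(f, Not):
        return not holds(f.arg, trace, i)
    if isinstance(f, And):
        return holds(f.left, trace, i) and holds(f.right, trace, i)
    if isinstance(f, Or):
        return holds(f.left, trace, i) or holds(f.right, trace, i)
    if isinstance(f, Next):
        # Strong next: a successor position must exist.
        return i + 1 < n and holds(f.arg, trace, i + 1)
    if isinstance(f, Globally):
        return all(holds(f.arg, trace, j) for j in range(i, n))
    if isinstance(f, Finally):
        return any(holds(f.arg, trace, j) for j in range(i, n))
    if isinstance(f, Until):
        for j in range(i, n):
            if holds(f.right, trace, j):
                return True
            if not holds(f.left, trace, j):
                return False
        return False
    raise TypeError(f"unknown formula node: {f!r}")


def conformance(f: Formula, traces: Sequence[Sequence[str]]) -> float:
    """Fraction of traces that satisfy f (a hypothetical fitness signal)."""
    return sum(holds(f, t) for t in traces) / len(traces)


# G(open -> F close), with the implication written as (not open) or (F close).
pattern = Globally(Or(Not(Call("open")), Finally(Call("close"))))
traces = [["open", "read", "close"], ["open", "read"], ["read"]]
score = conformance(pattern, traces)
```

A search procedure could use a score of this kind to rank candidate formulas; how the actual fitness function weighs conformance is not specified here.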
We evaluated our approach on eight APIs having a variable number of clients. Our evaluation shows that we obtained patterns with different sizes and complexities. It also shows that these patterns generalize to clients not seen in the learning phase.
This paper extends our previous work [31], published at the Genetic and Evolutionary Computation Conference (GECCO), as follows:
- (a) We provide extensive details about the experimental setup of the pattern mining validation.
- (b) We re-implemented a baseline approach and compared it with our approach, shedding light on the kind of patterns that could be inferred.
- (c) We introduce Tapir (Temporal API Recommender), a tool that puts the mined temporal patterns to use within developers’ IDEs. Specifically, we envision using Tapir in four contexts. First, before the developer starts writing the API client application code, Tapir can help augment existing API documentation by translating patterns into structured natural language. Second, Tapir can help while developers write the client-application code, by flagging potential misuses or by refining the code completion suggestions for API calls. Additionally, Tapir can be used at testing time to assess whether the code and the execution traces satisfy the mined patterns. A planned extension to Tapir will allow us to adapt this for use at compilation time, too.
- (d) We present real-world running examples to illustrate the different ways to leverage the mined API temporal patterns.
- (e) We propose a method, implemented in Tapir, to help client application developers correctly use an API at different development phases using the LTL patterns. Specifically, we propose a method based on the generation and analysis of “pseudo-traces” from complete and partial source code, using static analysis.
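To make the idea of “pseudo-traces” in item (e) more concrete, here is a minimal Python sketch that enumerates the possible API call sequences of a simplified program structure. The node types (`Invoke`, `Seq`, `Branch`), the `pseudo_traces` function, and the omission of loops are assumptions made for illustration only; the static analysis actually used by Tapir is not described in this section.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Invoke:  # a single API method call
    method: str


@dataclass(frozen=True)
class Seq:  # statements executed one after another
    steps: Tuple["Node", ...]


@dataclass(frozen=True)
class Branch:  # exactly one alternative is executed (e.g. if/else)
    alternatives: Tuple["Node", ...]


Node = Union[Invoke, Seq, Branch]


def pseudo_traces(node: Node) -> List[Tuple[str, ...]]:
    """Enumerate every call sequence the (loop-free) structure can produce."""
    if isinstance(node, Invoke):
        return [(node.method,)]
    if isinstance(node, Branch):
        return [t for alt in node.alternatives for t in pseudo_traces(alt)]
    if isinstance(node, Seq):
        per_step = [pseudo_traces(step) for step in node.steps]
        # Concatenate one trace per step, for every combination of choices.
        return [sum(combo, ()) for combo in product(*per_step)]
    raise TypeError(f"unknown node: {node!r}")


# open(); if (...) { read(); }  close();
# The missing else-branch is modelled as an empty sequence.
program = Seq((
    Invoke("open"),
    Branch((Invoke("read"), Seq(()))),
    Invoke("close"),
))
traces = pseudo_traces(program)
```

Each resulting sequence could then be checked against a mined pattern in the same way as a recorded execution trace. A partial program would simply yield shorter sequences.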
The rest of the paper is organized as follows. Section 2 outlines the contours of the problem of automated mining of temporal API usage patterns. Section 3 includes an overview of the related work, and we discuss our vision on how existing approaches can be extended to target a wide spectrum of API usage constraints. The details of our approach and its evaluations are provided respectively in Sections 4 and 5. In Section 6, we present a research agenda for leveraging mined API usage patterns in practice. We conclude this paper in Section 7.
Section snippets
Problem discussion
In this section, we outline the contours of the problem of automated recovery of temporal API usage patterns. In this paper, we focus on recovering such patterns in the form of Linear Temporal Logic (LTL) expressions. We consider the space of all well-formed LTL expressions. Given an API, the set of all LTL expressions that describe its valid usage patterns, i.e., LTL expressions involving the API public methods, is a subset of this space. Ideally, a recovery technique can search all of this space to find
Related work
In recent years, much research effort has been dedicated to specification mining. For instance, SpecForge [14] synergizes many existing specification mining algorithms based on finite state automata (FSAs). SpecForge generates a superior FSA from a set of FSAs mined with existing algorithms. It extracts important constraints that are common across the mined FSAs and combines the extracted constraints into one FSA model. It also uses linear temporal logic to specify ordering constraints
Complex temporal API usage patterns mining
Our approach uses genetic programming, an evolutionary method, to mine temporal patterns from execution traces. Genetic programming is a powerful search method inspired by natural selection. The basic idea is to make a population of candidate “programs” evolve toward the solution of a specific problem. Each individual of the population is evaluated by a fitness function that determines its ability to solve the target problem. New individuals are derived from existing ones by
Research questions
To evaluate the efficiency and relevance of our approach, we defined four research questions:
- RQ1: What kinds of patterns can we infer with our approach?
- RQ2: Are the inferred patterns generalizable to other “new” client programs that were not seen in the mining process?
- RQ3: Are the inferred patterns meaningful for developers?
- RQ4: What kinds of LTL patterns are mined with a non-evolutionary state-of-the-art approach?
Data
We evaluate our technique through the usage of 8 widely used APIs from the Android
Usage of API temporal patterns
We have created the tool Tapir (Temporal API Recommender) that guides developers in using APIs based on mined temporal patterns. In this section, we present how it integrates temporal patterns into development activities. Specifically, Tapir focuses on four development activities where we can take advantage of the LTL patterns, as illustrated in Fig. 6:
- (a) During testing, Tapir can verify whether execution traces of API client code satisfy the mined patterns.
- (b)
Discussion and conclusion
We have proposed a genetic-programming approach to recover API temporal constraints from execution traces of client programs using the API. Our approach explores the space of LTL expressions, representing the candidate patterns, that can be defined on the API public methods. The exploration is guided by the applicability of candidate patterns to the trace samples. Unlike most of the existing approaches, ours does not search for specific pattern templates. We evaluated our approach on eight
Acknowledgments
We acknowledge the contributions of Pierre-Olivier Talbot in the conference version of this paper. We also acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC).
References (43)
- et al. Improving reusability of software libraries through usage pattern mining. Journal of Systems and Software (2018)
- et al. Detection techniques of dead code: systematic literature review. Proceedings of the XII Brazilian Symposium on Information Systems on Brazilian Symposium on Information Systems: Information Systems in the Cloud Computing Era-Volume 1 (2016)
- et al. Detection of software evolution phases based on development activities. Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension (2015)
- et al. Investigating order information in api-usage patterns: a benchmark and empirical study
- et al. Searching connected api subgraph via text phrases. Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering (2012)
- et al. Counterexample-guided abstraction refinement
- et al. Patterns in property specifications for finite-state verification. Proceedings of the 21st International Conference on Software Engineering (1999)
- et al. What to do and how to do it: Translating natural language directives into temporal and dynamic logic representation for goal management and action execution. IEEE International Conference on Robotics and Automation (2009)
- et al. Javert: Fully automatic mining of general temporal properties from dynamic traces. ACM SIGSOFT International Symposium on Foundations of Software Engineering (2008)
- et al. Online inference and enforcement of temporal properties. Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1 (2010)
- Mining complex temporal api usage patterns: an evolutionary approach. Proceedings of the 39th International Conference on Software Engineering Companion
- Interprocedural analysis with lazy propagation. International Static Analysis Symposium
- Automatic flow analysis using symbolic execution and path enumeration. 2006 International Conference on Parallel Processing Workshops (ICPPW’06)
- Synergizing specification miners through model fissions and fusions (t). 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)
- General LTL specification mining (t). Automated Software Engineering (ASE), 2015 30th IEEE/ACM International Conference on
- Mining energy-greedy api usage patterns in android apps: an empirical study. Proceedings of the 11th Working Conference on Mining Software Repositories
- Mining temporal rules for software maintenance. J. Softw. Maint. Evol.
- Automatic translation of natural language system specifications into temporal logic. International Conference on Computer Aided Verification
- Grapacc: A graph-based pattern-oriented, context-sensitive code completion tool. 2012 34th International Conference on Software Engineering (ICSE)