Towards assisting developers in API usage by automated recovery of complex temporal patterns

https://doi.org/10.1016/j.infsof.2019.106213

Abstract

Context

Despite the many advantages, the use of external libraries through their APIs remains difficult because of the usage patterns and constraints that are hidden or not properly documented. Existing work provides different techniques to recover API usage patterns from client programs in order to help developers use those libraries. However, most of these techniques produce patterns that generally do not involve temporal properties.

Objective

In this paper, we discuss the problem of recovering temporal API usage patterns and propose an algorithm to solve it. We also discuss how the obtained patterns can be used at different stages of client development.

Method

We address the recovery of temporal API usage patterns as an optimization problem and solve it using a genetic-programming algorithm.

Results

Our evaluation on different APIs shows that the proposed algorithm can derive non-trivial temporal usage patterns that are useful and generalize to new API clients.

Conclusion

Recovering temporal API usage patterns helps client developers use APIs appropriately. In addition to potentially improving productivity, such patterns also help prevent errors that result from incorrect use of the APIs.

Introduction

In modern software development, achieving any meaningful task of non-trivial complexity means that developers must reuse software from multiple sources in the form of APIs, libraries, and services. Such libraries usually require that client applications obey assumed constraints and usage patterns. These constraints can be a barrier to adoption, as learning them is time-consuming and tedious and depends heavily on the quality of the documentation. To make matters worse, such directives are generally not well documented [23], [32], [38]. For example, they may span many methods and classes, whereas documentation such as Javadoc tends to be unit-based (per class, per method).

Specification mining is one way to address these problems. Mined specification patterns can complement the documentation of libraries, for example by providing developers with typical usage scenarios or by integrating the mined patterns into IDEs to provide on-the-fly recommendations.

Recently, much research effort has been dedicated to the identification of sequential API usage patterns (ordered sets of co-used methods) [37] based on development activities [2] and execution traces [11], [22], [31], [33]. Other research contributions targeted unordered API usage patterns [9], [26], [27], [28], [29], [30]. Given a set of client programs that use a library of interest, existing techniques identify the API usage patterns that recur in them. However, leveraging mined patterns to help developers keep their software correct as features are added or removed remains a challenge, because the usage patterns inferred by such techniques tend to be simple, numerous, and highly redundant.

Moreover, mining the temporal aspects of such constraints (i.e., latent temporal properties of APIs) is still an open question. Some work has been done on mining temporal specifications from libraries in the form of automata [17] or rules [9], [18]. Others have attempted to uncover latent behaviour in UML models [10]. On the one hand, a mined automaton gives a global picture of a library's specification, but the resulting graph may be too complex to be practical. On the other hand, mined rules generally consist of two events, i.e., two method calls, which limits their ability to express complex temporal properties. This is because mining approaches generally rely on predetermined templates, such as the property patterns of Dwyer et al. [6]. Thus, only specific classes of constraints can be identified, rather than every constraint a person could build, require, or understand. Further, to the best of our knowledge, the published literature does not address the mining of APIs in particular.
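To make the template restriction concrete, the following is a minimal sketch of template-based rule mining, assuming a single "response" template in the spirit of Dwyer et al.'s property patterns (every call to a is eventually followed by a call to b, i.e., G(a → F b)). The trace encoding (lists of method names), the function names, and the support and confidence thresholds are illustrative assumptions, not the implementation of any cited miner.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def holds_response(trace: Sequence[str], a: str, b: str) -> bool:
    """Finite-trace check of the response template G(a -> F b):
    every call to a is eventually followed by a call to b."""
    pending = False  # True while some call to a still awaits a later b
    for event in trace:
        if event == a:
            pending = True
        elif event == b:
            pending = False
    return not pending


def mine_response_rules(
    traces: Sequence[Sequence[str]], min_support: int = 2, min_confidence: float = 1.0
) -> List[Tuple[str, str]]:
    """Instantiate the two-event template for every ordered pair of methods
    and keep the rules that hold in enough of the traces where a occurs."""
    methods = sorted({event for trace in traces for event in trace})
    rules: List[Tuple[str, str]] = []
    for a, b in permutations(methods, 2):
        relevant = [t for t in traces if a in t]  # traces where the premise fires
        if len(relevant) < min_support:
            continue
        satisfied = sum(holds_response(t, a, b) for t in relevant)
        if satisfied / len(relevant) >= min_confidence:
            rules.append((a, b))
    return rules


traces = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "read", "close"],
]
print(mine_response_rules(traces))
```

On these three toy traces, the sketch reports `open → close` and `read → close`. Every rule it can ever report relates exactly two methods. A constraint spanning three calls appears only if someone anticipates it and adds a new template, which is the restriction discussed above.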

In this paper, we propose to generalize existing approaches for learning temporal API constraints without using predetermined templates. To handle a wider spectrum of constraints, we define a probabilistic approach centered around the use of atomic constraint sub-expressions as “building blocks” for a search-based exploration of the space of possible API constraints. To this end, we propose a genetic-programming technique that gradually builds Linear Temporal Logic (LTL) formulas, representing candidate usage patterns, by combining API method calls with logical and temporal operators. The search-space exploration is guided by the conformance of candidate patterns with execution traces of client programs using the targeted API.
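The two ingredients named in the preceding paragraph can be sketched as follows: candidate patterns built from API method names combined with logical and temporal operators, and a score measuring how well a candidate conforms to client execution traces. This sketch assumes that traces are lists of method names and that LTL is read over finite traces (X φ is false at the last event, and G and F range over the remaining suffix). The class names, `holds`, and `fitness` are hypothetical. Scoring a candidate by the fraction of conforming traces is a simplification and not necessarily the fitness function described in Section 4.

```python
from dataclasses import dataclass
from typing import Sequence, Union


# Hypothetical formula representation: atoms are API method names.
@dataclass(frozen=True)
class Atom:
    method: str

@dataclass(frozen=True)
class Not:
    arg: "Formula"

@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Implies:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Next:  # X phi
    arg: "Formula"

@dataclass(frozen=True)
class Eventually:  # F phi
    arg: "Formula"

@dataclass(frozen=True)
class Globally:  # G phi
    arg: "Formula"

@dataclass(frozen=True)
class Until:  # phi U psi
    left: "Formula"
    right: "Formula"

Formula = Union[Atom, Not, And, Implies, Next, Eventually, Globally, Until]


def holds(f: Formula, trace: Sequence[str], i: int = 0) -> bool:
    """Finite-trace semantics: does f hold at position i of trace?"""
    n = len(trace)
    if isinstance(f, Atom):
        return i < n and trace[i] == f.method  # one method call per event
    if isinstance(f, Not):
        return not holds(f.arg, trace, i)
    if isinstance(f, And):
        return holds(f.left, trace, i) and holds(f.right, trace, i)
    if isinstance(f, Implies):
        return (not holds(f.left, trace, i)) or holds(f.right, trace, i)
    if isinstance(f, Next):
        return i + 1 < n and holds(f.arg, trace, i + 1)
    if isinstance(f, Eventually):
        return any(holds(f.arg, trace, j) for j in range(i, n))
    if isinstance(f, Globally):
        return all(holds(f.arg, trace, j) for j in range(i, n))
    if isinstance(f, Until):
        for j in range(i, n):
            if holds(f.right, trace, j):
                return True
            if not holds(f.left, trace, j):
                return False
        return False
    raise TypeError(f"unknown formula node: {f!r}")


def fitness(f: Formula, traces: Sequence[Sequence[str]]) -> float:
    """Fraction of client execution traces that conform to candidate f."""
    return sum(holds(f, t) for t in traces) / len(traces)


# G(open -> F close) AND G(close -> NOT X read): a pattern spanning three methods.
pattern = And(
    Globally(Implies(Atom("open"), Eventually(Atom("close")))),
    Globally(Implies(Atom("close"), Not(Next(Atom("read"))))),
)
client_traces = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["open", "close", "read"],
]
print(fitness(pattern, client_traces))
```

Here, two of the three traces conform because the last trace calls `read` right after `close`. Taken alone, such a score also rewards vacuous candidates, such as G ¬m for a method m that never appears in the traces. A practical search therefore needs additional pressure toward informative formulas.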

We evaluated our approach on eight APIs with varying numbers of clients. Our evaluation shows that we obtained patterns of different sizes and complexities. It also shows that these patterns generalize to clients not seen in the learning phase.

This paper extends our previous work [31] that was published in the Genetic and Evolutionary Computation Conference (GECCO) as follows:

  • (a)

    We provide extensive details about the experimental setup of the pattern mining validation.

  • (b)

    We re-implemented a baseline approach and compared it with ours, shedding light on the kinds of patterns that each can infer.

  • (c)

    We introduce Tapir (Temporal API Recommender), a tool that puts the mined temporal patterns to use within developers’ IDEs. Specifically, we envision using Tapir in four contexts. First, before the developer starts writing the API client code, Tapir can augment existing API documentation by translating patterns into structured natural language. Second, while developers write the client code, Tapir can flag potential misuses or refine code-completion suggestions for API calls. Additionally, at testing time, Tapir can assess whether the code and the execution traces, respectively, satisfy the mined patterns. A planned extension to Tapir will also allow us to adapt it for use at compilation time.

  • (d)

    We present real-world running examples to illustrate the different ways to leverage the mined API temporal patterns.

  • (e)

    We propose a method, implemented in Tapir, to help client application developers use an API correctly at different development phases by means of the LTL patterns. Specifically, the method generates and analyzes “pseudo-traces” from complete and partial source code using static analysis.
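The pseudo-trace idea in item (e) can be sketched as bounded path enumeration over a control-flow graph. The sketch assumes that the graph has already been extracted from the source code and that each node is annotated with the API call it makes. The dictionary encoding, `pseudo_traces`, and the `max_visits` loop bound are hypothetical choices for illustration, not Tapir's actual implementation, and the sketch ignores calls into other methods.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical CFG encoding: node -> (API call made at that node or None, successors).
CFG = Dict[str, Tuple[Optional[str], List[str]]]


def pseudo_traces(cfg: CFG, entry: str, max_visits: int = 2) -> List[List[str]]:
    """Enumerate the API-call sequences along entry-to-exit paths of cfg.

    Each node may be visited at most max_visits times per path, so loops
    yield finitely many paths. A node without successors is treated as an
    exit, which also covers partial code that stops mid-method.
    """
    traces: List[List[str]] = []

    def walk(node: str, visits: Dict[str, int], calls: List[str]) -> None:
        if visits.get(node, 0) >= max_visits:
            return  # loop bound reached: drop this path
        visits = {**visits, node: visits.get(node, 0) + 1}
        call, successors = cfg[node]
        if call is not None:
            calls = calls + [call]
        if not successors:
            traces.append(calls)
            return
        for succ in successors:
            walk(succ, visits, calls)

    walk(entry, {}, [])
    return traces


# A client method that opens a resource, reads in a loop, then closes it.
complete: CFG = {
    "entry": ("open", ["loop"]),
    "loop": (None, ["body", "done"]),
    "body": ("read", ["loop"]),
    "done": ("close", []),
}

# The same method while still being written: the loop exit has no close yet.
partial: CFG = {
    "entry": ("open", ["loop"]),
    "loop": (None, ["body", "done"]),
    "body": ("read", ["loop"]),
    "done": (None, []),
}

print(pseudo_traces(complete, "entry"))
print(pseudo_traces(partial, "entry"))
```

Under these assumptions, every pseudo-trace of the complete version ends with `close`. Neither pseudo-trace of the partial version contains `close`, so a mined pattern of the form G(open → F close), checked against these pseudo-traces as if they were execution traces, would flag the code while the developer is still writing it.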

The rest of the paper is organized as follows. Section 2 outlines the contours of the problem of automated mining of temporal API usage patterns. Section 3 reviews related work and presents our vision of how existing approaches can be extended to target a wide spectrum of API usage constraints. Sections 4 and 5 detail our approach and its evaluation, respectively. In Section 6, we present a research agenda for leveraging mined API usage patterns in practice. We conclude this paper in Section 7.

Section snippets

Problem discussion

In this section, we outline the contours of the problem of automated recovery of temporal API usage patterns. In this paper, we focus on recovering such patterns in the form of Linear Temporal Logic (LTL) expressions. We denote by S the space of all well-formed LTL expressions. Given an API, we denote by A ⊆ S the set of all LTL expressions that describe its valid usage patterns, i.e., LTL expressions involving the API's public methods. Ideally, a recovery technique can search all of A to find

Related work

In recent years, much research effort has been dedicated to specification mining. For instance, SpecForge [14] synergizes many existing specification mining algorithms based on finite-state automata (FSAs). SpecForge generates a superior FSA from a set of FSAs mined with existing algorithms: it extracts important constraints that are common across the mined FSAs and combines them into one FSA model. It also uses linear temporal logic to specify ordering constraints

Complex temporal API usage patterns mining

Our approach consists of using genetic programming, an evolutionary method, to mine temporal patterns from execution traces. Genetic programming is a powerful search method inspired by natural selection. The basic idea is to make a population of candidate “programs” evolve toward the solution of a specific problem. Each individual of the population is evaluated by a fitness function that determines its ability to solve the target problem. New individuals are derived from existing ones by

Research questions

To evaluate the efficiency and relevance of our approach, we defined four research questions:

  • RQ1: What kinds of patterns can we infer with our approach?

  • RQ2: Do the inferred patterns generalize to “new” client programs not seen during the mining process?

  • RQ3: Are the inferred patterns meaningful for developers?

  • RQ4: What kinds of LTL patterns are mined with a non-evolutionary state-of-the-art approach?

Data

We evaluate our technique on 8 widely used APIs from the Android

Usage of API temporal patterns

We have created Tapir (Temporal API Recommender), a tool that guides developers in using APIs based on mined temporal patterns. In this section, we present how it integrates temporal patterns into development activities. Specifically, Tapir focuses on four development activities in which the LTL patterns can be leveraged, as illustrated in Fig. 6:

  • (a)

    During testing, Tapir can verify whether execution traces of API client code satisfy the mined patterns.

  • (b)

Discussion and conclusion

We have proposed a genetic-programming approach to recover API temporal constraints from execution traces of client programs that use the API. Our approach explores the space of LTL expressions that can be defined on the API's public methods, each expression representing a candidate pattern. The exploration is guided by how well candidate patterns apply to the trace samples. Unlike most existing approaches, ours does not search for specific pattern templates. We evaluated our approach on eight

Acknowledgments

We acknowledge the contributions of Pierre-Olivier Talbot in the conference version of this paper. We also acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC).

References (43)

  • M.A. Saied et al.

    Improving reusability of software libraries through usage pattern mining

    Journal of Systems and Software

    (2018)
  • C. Bastos et al.

    Detection techniques of dead code: systematic literature review

    Proceedings of the XII Brazilian Symposium on Information Systems on Brazilian Symposium on Information Systems: Information Systems in the Cloud Computing Era-Volume 1

    (2016)
  • O. Benomar et al.

    Detection of software evolution phases based on development activities

    Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension

    (2015)
  • E. Çergani et al.

    Investigating order information in API-usage patterns: a benchmark and empirical study

  • W.-K. Chan et al.

    Searching connected API subgraph via text phrases

    Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering

    (2012)
  • E. Clarke et al.

    Counterexample-guided abstraction refinement

  • M.B. Dwyer et al.

    Patterns in property specifications for finite-state verification

    Proceedings of the 21st International Conference on Software Engineering

    (1999)
  • J. Dzifcak et al.

    What to do and how to do it: Translating natural language directives into temporal and dynamic logic representation for goal management and action execution

    IEEE International Conference on Robotics and Automation

    (2009)
  • M. Gabel et al.

    Javert: Fully automatic mining of general temporal properties from dynamic traces

    ACM SIGSOFT International Symposium on Foundations of Software Engineering

    (2008)
  • M. Gabel et al.

    Online inference and enforcement of temporal properties

    Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1

    (2010)
  • H.J. Goldsby, B.H.C. Cheng

    Automatically discovering properties that specify the latent behavior of UML models

    ...
  • S. Huppe et al.

    Mining complex temporal API usage patterns: an evolutionary approach

    Proceedings of the 39th International Conference on Software Engineering Companion

    (2017)
  • S.H. Jensen et al.

    Interprocedural analysis with lazy propagation

    International Static Analysis Symposium

    (2010)
  • D. Kebbal

    Automatic flow analysis using symbolic execution and path enumeration

    2006 International Conference on Parallel Processing Workshops (ICPPW’06)

    (2006)
  • T.-D.B. Le et al.

    Synergizing specification miners through model fissions and fusions (t)

    2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)

    (2015)
  • C. Lemieux et al.

    General LTL specification mining (t)

    2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)

    (2015)
  • M. Linares-Vásquez et al.

    Mining energy-greedy API usage patterns in Android apps: an empirical study

    Proceedings of the 11th Working Conference on Mining Software Repositories

    (2014)
  • D. Lo, S.-C. Khoo

    Smartic: towards building an accurate, robust and scalable specification miner

    Proceedings of...
  • D. Lo et al.

    Mining temporal rules for software maintenance

    J. Softw. Maint. Evol.

    (2008)
  • R. Nelken et al.

    Automatic translation of natural language system specifications into temporal logic

    International Conference on Computer Aided Verification

    (1996)
  • A.T. Nguyen et al.

    Grapacc: A graph-based pattern-oriented, context-sensitive code completion tool

    2012 34th International Conference on Software Engineering (ICSE)

    (2012)