Authors:
Ervina Çergani ¹; Sebastian Proksch ²; Sarah Nadi ³ and Mira Mezini ¹
Affiliations:
¹ Software Technology Group, Technische Universität Darmstadt, Darmstadt, Germany
² Software Evolution and Architecture Lab, University of Zürich, Zürich, Switzerland
³ Department of Computing Science, University of Alberta, Alberta, Canada
Keyword(s):
API Usage Pattern Types, Code Repositories, Events Mining, Empirical Evaluation, Benchmark.
Abstract:
Many approaches have been proposed for learning Application Programming Interface (API) usage patterns from code repositories. Depending on the underlying technique, the mined patterns may (1) be strictly sequential, (2) consider partial order between method calls, or (3) not consider order information. Understanding the trade-offs between these pattern types with respect to real code is important in many applications (e.g., code recommendation or misuse detection). In this work, we present a benchmark consisting of an episode mining algorithm that can be configured to learn all three types of patterns mentioned above. Running our benchmark on an existing dataset of 360 C# code repositories, we empirically study the resulting API usage patterns per pattern type. Our results provide practical evidence that partial-order patterns not only represent a generalized superset of sequential-order patterns: partial-order mining also finds additional patterns that sequence mining misses and that are used by a larger number of developers across code repositories. Additionally, our study empirically quantifies the importance of the order information encoded in sequential and partial-order patterns for representing correct co-occurrences of code elements in real code. Furthermore, our benchmark can be used by other researchers to explore additional properties of API patterns.
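To make the three pattern types concrete, the following minimal Python sketch checks whether a method-call sequence satisfies a sequential, a partial-order, or an unordered pattern. It is an illustration only and not the episode mining algorithm of the benchmark: the function names, the method names (`Open`, `Read`, `Seek`, `Close`), and the choice to allow gaps between matched calls are all assumptions made for this example.

```python
from itertools import product


def matches_sequential(pattern, calls):
    """True if `pattern` occurs in `calls` in exactly this order (gaps allowed)."""
    it = iter(calls)
    # `p in it` advances the iterator, so each match must come after the last.
    return all(p in it for p in pattern)


def matches_partial(events, order, calls):
    """True if all `events` occur and every (before, after) pair in `order` holds."""
    positions = [[i for i, c in enumerate(calls) if c == e] for e in events]
    index = {e: k for k, e in enumerate(events)}
    # Brute force over one occurrence per event; fine for small illustrations.
    for choice in product(*positions):
        if all(choice[index[a]] < choice[index[b]] for a, b in order):
            return True
    return False


def matches_unordered(events, calls):
    """True if all `events` occur in `calls`, in any order."""
    return set(events) <= set(calls)


# Hypothetical API: Read and Seek may be swapped, but both lie between Open and Close.
events = ["Open", "Read", "Seek", "Close"]
order = [("Open", "Read"), ("Open", "Seek"), ("Read", "Close"), ("Seek", "Close")]

usage_a = ["Open", "Read", "Seek", "Close"]
usage_b = ["Open", "Seek", "Read", "Close"]
usage_c = ["Close", "Read", "Seek", "Open"]  # same calls, implausible order
```

In this sketch the sequential pattern accepts only `usage_a`, the partial-order pattern accepts `usage_a` and `usage_b`, and the unordered pattern additionally accepts `usage_c`. This mirrors the trade-off described in the abstract: partial-order patterns generalize sequential ones, while dropping order information entirely can admit co-occurrences that do not reflect correct usage.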