Mining Unit Tests for Discovery and Migration of Math APIs

Published: 07 October 2014

Abstract

Today's programming languages are supported by powerful third-party APIs. For a given application domain, it is common to have many competing APIs that provide similar functionality. Programmer productivity therefore depends heavily on the programmer's ability to discover suitable APIs, both during initial coding and during software maintenance.
The aim of this work is to support the discovery and migration of math APIs. Math APIs are at the heart of many application domains, ranging from machine learning to scientific computation. Our approach, called MathFinder, combines executable specifications of mathematical computations with unit tests (operational specifications) of API methods. Given a math expression, MathFinder mines unit tests of API methods to synthesize pseudo-code, composed of those methods, that computes the expression. We present a sequential version of our unit test mining algorithm and also design a more scalable data-parallel version.
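As a rough illustration of the idea in the paragraph above, and not the authors' algorithm, the following Python sketch treats each unit test as a recorded pair of inputs and expected output for one API method. The query is an executable specification (here a plain Python function for the expression A * B), and a method is reported as a candidate when the specification reproduces the expected output on the test's recorded inputs. The `UnitTestRecord` type, the method names such as `Matrix.times`, and the scoring rule are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

Matrix = list[list[float]]


@dataclass(frozen=True)
class UnitTestRecord:
    """Inputs and expected output observed in one unit test (hypothetical format)."""
    method: str       # name of the API method under test
    inputs: tuple     # arguments the test passed to the method
    expected: object  # value the test asserted


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Executable specification of the query expression A * B."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def close(u, v, tol=1e-9) -> bool:
    """Compare numbers or nested lists of numbers up to a tolerance."""
    if isinstance(u, list) and isinstance(v, list):
        return len(u) == len(v) and all(close(x, y, tol) for x, y in zip(u, v))
    if isinstance(u, (int, float)) and isinstance(v, (int, float)):
        return abs(u - v) <= tol
    return False


def rank_methods(spec, tests):
    """Score each method by the fraction of its tests that the spec reproduces."""
    passed, total = {}, {}
    for t in tests:
        total[t.method] = total.get(t.method, 0) + 1
        try:
            ok = close(spec(*t.inputs), t.expected)
        except Exception:
            ok = False  # arity or shape mismatch: this test cannot match the query
        passed[t.method] = passed.get(t.method, 0) + int(ok)
    scores = {m: passed[m] / total[m] for m in total}
    # Keep only methods with at least one matching test, best score first.
    return sorted(((m, s) for m, s in scores.items() if s > 0),
                  key=lambda p: (-p[1], p[0]))


# Hypothetical records, as if extracted from a library's unit tests.
TESTS = [
    UnitTestRecord("Matrix.times",
                   ([[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]),
                   [[2.0, 1.0], [4.0, 3.0]]),
    UnitTestRecord("Matrix.times", ([[2.0]], [[3.0]]), [[6.0]]),
    UnitTestRecord("Matrix.plus",
                   ([[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]),
                   [[1.0, 3.0], [4.0, 4.0]]),
    UnitTestRecord("Matrix.transpose",
                   ([[1.0, 2.0], [3.0, 4.0]],),
                   [[1.0, 3.0], [2.0, 4.0]]),
]
```

Calling `rank_methods(matmul, TESTS)` on these records reports only `Matrix.times`, because the specification disagrees with the recorded output of `Matrix.plus` and cannot be applied to the single-argument `Matrix.transpose` test. Each record is scored independently of the others, which is one reason a data-parallel version of such a loop is plausible.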
We evaluate MathFinder extensively for (1) API discovery, where math algorithms are to be implemented from scratch, and (2) API migration, where client programs that use one math API are to be migrated to another. We evaluated the precision and recall of MathFinder on a diverse collection of math expressions, culled from algorithms used in a wide range of application areas such as control systems and structural dynamics. In a user study to evaluate the productivity gains obtained by using MathFinder for API discovery, the programmers who used MathFinder finished their programming tasks twice as fast as their counterparts who used the usual techniques, such as web and code search, IDE code completion, and manual inspection of library documentation. For the problem of API migration, we used MathFinder in a case study to migrate Weka, a popular machine learning library. Overall, our evaluation shows that MathFinder is easy to use, provides highly precise results across several math APIs and application domains even with a small number of unit tests per method, and scales to large collections of unit tests.
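For readers unfamiliar with the two metrics named above, here is a minimal sketch of how precision and recall could be computed for a single query. The abstract does not give the exact definitions used in the paper, so the set-based formulation, the function name, and the example method names are assumptions.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query.

    retrieved: methods the tool reported for the query (hypothetical names).
    relevant:  methods a human judged to be correct for the query.
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# One correct method among two reported ones: half the results are right,
# and every correct method was found.
example = precision_recall({"Matrix.times", "Matrix.arrayTimes"}, {"Matrix.times"})
```

Under these assumptions, `example` is `(0.5, 1.0)`.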

Reviews

Richard John Botting

Unit tests are not just about testing [1]. A unit test is a piece of code that executes a part of a program (the unit) and checks whether it worked. The test therefore documents a way to use the unit. At least two teams are independently exploring ways to use this information.
This paper shows how a tool (MathFinder) can use unit tests to help a programmer select a math library suitable for an algorithm. Once a library is selected, the tool proposes detailed code for the algorithm using the library's application programming interface (API). The paper describes experiments showing that a typical maintenance project in a Java/JUnit/Eclipse plus Scilab environment is completed faster when programmers use MathFinder. The results may generalize to other environments.
The key idea is to specify requirements first as unit tests for a very high-level interpreted language, and second as queries to search an index of unit tests in a lower-level language plus API. MathFinder acts as a partial compiler and produces a list of possible sequences of function calls that pass the tests. Apparently, 90 percent of the time the top of the list is a suitable piece of code to implement the given algorithm.
This is a typical research paper in the software engineering field and will interest fellow researchers. Meanwhile, a quarter of the way round the world, another team [2] (not referred to here) is also starting to mine unit tests to recommend code to programmers.
Online Computing Reviews Service
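The reviewer's point that a test documents a way to use the unit can be seen in a small example. The `transpose` function below is a hypothetical stand-in for a library method, not part of any API named in the paper. The test checks the result and, at the same time, records a concrete input/output pair that a mining tool could index.

```python
import unittest


def transpose(m):
    """Hypothetical unit under test: swap the rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]


class TransposeTest(unittest.TestCase):
    def test_transpose_2x3(self):
        # The call shows how the unit is used; the assertion records what it returns.
        self.assertEqual(transpose([[1, 2, 3], [4, 5, 6]]),
                         [[1, 4], [2, 5], [3, 6]])
```

The paper's setting is Java with JUnit; Python's `unittest` is used here only to keep the sketch short.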

Information

Published In

ACM Transactions on Software Engineering and Methodology, Volume 24, Issue 1
September 2014
226 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/2676679
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 October 2014
Accepted: 01 May 2014
Revised: 01 January 2014
Received: 01 August 2013
Published in TOSEM Volume 24, Issue 1

Author Tags

  1. API discovery
  2. API migration
  3. mathematical computation
  4. mining
  5. unit tests

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 11
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 18 Jan 2025

Cited By

  • (2022) Mining Software Information Sites to Recommend Cross-Language Analogical Libraries. 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 913-924. DOI: 10.1109/SANER53432.2022.00109. Online publication date: Mar-2022
  • (2021) Mining Likely Analogical APIs Across Third-Party Libraries via Large-Scale Unsupervised API Semantics Embedding. IEEE Transactions on Software Engineering 47:3, 432-447. DOI: 10.1109/TSE.2019.2896123. Online publication date: 1-Mar-2021
  • (2021) A Multi-Metric Ranking Approach for Library Migration Recommendations. 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 72-83. DOI: 10.1109/SANER50967.2021.00016. Online publication date: Mar-2021
  • (2020) Discovering discrepancies in numerical libraries. Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, 488-501. DOI: 10.1145/3395363.3397380. Online publication date: 18-Jul-2020
  • (2020) Monotone Precision and Recall Measures for Comparing Executions and Specifications of Dynamic Systems. ACM Transactions on Software Engineering and Methodology 29:3, 1-41. DOI: 10.1145/3387909. Online publication date: 1-Jun-2020
