Discovering API Directives from API Specifications with Text Classification

Journal of Computer Science and Technology

Abstract

Application programming interface (API) libraries are extensively used by developers. To program correctly with APIs and avoid bugs, developers must pay attention to API directives, which describe the constraints on API usage. Unfortunately, API directives take diverse forms, which makes it time-consuming and error-prone for developers to discover all the relevant ones. In this paper, we propose an approach that leverages text classification to discover API directives from API specifications. Specifically, given a set of training sentences from API specifications, our approach first characterizes each sentence by three groups of features. Then, to deal with the imbalance between API directives and non-directives, our approach employs an under-sampling strategy that splits the imbalanced training set into several subsets and trains one classifier on each. Given a new sentence in an API specification, our approach combines the trained classifiers to predict whether the sentence is an API directive. We have evaluated our approach on a publicly available annotated API directive corpus. The experimental results show that our approach achieves an F-measure of up to 82.08%. In addition, our approach statistically outperforms the state-of-the-art approach by up to 29.67% in terms of F-measure.
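The under-sampling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the three feature groups, the base learner, or the rule for combining classifiers, so the numeric feature vectors, the `CentroidClassifier` stand-in, and the majority vote below are all assumptions chosen to keep the example self-contained.

```python
import random
from collections import Counter


def split_majority(majority, minority_size, seed=0):
    """Shuffle the majority-class samples and cut them into chunks of
    roughly minority_size each (the last chunk may be smaller)."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    return [shuffled[i:i + minority_size]
            for i in range(0, len(shuffled), minority_size)]


class CentroidClassifier:
    """Hypothetical base learner: predicts the class whose mean feature
    vector is closest (squared Euclidean distance) to the input."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, value in enumerate(x):
                acc[j] += value
        self.centroids = {c: [s / counts[c] for s in acc]
                          for c, acc in sums.items()}
        return self

    def predict(self, x):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=dist)


def train_ensemble(directives, non_directives, seed=0):
    """Train one classifier per balanced subset: all directives (label 1)
    plus one chunk of non-directives (label 0)."""
    models = []
    for chunk in split_majority(non_directives, len(directives), seed):
        X = directives + chunk
        y = [1] * len(directives) + [0] * len(chunk)
        models.append(CentroidClassifier().fit(X, y))
    return models


def predict_ensemble(models, x):
    """Assumed combination rule: majority vote, ties counted as directive."""
    votes = sum(m.predict(x) for m in models)
    return 1 if votes * 2 >= len(models) else 0


# Toy data: 2 directive vectors against 6 non-directive vectors.
directives = [[1.0, 0.9], [0.9, 1.0]]
non_directives = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
                  [0.0, 0.2], [0.1, 0.1], [0.2, 0.2]]
models = train_ensemble(directives, non_directives)
label = predict_ensemble(models, [0.95, 0.95])
```

Each classifier in this sketch sees every directive but only a slice of the non-directives, so no single model is dominated by the majority class, while the ensemble as a whole still uses all of the training data.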

Author information

Correspondence to Jing-Xuan Zhang.

Supplementary Information

ESM 1

(PDF 335 kb)

About this article

Cite this article

Zhang, JX., Tao, CQ., Huang, ZQ. et al. Discovering API Directives from API Specifications with Text Classification. J. Comput. Sci. Technol. 36, 922–943 (2021). https://doi.org/10.1007/s11390-021-0235-1

  • DOI: https://doi.org/10.1007/s11390-021-0235-1
