Research article, DOI: 10.1145/3236024.3236036

Complementing global and local contexts in representing API descriptions to improve API retrieval tasks

Published: 26 October 2018

ABSTRACT

When trained on API documentation and tutorials, Word2vec produces vector representations that can be used to estimate the relevance between texts and API elements. However, existing Word2vec-based approaches to measuring document similarity aggregate the Word2vec vectors of individual words or APIs to build the representation of a document, as if the words were independent. As a result, the semantics of API descriptions or code fragments are not well represented.
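To see why treating words as independent loses information, consider the following minimal Python sketch of vector averaging. It is not taken from the paper: the four 3-dimensional vectors are made-up placeholders rather than trained Word2vec embeddings, and `average_vector` and `cosine` are hypothetical helper names. It shows only that averaging yields the same document vector for two phrases that contain the same tokens in a different order.

```python
from math import sqrt

# Hypothetical toy embeddings; real Word2vec vectors are learned from a corpus.
TOY_VECTORS = {
    "open": [0.9, 0.1, 0.0],
    "file": [0.2, 0.8, 0.1],
    "before": [0.0, 0.3, 0.7],
    "close": [0.8, 0.0, 0.3],
}


def average_vector(tokens):
    """Bag-of-words document vector: element-wise mean of the token vectors."""
    vectors = [TOY_VECTORS[token] for token in tokens]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Same tokens, different order, different meaning.
doc_a = ["open", "file", "before", "close"]
doc_b = ["close", "file", "before", "open"]

vec_a = average_vector(doc_a)
vec_b = average_vector(doc_b)
similarity = cosine(vec_a, vec_b)
```

Here the two phrases receive a cosine similarity of 1 (up to floating-point rounding) although they describe different sequences of operations, which is the kind of order information an averaged representation cannot express.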

In this work, we introduce D2Vec, a new model that fits API documentation better than Word2vec. D2Vec is a neural network model that considers two complementary contexts to better capture the semantics of API documentation. First, we connect the global context of the current API topic under description to all the text phrases within the description of that API. Second, the local order of words and API elements in the text phrases is maintained when computing the vector representations for the APIs. We conducted an experiment to verify two intrinsic properties of D2Vec's vectors: 1) similar words and relevant API elements are projected into nearby locations; and 2) some vector operations carry semantics. We demonstrate the usefulness and good performance of D2Vec in three applications: API code search (text-to-code retrieval), API tutorial fragment search (code-to-text retrieval), and mining API mappings between software libraries (code-to-code retrieval). Finally, we provide actionable insights and implications for researchers using our model in other applications with other types of documents.
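The abstract does not say how D2Vec combines its two contexts, so the sketch below is only an illustration of the general idea and not the authors' architecture. It assumes that a vector for the API topic (global context) is concatenated with the vectors of the surrounding tokens in their original order (local context), which is one option used by paragraph-vector style models. The function `build_input`, the API name used as a key, and all numbers are hypothetical.

```python
# Hypothetical toy embeddings; in a real model both tables would be learned.
TOPIC_VECTORS = {"java.io.FileReader": [0.5, 0.5]}
TOKEN_VECTORS = {
    "reads": [0.1, 0.9],
    "character": [0.7, 0.2],
    "files": [0.3, 0.4],
}


def build_input(topic, window):
    """Concatenate the topic vector with the window token vectors, in order.

    Assumption: concatenation (rather than averaging) is used so that the
    position of each token in the window is preserved in the input.
    """
    features = list(TOPIC_VECTORS[topic])  # global context: the API topic
    for token in window:                   # local context: ordered tokens
        features.extend(TOKEN_VECTORS[token])
    return features


x1 = build_input("java.io.FileReader", ["reads", "character"])
x2 = build_input("java.io.FileReader", ["character", "reads"])
```

Both inputs share the same leading topic segment, so every phrase in the description is tied to the API it describes, while swapping the two tokens changes the rest of the input. An averaged input would not distinguish the two windows.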

      Published in

      ESEC/FSE 2018: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
      October 2018
      987 pages
      ISBN: 9781450355735
      DOI: 10.1145/3236024

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 112 of 543 submissions, 21%
