Toward accurate link between code and software documentation

  • Research Paper
  • Science China Information Sciences

Abstract

Recovering traceability links between source code and software documentation is an important research topic in software maintenance and software reuse. Considerable research effort has gone into recovering traceability between documentation and code elements (classes, interfaces, methods, etc.), mostly based on program analysis. However, existing approaches still establish many noise links. In this paper, we propose a novel approach that classifies the code elements occurring in a document into contextual code elements and salient code elements. This allows us to filter out the noise traceability links between a software document and its contextual code elements and obtain a higher-quality link set. Our classifier is trained on the source code of the open source project Lucene and 1899 StackOverflow answer documents about Lucene. We extract code elements from these documents, represent each one with a 7-dimensional feature vector, and then use a decision-tree-based learning model to classify it as salient or not. In the experiments, we achieve a precision of 70.7% in recognizing the salient code elements of these documents, a 12% improvement over Rigby’s work. Depending on the threshold used in our classifier, we can filter out 56.5% to 69.3% of the noise traceability links. The approach can thus improve the quality of traceability links between source code and their related software documents in practice.
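The threshold-based filtering step described above can be illustrated with a minimal sketch. It assumes that a classifier has already assigned each code element a salience score in [0, 1]; the `ScoredElement` type, the helper functions, and all scores and labels below are hypothetical and invented for illustration, and they do not reproduce the paper's feature vector, model, or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredElement:
    """A code element mentioned in a document, with a classifier score."""
    name: str         # class or method name found in the document
    salience: float   # hypothetical classifier output in [0, 1]
    is_salient: bool  # ground-truth label, used only for evaluation


def filter_links(elements, threshold):
    """Keep only elements whose salience score reaches the threshold."""
    return [e for e in elements if e.salience >= threshold]


def precision(kept):
    """Fraction of kept elements that are truly salient."""
    if not kept:
        return 0.0
    return sum(e.is_salient for e in kept) / len(kept)


def noise_filtered(elements, kept):
    """Fraction of contextual (noise) elements removed by the filter."""
    noise = [e for e in elements if not e.is_salient]
    if not noise:
        return 0.0
    kept_names = {e.name for e in kept}
    removed = [e for e in noise if e.name not in kept_names]
    return len(removed) / len(noise)


# Toy data: scores and labels are invented, not taken from the paper.
ELEMENTS = [
    ScoredElement("IndexWriter", 0.91, True),
    ScoredElement("IndexWriter.addDocument", 0.78, True),
    ScoredElement("Document", 0.55, False),
    ScoredElement("Field", 0.32, False),
    ScoredElement("String", 0.10, False),
    ScoredElement("QueryParser", 0.48, True),
]

if __name__ == "__main__":
    # Sweep a few thresholds to see the precision / filtering trade-off.
    for t in (0.3, 0.5, 0.7):
        kept = filter_links(ELEMENTS, t)
        print(t, round(precision(kept), 2),
              round(noise_filtered(ELEMENTS, kept), 2))
```

Raising the threshold removes more contextual elements but can also discard salient ones with low scores (here, `QueryParser`), which is why results are reported over a range of thresholds rather than at a single value.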

References

  1. Antoniol G, Canfora G, Casazza G, et al. Recovering traceability links between code and documentation. IEEE Trans Softw Eng, 2002, 28: 970–983

  2. Marcus A, Maletic J I. Recovering documentation-to-source-code traceability links using latent semantic indexing. In: Proceedings of the 25th International Conference on Software Engineering, Portland, 2003. 125–135

  3. Robillard M P, Marcus A, Treude C, et al. On-demand developer documentation. In: Proceedings of IEEE International Conference on Software Maintenance and Evolution (ICSME 2017), Shanghai, 2017. 479–483

  4. Bacchelli A, D’Ambros M, Lanza M, et al. Benchmarking lightweight techniques to link e-mails and source code. In: Proceedings of the 16th Working Conference on Reverse Engineering (WCRE 2009), Lille, 2009. 205–214

  5. Bacchelli A, Lanza M, Robbes R. Linking e-mails and source code artifacts. In: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1, Cape Town, 2010. 375–384

  6. Dagenais B, Robillard M P. Recovering traceability links between an API and its learning resources. In: Proceedings of the 34th International Conference on Software Engineering (ICSE 2012), Zurich, 2012. 47–57

  7. Rigby P C, Robillard M P. Discovering essential code elements in informal documentation. In: Proceedings of the 2013 International Conference on Software Engineering, San Francisco, 2013. 832–841

  8. McMillan C, Poshyvanyk D, Revelle M. Combining textual and structural analysis of software artifacts for traceability link recovery. In: Proceedings of ICSE Workshop on Traceability in Emerging Forms of Software Engineering. Washington: IEEE Computer Society, 2009. 41–48

  9. Panichella A, McMillan C, Moritz E, et al. When and how using structural information to improve ir-based traceability recovery. In: Proceedings of the 17th European Conference on Software Maintenance and Reengineering (CSMR 2013), Genova, 2013. 199–208

  10. Subramanian S, Inozemtseva L, Holmes R. Live API documentation. In: Proceedings of the 36th International Conference on Software Engineering (ICSE 2014), Hyderabad, 2014. 643–652

  11. Petrosyan G, Robillard M P, Mori R D. Discovering information explaining API types using text classification. In: Proceedings of the 37th International Conference on Software Engineering-Volume 1, Florence, 2015. 869–879

  12. Jiang H, Zhang J, Li X, et al. A more accurate model for finding tutorial segments explaining APIs. In: Proceedings of IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER 2016), Suita, 2016. 157–167

  13. Zou Y Z, Ye T, Lu Y Y, et al. Learning to rank for question-oriented software text retrieval. In: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE 2015), Lincoln, 2015. 1–11

  14. Lin Z Q, Xie B, Zou Y Z, et al. Intelligent development environment and software knowledge graph. J Comput Sci Technol, 2017, 32: 242–249

  15. Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space. ArXiv:1301.3781

  16. Friedman J H. Greedy function approximation: a gradient boosting machine. Ann Stat, 2001, 29: 1189–1232

  17. Friedman J H. Stochastic gradient boosting. Comput Stat Data Anal, 2002, 38: 367–378

  18. Tsuchiya R, Kato T, Washizaki H, et al. Recovering traceability links between requirements and source code in the same series of software products. In: Proceedings of the 17th International Software Product Line Conference, Tokyo, 2013. 121–130

  19. Tsuchiya R, Washizaki H, Fukazawa Y, et al. Recovering traceability links between requirements and source code using the configuration management log. IEICE Trans Inf Syst, 2015, 98: 852–862

  20. Xu Y, Liu C. Research on retrieval methods for traceability between Chinese documentation and source code based on LDA. Comput Eng Appl, 2013, 49: 70–76

  21. Lai G, Wang X, Liu C. Analysis and improvement on retrieval methods for traceability links between source code and documentation. ACTA Electron Sin, 2009, 37: 22–30

  22. Yang B, Liu C. Research on traceability recovery between documentation and source code based on software structure. J Front Comput Sci Tech, 2014, 6: 7

  23. Ye X, Shen H, Ma X, et al. From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceedings of the 38th International Conference on Software Engineering, Austin, 2016. 404–415

  24. Rahimi M, Goss W, Cleland-Huang J. Evolving requirements-to-code trace links across versions of a software system. In: Proceedings of IEEE International Conference on Software Maintenance and Evolution (ICSME 2016), Raleigh, 2016. 99–109

  25. Zhang Y, Lo D, Xia X, et al. Inferring links between concerns and methods with multi-abstraction vector space model. In: Proceedings of IEEE International Conference on Software Maintenance and Evolution (ICSME 2016), Raleigh, 2016. 110–121

  26. Kim S, Kim H Y, Kim J A, et al. A study on traceability between documents of a software R&D project. In: Advanced Multimedia and Ubiquitous Engineering. Berlin: Springer, 2016. 203–210

  27. de Lucia A, Fasano F, Oliveto R, et al. Enhancing an artefact management system with traceability recovery features. In: Proceedings of the 20th International Conference on Software Maintenance (ICSM 2004), Chicago, 2004. 306–315

  28. Nishikawa K, Washizaki H, Fukazawa Y, et al. Recovering transitive traceability links among software artifacts. In: Proceedings of IEEE International Conference on Software Maintenance and Evolution (ICSME 2015), Bremen, 2015. 576–580

  29. Ye D, Xing Z, Foo C Y, et al. Learning to extract api mentions from informal natural language discussions. In: Proceedings of IEEE International Conference on Software Maintenance and Evolution (ICSME 2016), Raleigh, 2016. 389–399

  30. Sridhara G, Hill E, Muppaneni D, et al. Towards automatically generating summary comments for java methods. In: Proceedings of the 25th IEEE/ACM International Conference on Automated Software Engineering (ASE 2010), Antwerp, 2010. 43–52

  31. Eddy B P, Kraft N A. Using structured queries for source code search. In: Proceedings of the 30th IEEE International Conference on Software Maintenance and Evolution, Victoria, 2014. 431–435

  32. Ponzanelli L, Mocci A, Bacchelli A, et al. Improving low quality stack overflow post detection. In: Proceedings of the 30th IEEE International Conference on Software Maintenance and Evolution, Victoria, 2014. 541–544

  33. Lin Y, Liu Z, Sun M, et al. Learning entity and relation embeddings for knowledge graph completion. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, 2015. 2181–2187

  34. Creation O W L. To generate the ontology from Java source code. Int J Adv Comput Sci Appl, 2011, 2: 111–116

  35. McMillan C, Grechanik M, Poshyvanyk D, et al. Portfolio: finding relevant functions and their usage. In: Proceedings of the 33rd International Conference on Software Engineering, Waikiki, 2011. 111–120

  36. Bajracharya S K, Ossher J, Lopes C V. Leveraging usage similarity for effective retrieval of examples in code repositories. In: Proceedings of the 18th ACM SIGSOFT international symposium on Foundations of software engineering, Santa Fe, 2010. 157–166

  37. Butler S, Wermelinger M, Yu Y J. Investigating naming convention adherence in Java references. In: Proceedings of IEEE International Conference on Software Maintenance and Evolution (ICSME 2015), Bremen, 2015. 41–50

Acknowledgements

This work was supported by the National Key Research and Development Project of China (Grant No. 2016YFB1000804) and the National Natural Science Fund for Distinguished Young Scholars (Grant No. 61525201).

Author information

Corresponding author

Correspondence to Yanzhen Zou.

About this article

Cite this article

Cao, Y., Zou, Y., Luo, Y. et al. Toward accurate link between code and software documentation. Sci. China Inf. Sci. 61, 050105 (2018). https://doi.org/10.1007/s11432-017-9402-3

  • DOI: https://doi.org/10.1007/s11432-017-9402-3
