Wikifying software artifacts

Empirical Software Engineering

Abstract

Context

The computational linguistics community has developed tools, called wikifiers, to identify links to Wikipedia articles from free-form text. Software engineering research can leverage wikifiers to add semantic information to software artifacts. However, there is no empirically grounded basis for choosing an effective wikifier and configuring it for the software domain, on which wikifiers were not specifically trained.

Objective

We conducted a study to guide the selection of a wikifier and its configuration for applications in the software domain, and to measure what performance can be expected of wikifiers.

Method

We applied six wikifiers, with multiple configurations, to a sample of 500 Stack Overflow posts. We manually annotated each of the 41 124 articles identified by the wikifiers as correct or incorrect, and used these annotations to compare the wikifiers' precision and recall.

Results

Each wikifier, in turn, achieved the highest precision for a different range of recall thresholds: across recall thresholds from 5% to 60%, the highest precision ranged from 82% down to 13%. However, filtering the wikifiers' output with a whitelist can considerably improve precision, to above 79% for recall up to 30%, and above 47% for recall up to 60%.
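The precision/recall trade-off and the whitelist filtering summarized above can be illustrated with a small sketch. It assumes that a wikifier's output is a list of article–post pairs, each with a confidence score and a manual correct/incorrect annotation; it pools counts over all pairs (the approach footnote 5 motivates), applies the definitions of footnotes 7 and 8, and sweeps confidence thresholds to find the best precision reachable at a minimum recall. The `Link` record, the whitelist contents, and the use of a known count of correct pairs as the recall denominator (TP + FN) are illustrative assumptions, not the study's actual pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class Link:
    """One article-post pair reported by a wikifier (hypothetical record)."""
    post_id: int
    article: str        # Wikipedia article title
    confidence: float   # wikifier's confidence score
    correct: bool       # manual annotation: is the link correct?


def evaluate(links: Iterable[Link], threshold: float, total_correct: int,
             whitelist: Optional[Set[str]] = None) -> Tuple[float, float, float]:
    """Pooled precision, recall, and F1 of the links kept at `threshold`.

    Counts are pooled over all article-post pairs rather than averaged per
    post. `total_correct` plays the role of TP + FN (an assumption here).
    """
    kept = [link for link in links
            if link.confidence >= threshold
            and (whitelist is None or link.article in whitelist)]
    tp = sum(1 for link in kept if link.correct)
    fp = len(kept) - tp
    precision = tp / (tp + fp) if kept else 0.0
    recall = tp / total_correct if total_correct else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1


def best_precision_at_recall(links: Sequence[Link], min_recall: float,
                             total_correct: int,
                             whitelist: Optional[Set[str]] = None) -> Optional[float]:
    """Highest precision over all confidence thresholds reaching `min_recall`."""
    best: Optional[float] = None
    for threshold in sorted({link.confidence for link in links}):
        precision, recall, _ = evaluate(links, threshold, total_correct, whitelist)
        if recall >= min_recall and (best is None or precision > best):
            best = precision
    return best


# Toy data: titles and scores are made up for illustration.
links = [
    Link(1, "Java_(programming_language)", 0.9, True),
    Link(1, "Burmese_language", 0.4, False),  # e.g., "my" linked to a language
    Link(2, "SQL", 0.8, True),
    Link(2, "Blog", 0.3, False),
]
whitelist = {"Java_(programming_language)", "SQL"}  # hypothetical whitelist
total_correct = 3  # assume one correct pair was missed by this wikifier

print(evaluate(links, 0.0, total_correct))             # precision 0.5, recall 2/3
print(evaluate(links, 0.0, total_correct, whitelist))  # precision 1.0, recall 2/3
print(best_precision_at_recall(links, 0.5, total_correct))
```

In this toy example, the whitelist removes both incorrect links, so precision rises from 0.5 to 1.0 without lowering recall; without the whitelist, the same precision is reached only by raising the confidence threshold to 0.8. The numbers illustrate the mechanics only, not the study's results.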

Conclusions

Results reported in each wikifier’s original article cannot be generalized to software-specific documents. Given that no wikifier performs universally better than all others, we provide empirically grounded insights to select a wikifier for different scenarios, and suggest ways to further improve their performance for the software domain.

Notes

  1. Uppercase letters in superscript (e.g., A) refer to URLs listed in Appendix A.

  2. There are a few exceptions: tools that target specific types of documents, such as Twitter messages (tweets) (Cassidy et al. 2012).

  3. We mark titles of Wikipedia articles with a different font.

  4. The score of a Stack Overflow post is visible next to the post on Stack Overflow, and represents the number of “upvotes” minus the number of “downvotes” attributed by Stack Overflow users based on the usefulness and quality of the post.

  5. One alternative is to aggregate each statistic per post first, then average them over all posts. However, this alternative gives more weight to small posts with few related articles, which would be detrimental to the interpretation and generalizability of the results.

  6. Moro et al. (2014) do not provide an explicit definition for accuracy.

  7. precision \(= \frac {TP}{TP + FP}\); recall \(= \frac {TP}{TP + FN}\)

  8. F1 score is the unweighted harmonic mean of precision and recall, or \(\frac {2 \times \text {precision} \times \text {recall}}{\text {precision}+\text {recall}}\)

  9. The AIDA-CoNLL dataset only links proper nouns (i.e., named entities), but other datasets exist for both named entities and concepts.

  10. The term “annotate”, in the wikification community, often refers to the wikification process itself, or a variant. In this article, we use “annotate” (and its derivatives) only when referring to the manual annotation of human experts to assess the correctness of an article–post pair.

  11. We express our criterion in terms of computing-related articles, rather than only those specific to software engineering, because the precise boundary of software engineering is less well defined than that of computing, especially among Wikipedia articles.

  12. There are concepts that can conceivably be related to computing in some contexts, but part of a general body of knowledge in others. Blog is such a concept: we consider that, when it refers to a specific blog post, the concept is not a technical term, but when a post discusses the creation of a blogging platform, the term becomes related to computing.

  13. In addition to making grammatical errors, post authors often use natural-language shortcuts such as abbreviations, acronyms, omissions, and ambiguous terms, which require additional effort from annotators to resolve. The misuse of formatting options, such as formatting code blocks as inline code, also makes posts harder to understand. For example, the scope keyword “until successful” can easily be misread as a preposition and an adjective if it is not formatted as inline code (link Q).

  14. Annotators cannot infer the topic of a Wikipedia article from its title alone. For example, the article Java describes not the programming language but the Indonesian island. Also, although most titles that consist only of three capital letters lead to disambiguation pages, the article URL does not. Redirect titles further widen the possible gap between titles and article content. Thus, annotators must scan each article to verify what it actually describes.

  15. The number of distinct articles is slightly less than 1098 and 10 854, respectively, because some titles are actually redirect pages to other articles.

  16. An additional 15 previously identified articles become MCS matches, due to unpredictable factors of the wikification algorithms.

  17. Instead of restricting the correlation to the overlap, another possible approach would have been to use all articles, and assign a confidence of 0 to articles not found by a wikifier. This approach, however, is sensitive to noise due to differences in the knowledge bases (and their versions) used to train the wikifiers. Using this approach, we observed correlation scores near zero. Therefore, we present the more useful results using only the overlaps.

  18. The τb variant of the statistic is explicitly designed to account for ties, which is especially important for the discretized confidence scores in DBpedia's results.

  19. The ISO two-letter code for the Burmese language is “my”, a homograph of the possessive determiner, which appears in many posts.

  20. This musical is often linked to the word your, an obviously incorrect artifact of the training phase.
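Footnotes 17 and 18 describe correlating two wikifiers' confidence scores only on the articles both of them report, using Kendall's τb so that ties are handled correctly. A minimal sketch of that computation follows. It assumes confidence scores are stored per (post id, article title) pair and computes τb directly with an O(n²) pair count instead of a library; for large samples, `scipy.stats.kendalltau` (whose default variant is τb) would be the practical choice. The data and key layout are made up for illustration.

```python
import math
from typing import Dict, List, Tuple

Key = Tuple[int, str]  # hypothetical key: (post id, Wikipedia article title)


def kendall_tau_b(x: List[float], y: List[float]) -> float:
    """Kendall's tau-b: (concordant - discordant) / sqrt(untied_x * untied_y)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = 0
    untied_x = untied_y = 0  # number of pairs NOT tied in x (resp. in y)
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of x[i] - x[j]
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of y[i] - y[j]
            untied_x += dx != 0
            untied_y += dy != 0
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    denominator = math.sqrt(untied_x * untied_y)
    if denominator == 0:
        raise ValueError("tau-b is undefined when one score list is constant")
    return (concordant - discordant) / denominator


def overlap_tau_b(a: Dict[Key, float], b: Dict[Key, float]) -> Tuple[int, float]:
    """Correlate two wikifiers' confidences on the pairs that both reported."""
    shared = [key for key in a if key in b]
    tau = kendall_tau_b([a[k] for k in shared], [b[k] for k in shared])
    return len(shared), tau


# Toy data; the second wikifier's scores are discretized, producing ties.
wikifier_a = {(1, "SQL"): 0.9, (1, "Database"): 0.5,
              (2, "Java_(programming_language)"): 0.7, (2, "Blog"): 0.2}
wikifier_b = {(1, "SQL"): 1.0, (1, "Database"): 0.5,
              (2, "Java_(programming_language)"): 0.5,
              (3, "Python_(programming_language)"): 0.8}

n_shared, tau = overlap_tau_b(wikifier_a, wikifier_b)
print(n_shared, round(tau, 3))  # 3 shared pairs; tau-b = 2 / sqrt(6)
```

Pairs reported by only one wikifier are ignored rather than assigned a confidence of 0, mirroring the overlap-only choice that footnote 17 explains.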

References

  • Barua A, Thomas SW, Hassan AE (2014) What are developers talking about? An analysis of topics and trends in stack overflow. Empir Softw Eng 19(3):619–654

  • Bourque P, Fairley RE (2014) Guide to the software engineering body of knowledge, 3rd edn. IEEE Computer Society Press. https://www.swebok.org

  • Brank J, Leban G, Grobelnik M (2017) Annotating documents with relevant wikipedia concepts. In: Proceedings of the Slovenian conference on data mining and data warehouses, p 4

  • Carvalho NR, Almeida JJ, Henriques PR, Varanda MJ (2015) From source code identifiers to natural language terms. J Syst Softw 100:117–128

  • Cassidy T, Ji H, Ratinov LA, Zubiaga A, Huang H (2012) Analysis and enhancement of wikification for microblogs with context expansion. In: Proceedings of the 24th international conference on computational linguistics, pp 441–456

  • Chen C, Xing Z, Wang X (2017) Unsupervised software-specific morphological forms inference from informal discussions. In: Proceedings of the 39th international conference on software engineering, pp 450–461

  • Chen C, Xing Z, Liu Y (2018) What’s Spain’s Paris? Mining analogical libraries from Q&A discussions. Empir Softw Eng 24(3):1155–1194

  • Cheng X, Roth D (2013) Relational inference for wikification. In: Proceedings of the conference on empirical methods in natural language processing, pp 1787–1796

  • Cleland-Huang J, Gotel OCZ, Huffman Hayes J, Mäder P, Zisman A (2014) Software traceability: trends and future directions. In: Proceedings of the on future of software engineering, pp 55–69

  • Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1):37–46

  • Cornolti M, Ferragina P, Ciaramita M (2013) A framework for benchmarking entity-annotation systems. In: Proceedings of the 22nd international conference on World Wide Web, pp 249–260

  • Daiber J, Jakob M, Hokamp C, Mendes PN (2013) Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of the 9th international conference on semantic systems, pp 121–124

  • Fellbaum C (1998) WordNet: an electronic lexical database. MIT Press

  • Ferragina P, Scaiella U (2010) TAGME: on-the-fly annotation of short text fragments (by wikipedia entities). In: Proceedings of the 19th ACM international conference on information and knowledge management, pp 1625–1628

  • Hoffart J, Yosef MA, Bordino I, Fürstenau H, Pinkal M, Spaniol M, Taneva B, Thater S, Weikum G (2011) Robust disambiguation of named entities in text. In: Proceedings of the ACL conference on empirical methods in natural language processing, pp 782–792

  • ISO/IEC/IEEE (2017) International standard—systems and software engineering—vocabulary. Standard 24765:2017, ISO/IEC/IEEE

  • Kendall MG (1938) A new measure of rank correlation. Biometrika 30(1/2):81–93

  • Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174

  • Lehmann J, Isele R, Jakob M, Jentzsch A, Kontokostas D, Mendes PN, Hellmann S, Morsey M, van Kleef P, Auer S, Bizer C (2015) Dbpedia—a large-scale, multilingual knowledge base extracted from wikipedia. Semantic Web 6(2):167–195

  • Ma S, Xing Z, Chen C, Chen C, Qu L, Li G (2019) Easy-to-deploy api extraction by multi-level feature embedding and transfer learning. IEEE Trans Softw Eng 15 pp, to appear

  • Meij E, Weerkamp W, de Rijke M (2012) Adding semantics to microblog posts. In: Proceedings of the 5th ACM international conference on web search and data mining, pp 563–572

  • Mendes PN, Jakob M, Garcia-Silva A, Bizer C (2011) DBpedia Spotlight: shedding light on the web of documents. In: Proceedings of the 7th international conference on semantic systems, pp 1–8

  • Mihalcea R, Chklovski T, Kilgarriff A (2004) The senseval-3 English lexical sample task. In: Proceedings of the third international workshop on the evaluation of systems for the semantic analysis of text, pp 25–28

  • Milne D, Witten IH (2008) Learning to link with wikipedia. In: Proceedings of the 17th ACM conference on information and knowledge management, pp 509–518

  • Milne D, Witten IH (2013) An open-source toolkit for mining wikipedia. Artif Intell 194:222–239

  • Moro A, Raganato A, Navigli R (2014) Entity linking meets word sense disambiguation: a unified approach. Trans Assoc Comput Linguist 2:231–244

  • Nassif M, Treude C, Robillard MP (2020) Automatically categorizing software technologies. IEEE Trans Softw Eng 46(1):20–32

  • Navigli R, Ponzetto SP (2012) BabelNet: the automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artif Intell 193:217–250

  • Navigli R, Jurgens D, Vannella D (2013) SemEval-2013 task 12: multilingual word sense disambiguation. In: Second joint conference on lexical and computational semantics, vol 2. Proceedings of the seventh international workshop on semantic evaluation, pp 222–231

  • Page L, Brin S, Motwani R, Winograd T (1999) The PageRank citation ranking: bringing order to the web. Technical Report 1999-66, Stanford InfoLab

  • Patil S (2017) Concept-based classification of software defect reports. In: Proceedings of the 14th international conference on mining software repositories, pp 182–186

  • Piccinno F, Ferragina P (2014) From TagME to WAT: a new entity annotator. In: Proceedings of the first international workshop on entity recognition & disambiguation, pp 55–62

  • Ponzanelli L, Bacchelli A, Lanza M (2013) Seahawk: stack overflow in the ide. In: Proceedings of the 35th international conference on software engineering, pp 1295–1298

  • Ratinov L, Roth D, Downey D, Anderson M (2011) Local and global algorithms for disambiguation to wikipedia. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, vol 1, pp 1375–1384

  • Rebele T, Suchanek F, Hoffart J, Biega J, Kuzey E, Weikum G (2016) YAGO: a multilingual knowledge base from wikipedia, wordnet and geonames. In: Proceedings of the international semantic web conference, pp 177–185

  • Rigby PC, Robillard MP (2013) Discovering essential code elements in informal documentation. In: Proceedings of the 35th IEEE/ACM international conference on software engineering, pp 832–841

  • Schindler M, Fox O, Rausch A (2015) Clustering source code elements by semantic similarity using wikipedia. In: Proceedings of the fourth international workshop on realizing artificial intelligence synergies in software engineering, pp 13–18

  • Seyler D, Dembelova T, Del Corro L, Hoffart J, Weikum G (2018) A study of the importance of external knowledge in the named entity recognition task. In: Proceedings of the 56th annual meeting of the association for computational linguistics (volume 2: short papers), pp 241–246

  • Shen W, Wang J, Han J (2015) Entity linking with a knowledge base: issues, techniques, and solutions. IEEE Trans Knowl Data Eng 27(2):443–460

  • Sundheim BM (1995) Overview of results of the MUC-6 evaluation. In: Proceedings of the 6th conference on message understanding, pp 13–31

  • Szymański J, Naruszewicz M (2019) Review on wikification methods. AI Commun 32(3):235–251

  • Tjong Kim Sang EF, De Meulder F (2003) Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the seventh conference on natural language learning at HLT-NAACL, pp 142–147

  • Treude C, Robillard MP (2016) Augmenting API documentation with insights from stack overflow. In: Proceedings of the IEEE/ACM 38th international conference on software engineering, pp 392–403

  • Usbeck R, Röder M, Ngonga Ngomo AC, Baron C, Both A, Brümmer M, Ceccarelli D, Cornolti M, Cherix D, Eickmann B, Ferragina P, Lemke C, Moro A, Navigli R, Piccinno F, Rizzo G, Sack H, Speck R, Troncy R, Waitelonis J, Wesemann L (2015) GERBIL: general entity annotator benchmarking framework. In: Proceedings of the 24th international conference on World Wide Web, pp 1133–1143

  • Vincent N, Johnson I, Hecht B (2018) Examining Wikipedia with a broader lens: quantifying the value of Wikipedia’s relationship with other large-scale online communities. In: Proceedings of the CHI conference on human factors in computing systems, pp 1–13

  • Wang C, Peng X, Liu M, Xing Z, Bai X, Xie B, Wang T (2019) A learning-based approach for automatic construction of domain glossary from source code and documentation. In: Proceedings of the 2019 27th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, pp 97–108

  • Wikipedia (2019) Wikipedia: manual of style/linking. https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking. Accessed 2020-01-06

  • Xun G, Jia X, Gopalakrishnan V, Zhang A (2017) A survey on context learning. IEEE Trans Knowl Data Eng 29(1):38–56

  • Ye D, Xing Z, Foo CY, Ang ZQ, Li J, Kapre N (2016a) Software-specific named entity recognition in software engineering social content. In: Proceedings of the IEEE 23rd international conference on software analysis, evolution, and reengineering, pp 90–101

  • Ye D, Xing Z, Foo CY, Li J, Kapre N (2016b) Learning to extract api mentions from informal natural language discussions. In: IEEE international conference on software maintenance and evolution, pp 389–399

  • Ye X, Shen H, Ma X, Bunescu R, Liu C (2016c) From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceedings of the 38th international conference on software engineering, pp 404–415

  • Ye D, Bao L, Xing Z, Lin SW (2018) APIReal: an api recognition and linking approach for online developer forums. Empir Softw Eng 23(6):3129–3160

  • Zhao X, Xing Z, Kabir MA, Sawada N, Li J, Lin SW (2017) HDSKG: harvesting domain specific knowledge graph from content of webpages. In: Proceedings of the IEEE 24th international conference on software analysis, evolution and reengineering, pp 56–67

Acknowledgements

We are grateful to the external annotators for helping with the manual annotation of the wikifiers' output. This work is funded by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Correspondence to Mathieu Nassif.

Additional information

Communicated by: Xin Peng

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: List of Links to External Resources

  A. https://stackoverflow.com/questions
  B. https://en.wikipedia.org/wiki/Main_Page
  C. https://www.reddit.com/
  D. https://en.wikipedia.org/wiki/Software_system
  E. https://archive.org/details/stackexchange
  F. https://jsoup.org/
  G. https://github.com/ambiverse-nlu/ambiverse-nlu
  H. http://babelfy.org/
  I. https://www.dbpedia-spotlight.org/
  J. https://cogcomp.seas.upenn.edu/page/software_view/Wikifier
  K. http://wikifier.org/
  L. https://services.d4science.org/web/tagme/wat-api
  M. https://doi.org/10.5281/zenodo.4442458
  N. https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking
  O. https://stackoverflow.com/questions/2348415
  P. https://stackoverflow.com/questions/55228245
  Q. https://stackoverflow.com/questions/41998618
  R. https://stackoverflow.com/questions/55893389
  S. https://stackoverflow.com/questions/21646135
  T. https://stackoverflow.com/questions/16516936
  U. https://en.wikipedia.org/w/index.php?title=Database_management_system&diff=544579037&oldid=544577010

About this article

Cite this article

Nassif, M., Robillard, M.P. Wikifying software artifacts. Empir Software Eng 26, 31 (2021). https://doi.org/10.1007/s10664-020-09918-4

  • DOI: https://doi.org/10.1007/s10664-020-09918-4
