Wikifying software artifacts

Nassif, Mathieu; Robillard, Martin P.

doi:10.1007/s10664-020-09918-4

Published: 11 March 2021

Empirical Software Engineering, Volume 26, article number 31 (2021)

Abstract

Context

The computational linguistics community has developed tools, called wikifiers, to identify links to Wikipedia articles from free-form text. Software engineering research can leverage wikifiers to add semantic information to software artifacts. However, no empirically grounded basis exists to choose an effective wikifier and to configure it for the software domain, on which wikifiers were not specifically trained.

Objective

We conducted a study to guide the selection of a wikifier and its configuration for applications in the software domain, and to measure what performance can be expected of wikifiers.

Method

We applied six wikifiers, with multiple configurations, to a sample of 500 Stack Overflow posts. We manually annotated each of the 41 124 articles identified by the wikifiers as correct or incorrect, and used these annotations to compare the wikifiers' precision and recall.

Results

Each wikifier, in turn, achieved the highest precision, between 13% and 82%, for different thresholds of recall, from 60% to 5%. However, filtering the wikifiers' output with a whitelist can considerably raise precision: above 79% for recall up to 30%, and above 47% for recall up to 60%.
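A minimal sketch of whitelist filtering follows, assuming each wikifier result is a (post, article, confidence) triple and that correctness has already been annotated as a set of (post, article) pairs. The `Link` type, the helper functions, the example data, and the whitelist contents are hypothetical illustrations, not the authors' implementation; recall here is computed against the known-correct pairs only.

```python
from typing import Iterable, NamedTuple


class Link(NamedTuple):
    """One wikifier output (hypothetical structure): an article linked to a post."""
    post_id: int
    article: str
    confidence: float


def filter_links(links: Iterable[Link], whitelist: set[str],
                 threshold: float = 0.0) -> list[Link]:
    """Keep links whose confidence reaches the threshold and whose article is whitelisted."""
    return [link for link in links
            if link.confidence >= threshold and link.article in whitelist]


def precision_recall(links: Iterable[Link],
                     correct: set[tuple[int, str]]) -> tuple[float, float]:
    """Precision and recall of predicted (post, article) pairs against known-correct pairs."""
    predicted = {(link.post_id, link.article) for link in links}
    tp = len(predicted & correct)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(correct) if correct else 0.0
    return precision, recall


# Hypothetical wikifier output for two posts.
links = [
    Link(1, "Java (programming language)", 0.9),
    Link(1, "Java", 0.4),              # the Indonesian island: incorrect
    Link(2, "Burmese language", 0.7),  # triggered by the word "my": incorrect
    Link(2, "Hash table", 0.8),
]
# Hypothetical annotated pairs; one correct article was missed by the wikifier.
correct = {(1, "Java (programming language)"), (2, "Hash table"),
           (2, "Python (programming language)")}
# Hypothetical whitelist of computing-related article titles.
whitelist = {"Java (programming language)", "Hash table",
             "Python (programming language)", "Stack Overflow"}

print(precision_recall(links, correct))                           # (0.5, 0.666...)
print(precision_recall(filter_links(links, whitelist), correct))  # (1.0, 0.666...)
```

In this toy example the whitelist removes both incorrect links without losing a correct one, whereas raising `threshold` to 0.85 would also drop the correct "Hash table" link and halve recall. On real data, a whitelist can also discard correct articles it does not list.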

Conclusions

Results reported in each wikifier’s original article cannot be generalized to software-specific documents. Given that no wikifier performs universally better than all others, we provide empirically grounded insights to select a wikifier for different scenarios, and suggest ways to further improve their performance for the software domain.

Notes

Uppercase letters in superscript (e.g., ^A) refer to URLs listed in Appendix A.
There are a few exceptions: tools that target specific types of documents, such as Twitter messages (tweets) (Cassidy et al. 2012).
We mark titles of Wikipedia articles with a different font.
The score of a Stack Overflow post is visible next to the post on Stack Overflow, and represents the number of “upvotes” minus the number of “downvotes” attributed by Stack Overflow users based on the usefulness and quality of the post.
One alternative is to aggregate each statistic per post first, then average them over all posts. However, this alternative gives more weight to small posts with few related articles, which would be detrimental to the interpretation and generalizability of the results.
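The difference between the two aggregation strategies can be illustrated with made-up counts of true and false positives for two posts: a small post with one correct article, and a large post with many incorrect ones. The function names and numbers are hypothetical.

```python
def pooled_precision(per_post: list[tuple[int, int]]) -> float:
    """Sum true and false positives over all posts, then compute precision once."""
    tp = sum(t for t, _ in per_post)
    fp = sum(f for _, f in per_post)
    return tp / (tp + fp)


def per_post_precision(per_post: list[tuple[int, int]]) -> float:
    """Compute precision for each post, then average; every post weighs the same.

    Assumes every post has at least one identified article.
    """
    return sum(t / (t + f) for t, f in per_post) / len(per_post)


# Hypothetical (true positives, false positives) per post.
counts = [(1, 0), (10, 30)]
print(pooled_precision(counts))    # 11 / 41, about 0.268
print(per_post_precision(counts))  # (1.0 + 0.25) / 2 = 0.625
```

The single-article post pulls the per-post average up to 0.625, even though only 11 of the 41 identified articles are correct; pooling gives every article the same weight.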
Moro et al. (2014) do not provide an explicit definition for accuracy.
precision \(= \frac {TP}{TP + FP}\); recall \(= \frac {TP}{TP + FN}\)
F1 score is the unweighted harmonic mean of precision and recall, or \(\frac {2 \times \text {precision} \times \text {recall}}{\text {precision}+\text {recall}}\)
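As a worked example with hypothetical counts, the sketch below computes the three measures from true positives (TP), false positives (FP), and false negatives (FN), and checks that F1 equals the harmonic mean 2/(1/precision + 1/recall). Returning 0.0 for undefined ratios is an assumption of this sketch.

```python
import math


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 score from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 40 correct links, 10 incorrect, 60 correct articles missed.
p, r, f1 = precision_recall_f1(tp=40, fp=10, fn=60)
print(p, r, round(f1, 3))  # 0.8 0.4 0.533
assert math.isclose(f1, 2 / (1 / p + 1 / r))  # F1 is the harmonic mean
```

Because the harmonic mean is dominated by the smaller value, F1 (0.533) stays closer to recall (0.4) than the arithmetic mean (0.6) would.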
The AIDA-CoNLL dataset only links proper nouns (i.e., named entities), but other datasets exist for both named entities and concepts.
The term “annotate”, in the wikification community, often refers to the wikification process itself, or a variant. In this article, we use “annotate” (and its derivatives) only when referring to the manual annotation of human experts to assess the correctness of an article–post pair.
We express our criterion in terms of computing-related articles, rather than only those specific to software engineering, because the precise boundary of software engineering is less well defined than that of computing, especially among Wikipedia articles.
There are concepts that can conceivably be related to computing in some contexts, but part of a general body of knowledge in others. Blog is such a concept: We consider that, when referring to a specific post, the concept is not a technical term, but when discussing the creation of a blogging platform, this term becomes related to computing.
In addition to making grammatical errors, post authors often use natural language shortcuts such as abbreviations, acronyms, omissions, and ambiguous terms, which require additional effort from annotators to resolve. The misuse of formatting options, such as formatting code blocks as inline code, also makes posts harder to understand. For example, the scope keyword “until successful” can easily be misread as a preposition and an adjective if it is not formatted as inline code.^Q
Annotators cannot infer the topic of a Wikipedia article from its title alone. For example, the article Java does not describe the programming language, but the Indonesian island. Also, although most titles that consist of only three capital letters lead to disambiguation pages, the article URL does not. Redirect titles further widen the possible gap between titles and article content. Thus, annotators must scan each article to verify what it actually describes.
The number of distinct articles is slightly less than 1098 and 10 854, respectively, because some titles are actually redirect pages to other articles.
An additional 15 previously identified articles become MCS matches, due to unpredictable factors of the wikification algorithms.
Instead of restricting the correlation to the overlap, another possible approach would have been to use all articles, and assign a confidence of 0 to articles not found by a wikifier. This approach, however, is sensitive to noise due to differences in the knowledge bases (and their versions) used to train the wikifiers. Using this approach, we observed correlation scores all near zero. Therefore, we present the more useful results obtained using only the overlaps.
The τ_b variant of the statistic is explicitly designed to account for ties, which is especially important for the discretized confidence scores of DBpedia's results.
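A minimal sketch of the overlap-restricted correlation follows, assuming each wikifier's output is a mapping from (post, article) pairs to confidence scores. Kendall's τ_b is implemented directly from its pairwise definition, τ_b = (n_c - n_d) / sqrt((n_0 - n_1)(n_0 - n_2)), so that the tie correction is visible; the data and helper names are hypothetical. In practice, `scipy.stats.kendalltau` computes τ_b by default.

```python
import math

Scores = dict[tuple[int, str], float]  # (post id, article title) -> confidence


def overlap(a: Scores, b: Scores) -> list[tuple[int, str]]:
    """(post, article) pairs identified by both wikifiers."""
    return sorted(a.keys() & b.keys())


def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall's tau-b: (concordant - discordant) / sqrt(untied-in-x * untied-in-y)."""
    score = 0
    untied_x = untied_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the difference
            dy = (y[i] > y[j]) - (y[i] < y[j])
            score += dx * dy    # +1 concordant, -1 discordant, 0 if any tie
            untied_x += dx != 0
            untied_y += dy != 0
    if untied_x == 0 or untied_y == 0:
        return math.nan         # undefined when one ranking is all ties
    return score / math.sqrt(untied_x * untied_y)


# Hypothetical outputs; the second wikifier uses coarse, tied scores.
a = {(1, "Hash table"): 0.9, (1, "Java (programming language)"): 0.7,
     (2, "URL"): 0.5, (2, "Blog"): 0.2}
b = {(1, "Hash table"): 1.0, (1, "Java (programming language)"): 0.5,
     (2, "URL"): 0.5, (2, "Burmese language"): 0.3}

keys = overlap(a, b)
tau = kendall_tau_b([a[k] for k in keys], [b[k] for k in keys])
print(round(tau, 3))  # 2 / sqrt(6), about 0.816
```

The pair (Java, URL) is tied in the second wikifier's scores, so it counts as neither concordant nor discordant and is excluded from that side of the denominator; τ_a would instead divide by all 3 pairs and give about 0.667.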
The ISO two-letter code for the Burmese language is “my”, a homograph of the possessive determiner, which appears in many posts.
This musical is often linked to the word your, an obvious incorrect artifact from the training phase.

References

Barua A, Thomas SW, Hassan AE (2014) What are developers talking about? An analysis of topics and trends in stack overflow. Empir Softw Eng 19(3):619–654
Bourque P, Fairley RE (2014) Guide to the software engineering body of knowledge, 3rd edn. IEEE Computer Society Press. https://www.swebok.org
Brank J, Leban G, Grobelnik M (2017) Annotating documents with relevant wikipedia concepts. In: Proceedings of the Slovenian conference on data mining and data warehouses, p 4
Carvalho NR, Almeida JJ, Henriques PR, Varanda MJ (2015) From source code identifiers to natural language terms. J Syst Softw 100:117–128
Cassidy T, Ji H, Ratinov LA, Zubiaga A, Huang H (2012) Analysis and enhancement of wikification for microblogs with context expansion. In: Proceedings of the 24th international conference on computational linguistics, pp 441–456
Chen C, Xing Z, Wang X (2017) Unsupervised software-specific morphological forms inference from informal discussions. In: Proceedings of the 39th international conference on software engineering, pp 450–461
Chen C, Xing Z, Liu Y (2018) What’s Spain’s Paris? Mining analogical libraries from Q&A discussions. Empir Softw Eng 24(3):1155–1194
Cheng X, Roth D (2013) Relational inference for wikification. In: Proceedings of the conference on empirical methods in natural language processing, pp 1787–1796
Cleland-Huang J, Gotel OCZ, Huffman Hayes J, Mäder P, Zisman A (2014) Software traceability: trends and future directions. In: Proceedings of the on future of software engineering, pp 55–69
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1):37–46
Cornolti M, Ferragina P, Ciaramita M (2013) A framework for benchmarking entity-annotation systems. In: Proceedings of the 22nd international conference on World Wide Web, pp 249–260
Daiber J, Jakob M, Hokamp C, Mendes PN (2013) Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of the 9th international conference on semantic systems, pp 121–124
Fellbaum C (1998) WordNet: an electronic lexical database. MIT Press
Ferragina P, Scaiella U (2010) TAGME: on-the-fly annotation of short text fragments (by wikipedia entities). In: Proceedings of the 19th ACM international conference on information and knowledge management, pp 1625–1628
Hoffart J, Yosef MA, Bordino I, Fürstenau H, Pinkal M, Spaniol M, Taneva B, Thater S, Weikum G (2011) Robust disambiguation of named entities in text. In: Proceedings of the ACL conference on empirical methods in natural language processing, pp 782–792
ISO/IEC/IEEE (2017) International standard—systems and software engineering—vocabulary. Standard 24765:2017, ISO/IEC/IEEE
Kendall MG (1938) A new measure of rank correlation. Biometrika 30(1/2):81–93
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174
Lehmann J, Isele R, Jakob M, Jentzsch A, Kontokostas D, Mendes PN, Hellmann S, Morsey M, van Kleef P, Auer S, Bizer C (2015) Dbpedia—a large-scale, multilingual knowledge base extracted from wikipedia. Semantic Web 6(2):167–195
Ma S, Xing Z, Chen C, Chen C, Qu L, Li G (2019) Easy-to-deploy api extraction by multi-level feature embedding and transfer learning. IEEE Trans Softw Eng 15 pp, to appear
Meij E, Weerkamp W, de Rijke M (2012) Adding semantics to microblog posts. In: Proceedings of the 5th ACM international conference on web search and data mining, pp 563–572
Mendes PN, Jakob M, Garcia-Silva A, Bizer C (2011) DBpedia Spotlight: shedding light on the web of documents. In: Proceedings of the 7th international conference on semantic systems, pp 1–8
Mihalcea R, Chklovski T, Kilgarriff A (2004) The senseval-3 English lexical sample task. In: Proceedings of the third international workshop on the evaluation of systems for the semantic analysis of text, pp 25–28
Milne D, Witten IH (2008) Learning to link with wikipedia. In: Proceedings of the 17th ACM conference on information and knowledge management, pp 509–518
Milne D, Witten IH (2013) An open-source toolkit for mining wikipedia. Artif Intell 194:222–239
Moro A, Raganato A, Navigli R (2014) Entity linking meets word sense disambiguation: a unified approach. Trans Assoc Comput Linguist 2:231–244
Nassif M, Treude C, Robillard MP (2020) Automatically categorizing software technologies. IEEE Trans Softw Eng 46(1):20–32
Navigli R, Ponzetto SP (2012) BabelNet: the automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artif Intell 193:217–250
Navigli R, Jurgens D, Vannella D (2013) SemEval-2013 task 12: multilingual word sense disambiguation. In: Second joint conference on lexical and computational semantics, vol 2. Proceedings of the seventh international workshop on semantic evaluation, pp 222–231
Page L, Brin S, Motwani R, Winograd T (1999) The PageRank citation ranking: bringing order to the web. Technical Report 1999-66, Stanford InfoLab
Patil S (2017) Concept-based classification of software defect reports. In: Proceedings of the 14th international conference on mining software repositories, pp 182–186
Piccinno F, Ferragina P (2014) From TagME to WAT: a new entity annotator. In: Proceedings of the first international workshop on entity recognition & disambiguation, pp 55–62
Ponzanelli L, Bacchelli A, Lanza M (2013) Seahawk: stack overflow in the ide. In: Proceedings of the 35th international conference on software engineering, pp 1295–1298
Ratinov L, Roth D, Downey D, Anderson M (2011) Local and global algorithms for disambiguation to wikipedia. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, vol 1, pp 1375–1384
Rebele T, Suchanek F, Hoffart J, Biega J, Kuzey E, Weikum G (2016) YAGO: a multilingual knowledge base from wikipedia, wordnet and geonames. In: Proceedings of the international semantic web conference, pp 177–185
Rigby PC, Robillard MP (2013) Discovering essential code elements in informal documentation. In: Proceedings of the 35th IEEE/ACM international conference on software engineering, pp 832–841
Schindler M, Fox O, Rausch A (2015) Clustering source code elements by semantic similarity using wikipedia. In: Proceedings of the fourth international workshop on realizing artificial intelligence synergies in software engineering, pp 13–18
Seyler D, Dembelova T, Del Corro L, Hoffart J, Weikum G (2018) A study of the importance of external knowledge in the named entity recognition task. In: Proceedings of the 56th annual meeting of the association for computational linguistics (volume 2: short papers), pp 241–246
Shen W, Wang J, Han J (2015) Entity linking with a knowledge base: issues, techniques, and solutions. IEEE Trans Knowl Data Eng 27(2):443–460
Sundheim BM (1995) Overview of results of the MUC-6 evaluation. In: Proceedings of the 6th conference on message understanding, pp 13–31
Szymański J, Naruszewicz M (2019) Review on wikification methods. AI Commun 32(3):235–251
Tjong Kim Sang EF, De Meulder F (2003) Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the seventh conference on natural language learning at HLT-NAACL, pp 142–147
Treude C, Robillard MP (2016) Augmenting API documentation with insights from stack overflow. In: Proceedings of the IEEE/ACM 38th international conference on software engineering, pp 392–403
Usbeck R, Röder M, Ngonga Ngomo AC, Baron C, Both A, Brümmer M, Ceccarelli D, Cornolti M, Cherix D, Eickmann B, Ferragina P, Lemke C, Moro A, Navigli R, Piccinno F, Rizzo G, Sack H, Speck R, Troncy R, Waitelonis J, Wesemann L (2015) GERBIL: general entity annotator benchmarking framework. In: Proceedings of the 24th international conference on World Wide Web, pp 1133–1143
Vincent N, Johnson I, Hecht B (2018) Examining Wikipedia with a broader lens: quantifying the value of Wikipedia’s relationship with other large-scale online communities. In: Proceedings of the CHI conference on human factors in computing systems, pp 1–13
Wang C, Peng X, Liu M, Xing Z, Bai X, Xie B, Wang T (2019) A learning-based approach for automatic construction of domain glossary from source code and documentation. In: Proceedings of the 2019 27th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, pp 97–108
Wikipedia (2019) Wikipedia: manual of style/linking. https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking. Accessed 2020-01-06
Xun G, Jia X, Gopalakrishnan V, Zhang A (2017) A survey on context learning. IEEE Trans Knowl Data Eng 29(1):38–56
Ye D, Xing Z, Foo CY, Ang ZQ, Li J, Kapre N (2016a) Software-specific named entity recognition in software engineering social content. In: Proceedings of the IEEE 23rd international conference on software analysis, evolution, and reengineering, pp 90–101
Ye D, Xing Z, Foo CY, Li J, Kapre N (2016b) Learning to extract api mentions from informal natural language discussions. In: IEEE international conference on software maintenance and evolution, pp 389–399
Ye X, Shen H, Ma X, Bunescu R, Liu C (2016c) From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceedings of the 38th international conference on software engineering, pp 404–415
Ye D, Bao L, Xing Z, Lin SW (2018) APIReal: an api recognition and linking approach for online developer forums. Empir Softw Eng 23(6):3129–3160
Zhao X, Xing Z, Kabir MA, Sawada N, Li J, Lin SW (2017) HDSKG: harvesting domain specific knowledge graph from content of webpages. In: Proceedings of the IEEE 24th international conference on software analysis, evolution and reengineering, pp 56–67

Acknowledgements

We are grateful to the external annotators for helping with the manual annotation of the wikifiers' output. This work is funded by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Authors and Affiliations

School of Computer Science, McGill University, Montréal, Canada
Mathieu Nassif & Martin P. Robillard

Corresponding author

Correspondence to Mathieu Nassif.

Additional information

Communicated by: Xin Peng

Appendix A: List of Links to External Resources

About this article

Cite this article

Nassif, M., Robillard, M.P. Wikifying software artifacts. Empir Software Eng 26, 31 (2021). https://doi.org/10.1007/s10664-020-09918-4

Accepted: 14 October 2020
