
Natural Language Insights from Code Reviews that Missed a Vulnerability

A Large Scale Study of Chromium

  • Conference paper
  • In: Engineering Secure Software and Systems (ESSoS 2017)

Abstract

Engineering secure software is challenging. Software development organizations leverage a host of processes and tools to enable developers to prevent vulnerabilities in software. Code reviewing is one such approach which has been instrumental in improving the overall quality of a software system. In a typical code review, developers critique a proposed change to uncover potential vulnerabilities. Despite best efforts by developers, some vulnerabilities inevitably slip through the reviews. In this study, we characterized linguistic features—inquisitiveness, sentiment and syntactic complexity—of conversations between developers in a code review, to identify factors that could explain developers missing a vulnerability. We used natural language processing to collect these linguistic features from 3,994,976 messages in 788,437 code reviews from the Chromium project. We collected 1,462 Chromium vulnerabilities to empirically analyze the linguistic features. We found that code reviews with lower inquisitiveness, higher sentiment, and lower complexity were more likely to miss a vulnerability. We used a Naïve Bayes classifier to assess if the words (or lemmas) in the code reviews could differentiate reviews that are likely to miss vulnerabilities. The classifier used a subset of all lemmas (over 2 million) as features and their corresponding TF-IDF scores as values. The average precision, recall, and F-measure of the classifier were 14%, 73%, and 23%, respectively. We believe that our linguistic characterization will help developers identify problematic code reviews before they result in a vulnerability being missed.
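
As a concrete illustration of the setup described in the abstract, the sketch below pairs lemma TF-IDF features with a Naïve Bayes classifier. It is a minimal sketch only: scikit-learn is used here as a stand-in (the paper's own feature extraction relied on NLTK, see note 5 below), and the review messages and labels are invented for illustration.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Hypothetical, already-lemmatized review messages (invented examples).
    messages = [
        "lgtm thanks for the quick fix",                      # positive, no questions
        "why do we not validate the length before the copy",  # question raised
        "look good to me nice cleanup",                       # positive, no questions
        "is this pointer guaranteed to be non null here",     # question raised
    ]
    # Invented labels: 1 = review that missed a vulnerability, 0 = neutral review.
    labels = [1, 0, 1, 0]

    # Lemmas become features; their TF-IDF scores become the feature values.
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(messages, labels)
    print(model.predict(["please double check the bounds before landing"]))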


Notes

  1. https://codereview.chromium.org/.

  2. https://bugs.chromium.org/p/chromium.

  3. https://nvd.nist.gov/.

  4. https://chromereleases.googleblog.com/.

  5. http://www.nltk.org/api/nltk.html#nltk.text.TextCollection (a short usage sketch follows this list).
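
For reference, here is a small usage sketch of the nltk.text.TextCollection API linked in note 5, which provides the TF-IDF scores mentioned in the abstract; the tokenized review messages below are invented for illustration.

    from nltk.text import TextCollection

    # Hypothetical, already-lemmatized review messages (one token list per message).
    reviews = [
        ["lgtm", "please", "fix", "the", "null", "check"],
        ["why", "be", "this", "cast", "safe"],
        ["look", "good", "to", "me"],
    ]

    collection = TextCollection(reviews)
    for review in reviews:
        # tf_idf(term, text) multiplies the term's frequency in this message by its
        # inverse document frequency across the whole collection.
        print({lemma: round(collection.tf_idf(lemma, review), 3) for lemma in set(review)})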


Author information

Corresponding author

Correspondence to Nuthan Munaiah.

A Comparing Distribution of Inquisitiveness, Sentiment and Complexity Metrics

The comparison of the distribution of inquisitiveness, sentiment and complexity metrics for neutral and missed vulnerability code reviews is shown in Fig. 3.

Fig. 3. Comparing the distribution of inquisitiveness, sentiment and complexity metrics for neutral and missed vulnerability code reviews


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Munaiah, N. et al. (2017). Natural Language Insights from Code Reviews that Missed a Vulnerability. In: Bodden, E., Payer, M., Athanasopoulos, E. (eds) Engineering Secure Software and Systems. ESSoS 2017. Lecture Notes in Computer Science, vol 10379. Springer, Cham. https://doi.org/10.1007/978-3-319-62105-0_5


  • DOI: https://doi.org/10.1007/978-3-319-62105-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62104-3

  • Online ISBN: 978-3-319-62105-0

  • eBook Packages: Computer Science (R0)
