Do I really need all this work to find vulnerabilities?

An empirical case study comparing vulnerability detection techniques on a Java application

Abstract

Context:

Applying vulnerability detection techniques is one of many tasks using the limited resources of a software project.

Objective:

The goal of this research is to assist managers and other decision-makers in making informed choices about the use of software vulnerability detection techniques through an empirical study of the efficiency and effectiveness of four techniques on a Java-based web application.

Method:

We apply four different categories of vulnerability detection techniques – systematic manual penetration testing (SMPT), exploratory manual penetration testing (EMPT), dynamic application security testing (DAST), and static application security testing (SAST) – to an open-source medical records system.

Results:

We found the most vulnerabilities using SAST. However, EMPT found more severe vulnerabilities. With each technique, we found unique vulnerabilities not found using the other techniques. The efficiency of manual techniques (EMPT, SMPT) was comparable to or better than the efficiency of automated techniques (DAST, SAST) in terms of Vulnerabilities per Hour (VpH).

Conclusions:

The vulnerability detection techniques practitioners select may vary based on the goals and available resources of the project. If an organization’s goal is to find “all” vulnerabilities in a project, it needs to use as many techniques as its resources allow.

Notes

  1. A theoretical replication seeks to investigate the scope of the underlying theory, e.g. by redesigning the study for a different target population, or by testing a variant of the original hypothesis (Lung et al. 2008)

  2. https://owasp.org/www-project-zap/

  3. https://www.sonarqube.org/

  4. as measured by CLOC v1.74 (https://github.com/AlDanial/cloc)

  5. A theoretical replication seeks to investigate the scope of the underlying theory, for example by redesigning the study for a different target population, or by testing a variant of the original hypothesis of the work (Lung et al. 2008)

  6. https://nvd.nist.gov/vuln

  7. https://github.com/OWASP/ASVS/tree/v4.0.1

  8. https://www.zaproxy.org/

  9. e.g. https://docs.rapid7.com/insightappsec/scan-scope/; https://www.netsparker.com/support/scanning-restful-api-web-service/; https://docs.gitlab.com/ee/user/application_security/api_fuzzing/create_har_files.html

  10. https://github.com/AlDanial/cloc

  11. https://github.com/openmrs

  12. The CVE Counting rules have been updated since our original study. In future work, the authors may follow the updated rules: https://cve.mitre.org/cve/cna/rules.html

  13. https://cwe.mitre.org/data/definitions/209.html

  14. https://cwe.mitre.org/data/definitions/79.html

  15. https://wiki.openmrs.org/

  16. https://vcl.apache.org

  17. https://vcl.ncsu.edu

  18. https://cwe.mitre.org/data/definitions/89.html

  19. https://cwe.mitre.org/data/definitions/7.html

  20. https://cwe.mitre.org/data/definitions/52.html

  21. https://docs.spring.io/spring-security/site/docs/5.0.x/reference/html/csrf.html

  22. https://cwe.mitre.org/data/definitions/352.html

  23. https://cwe.mitre.org/data/definitions/79.html

  24. https://cwe.mitre.org/data/definitions/404.html

  25. the full text of the assignment is available under Project Part 1 in Appendix C

  26. the first author has over 2 years of industry testing experience

  27. https://vcl.apache.org/

  28. https://wiki.x2go.org/doku.php

  29. https://www.eclipse.org/ide/

References

  • Ackerman E (2019) Upgrade to superhuman reflexes without feeling like a robot. IEEE Spectr. https://spectrum.ieee.org/enabling-superhuman-reflexes-without-feeling-like-a-robot

  • Alomar N, Wijesekera P, Qiu E, Egelman S (2020) “you’ve got your nice list of bugs, now what?” vulnerability discovery and management processes in the wild. In: Sixteenth Symposium on Usable Privacy and Security (SOUPS 2020), pp 319–339

  • Amankwah R, Chen J, Kudjo PK, Towey D (2020) An empirical comparison of commercial and open-source web vulnerability scanners. Softw - Pract Exp 50(9):1842–1857

  • Anderson T (2020) Linux in 2020: 27.8 million lines of code in the kernel, 1.3 million in systemd. The Register URL https://www.theregister.com/2020/01/06/linux_2020_kernel_systemd_code/. Accessed 21 Dec 2021

  • Antunes N, Vieira M (2009) Comparing the effectiveness of penetration testing and static code analysis on the detection of sql injection vulnerabilities in web services. In: 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing. IEEE, pp 301–306

  • Antunes N, Vieira M (2010) Benchmarking vulnerability detection tools for web services. In: 2010 IEEE International Conference on Web Services, IEEE, pp. 203–210

  • Austin A, Holmgreen C, Williams L (2013) A comparison of the efficiency and effectiveness of vulnerability discovery techniques. Inf Softw Technol 55(7):1279–1288

  • Austin A, Williams L (2011) One technique is not enough: A comparison of vulnerability discovery techniques. In: 2011 International Symposium on Empirical Software Engineering and Measurement, IEEE, pp. 97–106

  • Bannister A (2021) Healthcare provider Texas ENT alerts 535,000 patients to data breach. The Daily Swig. [Online; Publication Date 20 Dec 2021; Accessed 21 Dec 2021]

  • Bartlett MS (1937) Properties of sufficiency and statistical tests. Proc R Soc A: Math Phys Eng Sci 160(901):268–282

  • Bau J, Wang F, Bursztein E, Mutchler P, Mitchell JC (2012) Vulnerability factors in new web applications: Audit tools, developer selection & languages. Tech. rep., Stanford, https://seclab.stanford.edu/websec/scannerPaper.pdf. Accessed 21 Dec 2021

  • Campbell GA (2020) What is ’taint analysis’ and why do I care?, https://blog.sonarsource.com/what-is-taint-analysis

  • Cass S (2021) Top programming languages 2021. IEEE Spectr, https://spectrum.ieee.org/top-programming-languages-2021. Accessed 21 Dec 2021

  • Cass S, Kulkarni P, Guizzo E (2021) Interactive: Top Programming Languages 2021. IEEE Spectrum, https://spectrum.ieee.org/top-programming-languages/. Accessed 20 Apr 2022

  • Chaim ML, Santos DS, Cruzes DS (2018) What do we know about buffer overflow detection?: A survey on techniques to detect a persistent vulnerability. International Journal of Systems and Software Security and Protection (IJSSSP) 9(3):1–33

  • Cicchetti DV, Feinstein AR (1990) High agreement but low kappa: Ii. resolving the paradoxes. J Clin Epidemiol 43(6):551–558

  • Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1):37–46

  • Condon C, Miller H (2021) Maryland health department says there’s no evidence of data lost after cyberattack; website is back online. Baltimore Sun, https://www.baltimoresun.com/health/bs-hs-mdh-website-down-20211206-o2ky2sn5znb3pdwtnu2a7m5g6q-story.html. Accessed 21 Dec 2021

  • Cook TD, Campbell DT (1979) Quasi-experimentation: Design and analysis issues for field settings. Rand McNally College Publishing, Chicago

  • Corbin J, Strauss A (2008) Basics of qualitative research: Techniques and procedures for developing grounded theory, 3rd edn. SAGE Publications Inc., California

  • Cowan C, Wagle F, Pu C, Beattie S, Walpole J (2000) Buffer overflows: Attacks and defenses for the vulnerability of the decade. In: Proceedings DARPA Information Survivability Conference and Exposition. DISCEX’00, IEEE, vol. 2, pp. 119–129

  • Cruzes DS, Felderer M, Oyetoyan TD, Gander M, Pekaric I (2017) How is security testing done in agile teams? a cross-case analysis of four software teams. In: International Conference on Agile Software Development, Springer, Cham, pp. 201–216

  • Dambra S, Bilge L, Balzarotti D (2020) Sok: Cyber insurance–technical challenges and a system security roadmap. In: 2020 IEEE Symposium on Security and Privacy (SP), IEEE, pp. 1367–1383

  • Davis FD (1989) Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly 13(3):319–340

  • Delaitre AM, Stivalet BC, Black PE, Okun V, Cohen TS, Ribeiro A (2018) Sate v report: Ten years of static analysis tool expositions. NIST SP 500-326, National Institute of Standards and Technology (NIST), https://doi.org/10.6028/NIST.SP.500-326. Accessed 20 Jul 2021

  • Desjardins J (2017) Here’s how many millions of lines of code it takes to run different software. Business Insider, https://www.businessinsider.com/how-many-lines-of-code-it-takes-to-run-different-software-2017-2. Accessed 21 Dec 2021

  • Doupé A, Cova M, Vigna G (2010) Why johnny can’t pentest: An analysis of black-box web vulnerability scanners. In: International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Springer, pp. 111–131

  • Elder SE, Zahan N, Kozarev V, Shu R, Menzies T, Williams L (2021) Structuring a comprehensive software security course around the owasp application security verification standard. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET), IEEE, pp. 95–104

  • Epic Systems Corporation (2020) From healthcare to mapping the milky way: 5 things you didn’t know about epic’s tech, https://www.epic.com/epic/post/healthcare-mapping-milky-way-5-things-didnt-know-epics-tech. Accessed 07 Dec 2021

  • Executive Order 14028 (2021) Executive order on improving the nation’s cybersecurity. Exec. Order No. 14028, 86 FR 26633, https://www.federalregister.gov/d/2021-10460

  • Feinstein AR, Cicchetti DV (1990) High agreement but low kappa: I. the problems of two paradoxes. J Clin Epidemiol 43(6):543–549

  • Feldt R, Magazinius A (2010) Validity threats in empirical software engineering research-an initial survey.. In: Seke, pp. 374–379

  • Feng GC (2013) Factors affecting intercoder reliability: A monte carlo experiment. Qual Quant 47(5):2959–2982

  • Fielding RT, Reschke J (2014) Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. RFC Editor, https://doi.org/10.17487/RFC7231, https://rfc-editor.org/rfc/rfc7231.txt. Accessed 21 Dec 2021

  • Finifter M, Akhawe D, Wagner D (2013) An empirical study of vulnerability rewards programs. In: 22nd USENIX Security Symposium (USENIX Security 13), USENIX, Washington, D.C., pp. 273–288

  • Fonseca J, Vieira M, Madeira H (2007) Testing and comparing web vulnerability scanning tools for sql injection and xss attacks. In: 13th Pacific Rim international symposium on dependable computing (PRDC 2007), IEEE, pp. 365–372

  • Games PA, Howell JF (1976) Pairwise multiple comparison procedures with unequal n’s and/or variances: a monte carlo study. J Educ Stat 1(2):113–125

  • Github (2021) The 2021 State of the Octoverse, https://octoverse.github.com/. Accessed 20 Apr 2022

  • Gonçales L, Farias K, da Silva BC (2021) Measuring the cognitive load of software developers: An extended systematic mapping study. Inf Softw Technol 106563

  • Hafiz M, Fang M (2016) Game of detections: how are security vulnerabilities discovered in the wild?. Empir Softw Eng 21(5):1920–1959

  • Imtiaz N, Rahman A, Farhana E, Williams L (2019) Challenges with responding to static analysis tool alerts. In: 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), IEEE, pp. 245–249

  • ISO/IEC/IEEE (2013) Software and systems engineering — software testing — part 1: concepts and definitions. ISO/IEC/IEEE 29119-1:2013, International Organization for Standardization (ISO), International Electrotechnical Commission (IES), and Institute of Electrical and Electronics Engineers (IEEE)

  • Itkonen J, Mäntylä MV (2014) Are test cases needed? replicated comparison between exploratory and test-case-based software testing. Empir Softw Eng 19(2):303–342

  • Itkonen J, Mäntylä MV, Lassenius C (2013) The role of the tester’s knowledge in exploratory software testing. IEEE Trans Softw Eng 39(5):707–724

  • Johnson B, Song Y, Murphy-Hill E, Bowdidge R (2013) Why don’t software developers use static analysis tools to find bugs?. In: Proceedings of the 2013 International Conference on Software Engineering, IEEE Press, pp. 672–681

  • Joint Task Force Transformation Initiative (2013) Security and privacy controls for federal information systems and organizations. NIST SP 800-53, National Institute of Standards and Technology (NIST), https://doi.org/10.6028/NIST.SP.800-53r4. Accessed 20 Jul 2021

  • Kirk R (2013) Experimental design: Procedures for the behavioral sciences, 4th edn. Sage Publications, Thousand Oaks

  • Kitchenham B, Madeyski L, Budgen D, Keung J, Brereton P, Charters S, Gibbs S, Pohthong A (2017) Robust statistical methods for empirical software engineering. Empir Softw Eng 22(2):579–630

  • Kitchenham BA, Budgen D, Brereton P (2015) Evidence-based software engineering and systematic reviews, vol 4. CRC press, Boca Raton

  • Klees G, Ruef A, Cooper B, Wei S, Hicks M (2018) Evaluating fuzz testing. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 2123–2138

  • Liu M, Zhang B, Chen W, Zhang X (2019) A survey of exploitation and detection methods of xss vulnerabilities. IEEE Access 7:182004–182016

  • Lombard M, Snyder-Duch J, Bracken CC (2002) Content analysis in mass communication: Assessment and reporting of intercoder reliability. Hum Commun Res 28(4):587–604

  • Lung J, Aranda J, Easterbrook S, Wilson G (2008) On the difficulty of replicating human subjects studies in software engineering. In: 2008 ACM/IEEE 30th International Conference on Software Engineering, pp. 191–200

  • Mallet F (2016) Sonaranalyzer for java: Tricky bugs are running scared. https://blog.sonarsource.com/sonaranalyzer-for-java-tricky-bugs-are-running-scared, Accessed 05 Dec 2021

  • McGraw G (2006) Software security: building security in. Addison-Wesley Professional, Boston

  • MITRE (2016) Common vulnerabilities and exposures (cve) numbering authority (cna) rules. https://cve.mitre.org/cve/cna/CNA_Rules_v1.1.pdf, Accessed 24 July 2021

  • MITRE (2021) Cve → cwe mapping guidance. In: (MITRE 2021b), https://cwe.mitre.org/documents/cwe_usage/guidance.html. Accessed 24 Jul 2021

  • MITRE (2021) Cwe common weakness enumeration (website), https://cwe.mitre.org/. Accessed 20 Jul 2021

  • MITRE (2021) Cwe view: Weaknesses in owasp top ten (2021). In: (MITRE 2021b), https://cwe.mitre.org/data/definitions/1344.html. Accessed 09 Dec 2021

  • MITRE (2022) Cwe 1003 - cwe view: Weaknesses for simplified mapping of published vulnerabilities

  • Morrison P, Moye D, Pandita R, Williams L (2018) Mapping the field of software life cycle security metrics. Inf Softw Technol 102:146–159

  • Mozilla (2021) Http messages, https://developer.mozilla.org/en-US/docs/Web/HTTP/Messages. Accessed 21 Dec 2021

  • Nagarakatte S, Zhao J, Martin MMK, Zdancewic S (2009) Softbound: Highly compatible and complete spatial memory safety for c. In: Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp 245–258

  • NVD (2021) Cwe over time. In: (NVD 2021b), https://nvd.nist.gov/general/visualizations/vulnerability-visualizations/cwe-over-time. Accessed 05 Dec 2021

  • NVD (2021) National vulnerability database (website). National Institute of Standards and Technology (NIST), https://nvd.nist.gov/. Accessed 01 Nov 2021

  • NVD (2021) Nvd - general faqs. National Institute of Standards and Technology (NIST). In: (NVD 2021b) https://nvd.nist.gov/general/FAQ-Sections/General-FAQs. Accessed 04 Apr 2022

  • NVD (2021) Vulnerabilities. In: (NVD 2021b), https://nvd.nist.gov/vuln. Accessed 01 Nov 2021

  • Okun V, Delaitre A, Black PE (2010) The second static analysis tool exposition (sate) 2009. NIST SP 500-287, National Institute of Standards and Technology (NIST), https://doi.org/10.6028/NIST.SP.500-287. Accessed 20 Jul 2021

  • Okun V, Delaitre A, Black PE (2011) Report on the third static analysis tool exposition (sate 2010). NIST SP 500-283, National Institute of Standards and Technology (NIST), https://doi.org/10.6028/NIST.SP.500-283. Accessed 20 Jul 2021

  • Okun V, Delaitre A, Black PE (2013) Report on the static analysis tool exposition (sate) iv. NIST SP 500-297, National Institute of Standards and Technology (NIST), https://doi.org/10.6028/NIST.SP.500-297. Accessed 20 Jul 2021

  • Okun V, Gaucher R, Black PE (2009) Static analysis tool exposition (sate) 2008. NIST SP 500-279, National Institute of Standards and Technology (NIST), https://doi.org/10.6028/NIST.SP.500-279. Accessed 20 Jul 2021

  • Open Web Application Security Project (OWASP) Foundation (2013) Owasp top ten - 2010, https://owasp.org/www-pdf-archive/OWASP_Top_10_-_2010.pdf. Accessed 05 Dec 2021

  • Open Web Application Security Project (OWASP) Foundation (2013) Owasp top ten - 2013, https://owasp.org/www-pdf-archive/OWASP_Top_10_-_2013.pdf. Accessed 05 Dec 2021

  • Open Web Application Security Project (OWASP) Foundation (2017) Owasp top ten - 2017, https://owasp.org/www-project-top-ten/2017/. Accessed 05 Dec 2021

  • Open Web Application Security Project (OWASP) Foundation (2021) Owasp top ten - 2021, https://owasp.org/Top10/. Accessed 05 Dec 2021

  • Open Web Application Security Project (OWASP) Foundation (2021) The owasp top ten application security risks project, https://owasp.org/www-project-top-ten/. Accessed 09 Dec 2021

  • Open Web Application Security Project (OWASP) Foundation (2021) Owasp zap, https://www.zaproxy.org/. Accessed 21 Dec 2021

  • OpenMRS (2020) Openmrs developer manual, http://devmanual.openmrs.org/en/. Accessed 24 Jul 2021

  • OpenMRSAtlas (2021) Openmrs atlas, https://atlas.openmrs.org/. Accessed 24 Jul 2021

  • OWASP ZAP Dev Team (2021) Getting started - features - alerts. In: (Team OZD 2021), https://www.zaproxy.org/docs/desktop/start/features/alerts/. Accessed 06 Dec 2021

  • OWASP ZAP Dev Team (2021) Getting started - features - spider. In: (Team OZD 2021), https://www.zaproxy.org/docs/desktop/start/features/spider/. Accessed 20 Jul 2021

  • Pfahl D, Yin H, Mäntylä MV, Münch J (2014) How is exploratory testing used? a state-of-the-practice survey. In: Proceedings of the 8th ACM/IEEE international symposium on empirical software engineering and measurement, ACM, p. 5

  • Purkayastha S, Goyal S, Phillips T, Wu H, Haakenson B, Zou X (2020) Continuous security through integration testing in an electronic health records system. In: 2020 International Conference on Software Security and Assurance (ICSSA), IEEE, pp. 26–31

  • Radio New Zealand (RNZ) (2021) Health ministry announces $75m to plug cybersecurity gaps, https://www.rnz.co.nz/news/national/458331/health-ministry-announces-75m-to-plug-cybersecurity-gaps. Accessed 21 Dec 2021

  • Rahman AAU, Helms E, Williams L, Parnin C (2015) Synthesizing continuous deployment practices used in software development. In: 2015 Agile Conference, IEEE, pp. 1–10

  • Ralph P, Tempero E (2018) Construct validity in software engineering research and software metrics. In: Proceedings of the 22nd International Conference on Evaluation and Assessment in Software Engineering 2018, pp. 13–23

  • Razali NM, Wah Y B, et al (2011) Power comparisons of shapiro-wilk, kolmogorov-smirnov, lilliefors and anderson-darling tests. J Stat Modelling Anal 2(1):21–33

  • Scandariato R, Walden J, Joosen W (2013) Static analysis versus penetration testing: A controlled experiment. In: 2013 IEEE 24th international symposium on software reliability engineering (ISSRE), IEEE, pp. 451–460

  • Scanlon T (2018) 10 types of application security testing tools: When and how to use them. Blog, Software Engineering Institute, Carnegie Mellon University, https://insights.sei.cmu.edu/blog/10-types-of-application-security-testing-tools-when-and-how-to-use-them. Accessed 20 Jul 2021

  • Smith B, Williams L (2012) On the effective use of security test patterns. In: 2012 IEEE Sixth International Conference on Software Security and Reliability, IEEE, pp. 108–117

  • Smith B, Williams LA (2011) Systematizing security test planning using functional requirements phrases. Tech. Rep. TR-2011-5, North Carolina State University. Dept. of Computer Science

  • Smith J, Do LNQ, Murphy-Hill E (2020) Why can’t johnny fix vulnerabilities: A usability evaluation of static analysis tools for security. In: Sixteenth Symposium on Usable Privacy and Security (SOUPS 2020), USENIX, pp. 221–238

  • Smith J, Johnson B, Murphy-Hill E, Chu B, Lipford HR (2015) Questions developers ask while diagnosing potential security vulnerabilities with static analysis. In: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ACM, pp. 248–259

  • SonarSource (2019) Sonarqube documentation: Security-related rules, https://docs.sonarqube.org/8.2/user-guide/security-rules/. Accessed 06 Dec 2021

  • StackOverflow (2021) 2021 Developer Survey, https://insights.stackoverflow.com/survey/2021#technology-most-popular-technologies. Accessed 07 Dec 2021

  • Team OZD (ed.) (2021) The owasp zed attack proxy (zap) desktop user guide

  • Tøndel IA, Jaatun MG, Cruzes DS, Williams L (2019) Collaborative security risk estimation in agile software development. Information & Computer Security 27(4)

  • U.S. Cybersecurity and Infrastructure Security Agency (CISA) (2021) Provide medical care is in critical condition: Analysis and stakeholder decision support to minimize further harm, https://www.cisa.gov/sites/default/files/publications/Insights_MedicalCare_FINAL-v2_0.pdf. Accessed 21 Dec 2021

  • US Dept of Veterans Affairs, Office of Information and Technology, Enterprise Program Management Office (2021) VA Monograph, https://www.va.gov/vdl/documents/Monograph/Monograph/VistA_Monograph_0421_REDACTED.pdf. Accessed 07 Dec 2021

  • van der Stock A, Cuthbert D, Manico J, Grossman J C, Burnett M (2019) Application security verification standard. Rev. 4.0.1, Open Web Application Security Project (OWASP), https://github.com/OWASP/ASVS/tree/v4.0.1/4.0. Accessed 20 Jul 2021

  • Votipka D, Stevens R, Redmiles E, Hu J, Mazurek M (2018) Hackers vs. testers: A comparison of software vulnerability discovery processes. In: 2018 IEEE Symposium on Security and Privacy (SP), IEEE, pp. 374–391

  • Wilcox RR, Keselman HJ (2003) Modern robust data analysis methods: measures of central tendency. Psychol Methods 8(3):254

  • Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2012) Experimentation in software engineering. Springer Science & Business Media, New York

Acknowledgements

We appreciate the feedback provided by reviewers for this paper. We thank Jiaming Jiang for her support as teaching assistant for the security class. We are grateful to the I/T staff at the university for their assistance in ensuring that we had sufficient computing power running the course. We also thank the students in the software security class. Finally, we thank all the members of the Realsearch research group for their valuable feedback through this project.

This material is based upon work supported by the National Science Foundation under Grant No. 1909516. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Author information

Corresponding author

Correspondence to Sarah Elder.

Additional information

Communicated by: Mehrdad Sabetzadeh

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Automated Technique CWEs

Table 10 CWEs covered in the rules implemented by automated techniques

Appendix B: Student Experience Questionnaire

At the beginning of the course, students were asked to fill out a survey about their experience relevant to the course. The four questions asked to students were as follows:

  1. How much time have you spent working at a professional software organization – including internships – in terms of the # of years and the # of months?

  2. On a scale from 1 (none) to 5 (fully), how much of the time has your work at a professional software organization involved cybersecurity?

  3. Which of the following classes have you already completed?

  4. Which of the following classes are you currently taking?

Q1 was short answer. For Q2, students selected a single number between 1 and 5. For Q3, the students could check any number of checkboxes corresponding to a list of the security and privacy courses offered at the institution. For Q4, the students selected from the subset of the Q3 classes that were being offered in the semester in which the survey was given.

Fifty-nine of the sixty-three students who agreed to let their data be used for the study responded to the survey. Of these 59 responses, four students’ responses to Q1 provided a numeric value, e.g. “3”, but did not specify whether the value indicated years or months. We considered these responses invalid and summarize the experience of the remaining 55 participants in Section 7.2.

Appendix C: Student Assignments

The following are the verbatim assignments for the Course Project that guided the tasks performed by students. We have removed sections of the assignment that are not relevant to this project. Additionally, information that is specific to the tools used, such as UI locations, has also been removed. Text that has been removed is indicated by square brackets [ ].

C.1 Project Part 1

Throughout the course of this semester, you will perform and document a technical security review of OpenMRS (http://openmrs.org). This open-source system provides electronic health care functionality for “resource-constrained environments”. While the system has not been designed for deployment within the United States, security and privacy are still paramount concerns for any patient.

Software:

OpenMRS 2.9.0. There is no need to install OpenMRS. You will use the VCL image CSC515_SoftwareSecurity_Ubuntu.

Deliverables:

Submit a PDF with all deliverables in Gradescope. Only one submission should be performed per team. Do not include your names/IDs/team name on the report to facilitate the peer evaluation of your assignment (see Part 3 of this assignment).

  • Security test planning and execution (45 points)

    • Record how much total time (hours and minutes) your team spends to complete this activity (test planning and test execution). Compute a metric of how many true positive defects you found per hour of total effort.

    • Test planning. Create 15 black box test cases to start a repeatable black box test plan for OpenMRS (Version 2.9). You may find the OWASP Testing Guide and OWASP Proactive Controls helpful references in addition to the references provided throughout the ASVS document.

      For each test case, you must specify:

      • A unique test case id that maps to the ASVS, sticking to Level 1 and Level 2. Provide the name/description of the ASVS control. Only one unique identifier is needed (as opposed to the example in the lecture slides). The ASVS number should be part of the one unique identifier.

      • Detailed and repeatable (the same steps could be done by anyone who reads the instructions) instructions for how to execute the test case

      • Expected results when running the test case. A passing test case would indicate a secure system.

      • Actual results of running the test case.

      • Indicate the CWE (number and name) for the vulnerability you are testing for.

      In choosing your test cases, we are looking for you to demonstrate your understanding of the vulnerability and what it would take to stress the system to see if the vulnerability exists. You may have only one test case per ASVS control.

    • Extra credit (up to 5 points): Create a black box test case that will reveal the vulnerability reported by the static analysis tool (Part 2 of this assignment) for up to 5 vulnerabilities (1 point per vulnerability). Provide the tool output (screen shot of the alert) from each tool.

  • Static analysis (45 points)

    • Record how much total time (hours and minutes) your team spends to complete this activity (test planning and test execution). Compute a metric of how many defects you found per hour of total effort.

    • For each of the three tools (below), review the security reports. Based upon these reports:

      • References:

        • Troubleshooting VCL

        • Opening OpenMRS on VCL

      • Randomly choose 10 security alerts and provide a cross-reference back to the originating report(s) where the alert was documented. Explore the code to determine if the alert is a false positive or a true positive. The alerts analyzed MUST be security alerts even though the tools will report “regular quality” alerts – you need to choose security alerts.

      • If the alert is a false positive, explain why. If you have more than 5 false positives, keep choosing alerts until you have 5 true positives while still reporting the false positives (which may make you go above a total of 10).

      • If the alert is a true positive, (1) explain how to fix the vulnerability; (2) map the vulnerability to a CWE; (3) map the vulnerability to the ASVS control.

      • Find the instructions for getting [SAST-3] going on OpenMRS here[hyperlink removed]. [Tool-specific instructions]

      • Find the instructions for getting [SAST-2] going on OpenMRS here[hyperlink removed]. [Tool-specific instructions]

    • Extra credit (up to 5 points): Find 5 instances (1 point per instance) of a potential vulnerability being reported by multiple tools. Provide the tool output (screen shot of the alert) from each tool. Explore the code to determine if the alert is a false positive or a true positive. If the alert is a false positive, explain why. If the alert is a true positive, explain how to fix the vulnerability.

  • Peer evaluation (10 points)

    Perform a peer evaluation on another team. Produce a complete report of feedback for the other team using this rubric [to be supplied]. Note: For any part of this course-long project, you may not directly copy materials from other sources. You need to adapt and make unique to OpenMRS. You should provide references to your sources. Copying materials without attribution is plagiarism and will be treated as an academic integrity violation.
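
The defects-per-hour metric requested in the test planning and static analysis activities above is a simple ratio of true positive findings to total effort. The following minimal Java sketch illustrates the computation; the count and duration are hypothetical placeholders, not data from the study.

```java
// Minimal sketch of the "true positive defects per hour of effort" metric.
// The count and duration below are hypothetical placeholders, not study data.
import java.time.Duration;

public class EfficiencyMetric {

    /** True positive vulnerabilities found per hour of total effort. */
    static double vulnerabilitiesPerHour(int truePositives, Duration totalEffort) {
        double hours = totalEffort.toMinutes() / 60.0;
        return truePositives / hours;
    }

    public static void main(String[] args) {
        // Example: 4 true positives after 6 hours 30 minutes of planning and execution.
        Duration effort = Duration.ofHours(6).plusMinutes(30);
        System.out.printf("Efficiency: %.2f vulnerabilities per hour%n",
                vulnerabilitiesPerHour(4, effort));
    }
}
```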

C.2 Project Part 2

The fuzzing should be performed on the VCL Class Image (“CSC 515 Software Security Ubuntu”).

  • Black Box Test Cases

    Parts 1 (OWASP ZAP) and 2 ([DAST-2]) ask for you to write a black box test case. We use the same format as was used in Project Part 1. For each test case, you must specify:

    • A unique test case id that maps to the ASVS, sticking to Level 1 and Level 2. Provide the name/description of the ASVS control. Only one unique identifier is needed (as opposed to the example in the lecture slides). The ASVS number should be part of the one unique identifier.

    • Detailed and repeatable (the same steps could be done by anyone who reads the instructions) instructions for how to execute the test case

    • Expected results when running the test case. A passing test case would indicate a secure system.

    • Actual results of running the test case.

    • Indicate the CWE (number and name) for the vulnerability you are testing for.

  • OWASP ZAP (30 points, 3 points for each of the 5 test cases in the two parts)

    Client-side bypassing

    • Record how much total time (hours and minutes) your team spends to complete this activity. Provide:

      • Total time to plan and run the 5 black box test cases.

      • Total number of vulnerabilities found.

    • Plan 5 black box test cases (using format provided in Part 0 above) in which you stop user input in OpenMRS with OWASP ZAP and change the input string to an attack. (Consider using the strings that can be found in the ZAP rulesets, such as jbrofuzz) Use these instructions as a guide.

    • In your test case, be sure to document the page URL, the input field, the initial user input, and the malicious input. Describe what “filler” information is used for the rest of the fields on the page (if necessary).

    • Run the test case and document the results.

    Fuzzing

    • Record how much total time (hours and minutes) your team spends to complete this activity.

      • Do not include time to run ZAP

      • Provide:

        • Total time to work with the ZAP output to identify the 5 vulnerabilities.

        • Total time to plan and run the 5 black box test cases.

    • Use the 5 client-side bypassing testcases (above) for this exercise.

    • Use the jbrofuzz rulesets to perform a fuzzing exercise on OpenMRS with the following vulnerability types: Injection, Buffer Overflow, XSS, and SQL Injection.

    • Take a screen shot of ZAP information on the five test cases.

    • Report the fuzzers you chose for each vulnerability type along with the results, and what you believe the team would need to do to fix any vulnerabilities you find. If you don’t find any vulnerabilities, provide your reasoning as to why that was the case, and describe what mitigations the team must have in place such that there are no vulnerabilities.

  • DAST-2 (25 points)

    [DAST-2] FAQ [hyperlink removed] and [DAST-2] Troubleshooting [hyperlink removed]

    • Record how much total time (hours and minutes) your team spends to complete this activity.

      • Do not include time to run [DAST-2].

      • Provide:

        • Total time to work with the [DAST-2] output to identify the 5 vulnerabilities.

        • Total time to plan and run the 5 black box test cases.

    • Run [DAST-2] on OpenMRS. Run any 5 of your test cases from Project Part 1 to seed the [DAST-2] run. Run [DAST-2] long enough that you feel you have captured enough true positive vulnerabilities that you can complete five test case plans. Note: [DAST-2] will likely run out of memory if you run all 5 together. It is best to run each one separately. Also, make sure you capture only the steps for your test cases, not other unnecessary steps.

    • Export your results.

    • Take a screen shot of [DAST-2] information on the five vulnerabilities you will explore further. Write five black box test plans (using format provided in Part 0 above) to expose five vulnerabilities detected by [DAST-2] (which may use a proxy). Hint: Your expected results should be different from the actual results since these test cases should be failing test cases.

  • Vulnerable Dependencies (35 points)

    [Assignment Section not Relevant]

  • Peer evaluation (10 points)

    Perform a peer evaluation on another team. Produce a complete report of feedback for the other team using this rubric (to be supplied).
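
The client-side bypassing and fuzzing exercises above boil down to replaying a captured request with an attack string and checking how the application responds. The sketch below illustrates that idea using only the JDK HTTP client; the endpoint, field names, and payload are hypothetical, and in the assignment itself OWASP ZAP intercepts and modifies the live request rather than a hand-written client.

```java
// Sketch of replaying one captured form submission with a malicious input string,
// in the spirit of the client-side bypassing exercise. The endpoint, field names,
// and payload are hypothetical placeholders.
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class ReplayWithPayload {
    public static void main(String[] args) throws Exception {
        String payload = "<script>alert(1)</script>";                    // candidate XSS string
        String form = "givenName=" + URLEncoder.encode(payload, StandardCharsets.UTF_8)
                + "&familyName=Doe";                                     // filler field

        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("http://localhost:8080/openmrs/registerPatient.form")) // hypothetical URL
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // If the attack string comes back unescaped, the page may be vulnerable to reflected XSS.
        boolean reflectedUnescaped = response.body().contains(payload);
        System.out.println("Status " + response.statusCode()
                + ", payload reflected unescaped: " + reflectedUnescaped);
    }
}
```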

C.3 Project Part 3

The project can be done on the VCL Class Image (“CSC 515 Software Security Ubuntu”).

  • Black Box Test Cases

    Parts 1 (Logging), 2 ([Interactive Testing]), and 3 (Test coverage) ask for you to write black box test cases. We use the same format as was used in Project Part 1. For each test case, you must specify:

    • A unique test case id that maps to the ASVS, sticking to Level 1 and Level 2. Provide the name/description of the ASVS control. Only one unique identifier is needed (as opposed to the example in the lecture slides). The ASVS number should be part of the one unique identifier.

    • Detailed and repeatable (the same steps could be done by anyone who reads the instructions) instructions for how to execute the test case

    • Expected results when running the test case. A passing test case would indicate a secure system.

    • Actual results of running the test case.

    • Indicate the CWE (number and name) for the vulnerability you are testing for.

  • Logging (25 points)

    Where are the Log files? Check out the OpenMRS FAQ

    • Record how much total time (hours and minutes) your team spends to complete this activity (test planning and test execution). Compute a metric of how many true positive defects you found per hour of total effort.

    • Write 10 black box test cases for ASVS V7 Levels 1 and 2. You can have multiple test cases for the same control testing for logging in multiple areas of the application. What should be logged to support non-repudiation/accountability should be in your expected results.

    • Run the test. Find and document the location of OpenMRS’s transaction logs.

    • Write what is logged in the actual results column. The test case should fail if non-repudiation/accountability is not supported (see the 6 Ws on page 3 of the lecture notes).

    • Comment on the adequacy of OpenMRS’s logging overall based upon these 10 test cases.

  • Interactive Application Security Testing (25 points)

    [Assignment Section not Relevant]

  • Test Coverage (25 points)

    This test coverage relates to all work you have done in Project Parts 1, 2, and 3.

    1. Compute your black box test coverage for each section of the ASVS (i.e. V1, V2, etc.), which includes the black box tests you write for Part 2 (Seeker) for Level 1 and Level 2 controls. You get credit for a control (e.g. V1.1) if you have a test case for it. If you have more than one test case for a control, you do not get extra credit – coverage is binary. Coverage is computed as # of test cases / # of requirements.

    2. (15 points, 3 points each) Write 5 more black box tests to increase your coverage of controls you did not have a test case for.

    3. (5 points) Recompute your test coverage. Report as below. Record how much total time (hours and minutes) your team spends to complete this activity (test planning and test execution). Compute a metric of how many true positive defects you found per hour of total effort.

    4. (5 points) Reflect on the controls you have lower coverage for. Are these controls particularly hard to test, not covered in class, something you just didn’t get to, etc.?

    Control                            | # of test cases | # of L1 and L2 controls | Coverage
    V1.1: Secure development lifecycle | ?               | 7                       | ?/7
    ...                                |                 |                         |
    Total                              |                 |                         |
  • Vulnerability Discovery Comparison (15 points)

    1. (5 points) Compare the five vulnerability detection techniques you have used this semester by first completing the table below.

      • A: total number of true positives for this detection type for all activities (Project Parts 1-3)

      • B: total time spent for all activities (Project Parts 1-3)

      • Efficiency is A/B

      • Exploitability: give a relative rating of the ability for this technique to find exploitable vulnerabilities

      • Provide the CWE number for all the true positive vulnerabilities detected by this technique. (This information will help you address the “wide range of vulnerability types” question below.)

      Technique            | # of true positive vulnerabilities discovered | Total time (hours) | Efficiency: # vulnerabilities / total time | Detecting Exploitable vulnerabilities? (High/Med/Low) | Unique CWE numbers
      Manual black box     |   |   |   |   |
      Static analysis      |   |   |   |   |
      Dynamic analysis     |   |   |   |   |
      Interactive testing  |   |   |   |   |
    2. (10 points) Use this data to re-answer the question that was on the midterm (that people generally didn’t do too well on). Being able to understand the tradeoffs between the techniques is a major learning objective of the class.

      As efficiently and effectively as possible, companies want to detect a wide range of exploitable vulnerabilities (both implementation bugs and design flaws). Based upon your experience with these techniques, compare their ability to efficiently and effectively detect a wide range of types of exploitable vulnerabilities.

  • Peer evaluation (10 points)

    Perform a peer evaluation on another team. Produce a complete report of feedback for the other team using this rubric [to be supplied].
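
The binary-credit coverage computation in the Test Coverage part above counts duplicate test cases for the same control only once and divides the number of distinct covered controls by the number of Level 1 and Level 2 controls in the ASVS section. A minimal Java sketch, with hypothetical control identifiers and counts:

```java
// Minimal sketch of the binary-credit ASVS coverage computation.
// Control identifiers and counts are hypothetical placeholders.
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class AsvsCoverage {

    /** Distinct L1/L2 controls with at least one test case, divided by total controls. */
    static double coverage(List<String> controlsTested, int totalL1L2Controls) {
        Set<String> distinct = new TreeSet<>(controlsTested);  // duplicate tests count once
        return (double) distinct.size() / totalL1L2Controls;
    }

    public static void main(String[] args) {
        // e.g. ASVS V1.1 "Secure development lifecycle" with 7 L1/L2 controls and
        // four test cases, two of which target the same control.
        List<String> tested = List.of("V1.1.1", "V1.1.2", "V1.1.2", "V1.1.4");
        System.out.printf("V1.1 coverage: %d/%d = %.2f%n",
                new TreeSet<>(tested).size(), 7, coverage(tested, 7));
    }
}
```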

C.4 Project Part 4

  • Protection Poker (20 points)

    [Assignment Section not Relevant]

  • Vulnerability Fixes (35 points)

    [Assignment Section not Relevant]

  • Exploratory Penetration Testing (35 points)

    Each team member is to perform 3 hours of exploratory penetration testing on OpenMRS. This testing is to be done opportunistically, based upon your general knowledge of OpenMRS but without a test plan, as is done by professional penetration testers. DO NOT USE YOUR OLD BLACK BOX TESTS FROM PRIOR MODULES. Use a video/voice screen recorder to record your penetration testing actions. Speak aloud as you work to describe your actions, such as, “I see the input field for logging in. I’m going to see if 1 = 1 works for a password.” or “I see a parameter in the URL, I’m going to see what happens if I change the URL.” You should be speaking around once/minute to narrate what you are attempting. You don’t have to do all 3 hours in one session, but you should have 3 hours of annotated video to document your penetration testing. There are lots of screen recorders available – if you know of a free one and can suggest it to your classmates, please post on Piazza.

    Pause the recording every time you have a true positive vulnerability. Note how long you have been working so that a log of your work and the time between vulnerability discoveries is created (for example, Vulnerability #1 was found at 1 hour and 12 minutes, Vulnerability #2 was found at 1 hour and 30 minutes, etc.). If you work in multiple sessions, the elapsed time will pick up where you left off the prior session – for example, if you do one session for 1 hour 15 minutes, the second session begins at 1 hour 16 minutes. Take a screen shot and number each true positive vulnerability. Record your actions such that this vulnerability could be replicated by someone else via a black box test case. Record the CWE for your true positive vulnerability. Record your work as in the following table. The reference info for video traceability is to aid a reviewer in watching you find the vulnerability. If you have one video, the “time” should aid in finding the appropriate part of the video. If you have multiple videos, please specify which video and what time on that video.

    Vulnerability # | Elapsed Time | Ref Info for Video Traceability | CWE | Commentary

    Replication instructions via a black box test and the screenshots for each true positive vulnerability should appear below the table, labeled with the vulnerability number. Since you are not recording all your steps, the replication instructions may not work completely since you may change the state of the software somewhere along the line – document what you can via a black box test and say the actual results don’t match your screenshot.

    After you are complete, compute an efficiency (true positive vulnerabilities/hour) metric for each student. Submit a table:

           | # vuln | Time | Efficiency
    Name 1 |        |      |
    Name 2 |        |      |
    Name 3 |        |      |
    Name 4 |        |      |
    Total  |        |      |

    Copy the efficiency table you turned in for Project Part 3 #4. Add an additional line for Penetration testing. Compare and comment on this efficiency rate with the other vulnerability discovery techniques in the table you input in #4 of Project Part 3.

    • Each person on the team should submit one or more videos by uploading it/them to your own google drive and providing a link to the video(s), sharing the video with anyone who has the link and an NCSU login (which will allow peer evaluation and grading). The video(s) should be approximately 3 hours in length.

    • A person who does not submit a video can not be awarded the points for this part of the project while the rest of the team can.

    • It is possible to work for 3 hours and find 0 vulnerabilities – real penetration testers often work more than 3 hours without finding anything. That’s part of the reason for documenting your work via video.

    • For those team members who do submit videos, the grade will be an overall team grade.

    Submission: The team submits one file with the links to the team member’s files.

  • Peer Evaluation (10 points)

    Perform a peer evaluation on another team. Produce a complete report of feedback for the other team using this rubric [to be supplied].
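
The elapsed-time log described in the exploratory penetration testing part above carries time forward across sessions, so a vulnerability found partway through a later session is reported at its offset from the start of all testing. A minimal Java sketch of that bookkeeping, with hypothetical times:

```java
// Sketch of the cumulative elapsed-time log: completed session lengths are summed,
// so a discovery made partway through a later session is reported at its offset
// from the start of all testing. Times are hypothetical placeholders.
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class PenTestLog {
    private final List<Duration> completedSessions = new ArrayList<>();

    /** Records a finished session so later discoveries continue from its end. */
    void endSession(Duration sessionLength) {
        completedSessions.add(sessionLength);
    }

    /** Total elapsed testing time for a discovery at the given offset into the current session. */
    Duration elapsedAt(Duration offsetIntoCurrentSession) {
        Duration prior = completedSessions.stream().reduce(Duration.ZERO, Duration::plus);
        return prior.plus(offsetIntoCurrentSession);
    }

    public static void main(String[] args) {
        PenTestLog log = new PenTestLog();
        log.endSession(Duration.ofHours(1).plusMinutes(15));   // session 1 lasted 1 h 15 min
        Duration t = log.elapsedAt(Duration.ofMinutes(15));    // found 15 min into session 2
        System.out.printf("Vulnerability found at %d hour(s) %d minute(s) of testing%n",
                t.toHours(), t.toMinutesPart());
    }
}
```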

Appendix D: Equipment Specifications

In this appendix we provide additional details of the equipment used in our case study. As noted in Section 11, a key resource used in this project was the school’s Virtual Computing Lab (VCL), which provided virtual machine (VM) instances. Researchers used VCL when applying EMPT, SMPT, and DAST as part of data collection for RQ1. All student tasks were performed using VCL for RQ1 and RQ2. Researchers created a system image including the SUT (OpenMRS) as well as SAST and DAST tools. The base image was assigned 4 cores, 8 GB RAM, and 40 GB disk space. An instance of this image could be checked out by students and researchers and accessed remotely through a terminal using ssh or graphically using Remote Desktop Protocol (RDP). Researchers also used two expanded instances of the base image with 16 CPUs, 32 GB RAM, and 80 GB disk space. For client-server tools, a server was set up in a separate VCL instance by researchers with assistance from the teaching staff of the course. The server UI was accessible from VCL instances of the base image, while the server instance itself was only accessible to researchers and teaching staff. The server instance had 4 cores, 8 GB RAM, and 60 GB disk space, and contained the server software for SAST-1 used to answer RQ2. All VCL instances in this study used the Ubuntu operating system.

The VCL alone was used for data collection for RQ2. However, the base VCL images were small, and the remote connection to VCL could lag. Researchers used two additional resources as needed for RQ1 data collection. First, we created a VM in VirtualBox using the same operating system (Ubuntu 18.04 LTS) and OpenMRS version (Version 2.9) as the VCL images. This VM was used by researchers for SMPT and EMPT data collection, particularly when reviewing the output of each technique where instances of the SUT were needed on an ad hoc basis. The VM was assigned 2 CPUs, 4 GB RAM, and 32 GB disk space and could be copied and shared amongst researchers to run locally. Researchers increased the size of the VM as needed, up to 8 CPUs and 16 GB RAM when the host system could support the VM size. A second VM was created in VirtualBox with the same specifications and operating system, but with the server software for Sonarqube installed. We also used a desktop machine with 24 CPUs, 32 GB RAM, and 500 GB disk space. The desktop was running the Ubuntu operating system. This machine was accessible through the terminal via ssh and graphically using x2go. For RQ1 data collection we ran the SAST-1 server software directly on this machine. The desktop was also used to run VirtualBox VMs for resource-intensive activities such as running Sonarqube and DAST-2.

While equipment constraints impacted both SAST and DAST, available equipment and intended use also affect how SAST tools are set up. SAST tools can be set up and configured according to different architectures. The SAST tools used in this study could be set up as client-server tools where the SUT code is scanned on the “client” machine, and information is sent to a “server”. The analyst then reviews the results through the server. For some tools, the automated analysis and rules are applied on the client, while for other tools they are applied on the server. The SAST tools used in this study also included an optional plugin for Integrated Development Environments (IDEs) such as Eclipse. The plugin allows developers to initiate SAST analysis and, in some cases, view alerts from the tool within the IDE itself. Some tools can be run without a server using only IDE plugins. Other tools require a server. Similar to the previous work by Austin et al. (Austin and Williams 2011; Austin et al. 2013), we found that the server GUI was easier to use when aggregating and analyzing all system vulnerabilities for RQ1. Consequently, a client-server configuration was used with SAST-2 and Sonarqube to answer RQ1. SAST-2 and SAST-3 were more easily configured for local use within an IDE, as was done for the class for RQ2 and RQ3.

Appendix E: All CWEs Table

Table 11 shows the CWEs for the high and medium severity vulnerabilities found. Table 12 provides the same information for low severity vulnerabilities. The first column of each table indicates the CWE number. The CWEs are organized based on the OWASP Top Ten categories. The second column indicates which, if any, of the OWASP Top Ten the vulnerability maps to. Columns three and four give the number of vulnerabilities found using SMPT and EMPT. Columns five through eight break down the vulnerabilities found by DAST and SAST by tool (ZAP, DA-2, Sonar, and SA-2). Column nine of Table 11 shows the total number of vulnerabilities found of each CWE type. The Total column is not the same as the sum of the previous six columns because some vulnerabilities were found using more than one technique. Similarly, 20 vulnerabilities were associated with more than one CWE; therefore the total vulnerabilities for each technique as shown in Table 5 may be lower than the sum of each column in Table 11.
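
Because a vulnerability found by several techniques appears in several technique columns but only once in the total, the Total column is computed over the union of the per-technique sets rather than their sum. A small Java sketch with hypothetical vulnerability identifiers illustrates the difference:

```java
// Sketch of why the Total column is not the sum of the per-technique columns:
// a vulnerability found by several techniques is counted once in the total.
// The vulnerability identifiers below are hypothetical placeholders.
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CweRowTotal {
    public static void main(String[] args) {
        // Vulnerabilities of one CWE type, grouped by the technique(s) that found them.
        Map<String, Set<String>> foundBy = Map.of(
                "SMPT", Set.of("V-101", "V-102"),
                "EMPT", Set.of("V-102", "V-103"),
                "SAST", Set.of("V-103"));

        int columnSum = foundBy.values().stream().mapToInt(Set::size).sum();  // 5
        Set<String> distinct = new HashSet<>();
        foundBy.values().forEach(distinct::addAll);                           // union of the sets
        System.out.println("Sum of technique columns: " + columnSum
                + ", Total (distinct vulnerabilities): " + distinct.size());  // 3
    }
}
```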

Table 11 CWEs associated with more severe Vulnerabilities
Table 12 Low Severity Vulnerability CWEs

About this article

Cite this article

Elder, S., Zahan, N., Shu, R. et al. Do I really need all this work to find vulnerabilities?. Empir Software Eng 27, 154 (2022). https://doi.org/10.1007/s10664-022-10179-6
