
Assessing the quality of industrial avionics software: an extensive empirical evaluation

  • Experience Report
  • Published in: Empirical Software Engineering (2017)

Abstract

A real-time operating system for avionics (RTOS4A) provides the operating environment for avionics application software. Because an RTOS4A runs safety-critical applications, demonstrating a satisfactory level of its quality to its stakeholders is essential. By assessing how quality varies across consecutive releases of an industrial RTOS4A, based on test data collected over 17 months, we aim to provide a set of guidelines to 1) improve test effectiveness, and thus the quality of subsequent RTOS4A releases, and 2) assess the quality of other systems from test data in a similar way. We carefully defined a set of research questions and, for each, a number of variables derived from the available test data, including release and measures of test effort, test effectiveness, complexity, test efficiency, test strength, and failure density. Using these variables, we assessed quality in terms of the number of failures found in testing by applying a combination of analyses: trend analysis using two-dimensional graphs, correlation analysis using Spearman’s test, and difference analysis using the Wilcoxon rank test. The key results are as follows: 1) the number of failures and the failure density decreased in the latest releases, while test coverage was either high or did not decrease from release to release; 2) greater test effort was spent on modules of greater complexity, and the number of failures in these modules was not high; and 3) in all releases, test coverage for modules without failures was not lower than test coverage for modules with failures. Based on this evidence, we conclude that the quality of the RTOS4A studied improved in the latest release. In addition, our industrial partner found our guidelines useful, and we believe they can be used to assess the quality of other applications in the future.
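To illustrate the analysis combination described above, here is a minimal sketch, assuming hypothetical per-module test data, of how the correlation and difference analyses could be run with Python's scipy. It is not the authors' analysis code, all values are invented placeholders, and the rank-sum variant of the Wilcoxon test is an assumption, since the abstract does not specify which variant was used.

```python
# A minimal sketch (not the authors' code) of the correlation and difference
# analyses described in the abstract. All per-module values are hypothetical.
from scipy.stats import spearmanr, ranksums

# Hypothetical per-module measurements for one release.
test_effort = [120, 85, 200, 60, 150]   # e.g., test cases executed per module
complexity  = [34, 21, 58, 12, 40]      # e.g., cyclomatic complexity per module

# Hypothetical coverage figures, split by whether failures were found.
coverage_with_failures    = [0.82, 0.75, 0.88]
coverage_without_failures = [0.91, 0.86, 0.90, 0.84]

# Correlation analysis: is more test effort spent on more complex modules?
rho, p = spearmanr(test_effort, complexity)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")

# Difference analysis: is coverage of failure-free modules lower than coverage
# of failing modules? (Wilcoxon rank-sum test; the specific Wilcoxon variant
# used in the paper is an assumption here.)
stat, p = ranksums(coverage_without_failures, coverage_with_failures)
print(f"Wilcoxon rank-sum statistic = {stat:.2f}, p = {p:.3f}")
```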


Notes

  1. This is the result of NoTC_RM divided by the sum of NoTC_L+S and NoTC_HS for rows R3 to R5, respectively, in Table 6.

  2. Calculated as NoTC_AM divided by the sum of NoTC_L+S and NoTC_HS for row R3 in Table 6.

  3. It is computed as NoTC_HS / (NoTC_HS + NoTC_L+S); see the sketch after these notes.

  4. Microsoft’s name for the operating system. See https://en.wikipedia.org/wiki/Windows_Vista for details.

  5. An open-source platform. See www.eclipse.org for details.
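For concreteness, here is a minimal sketch of the ratio computations defined in Notes 1–3. The NoTC counts below are hypothetical placeholders, not the actual figures from Table 6 (which is not reproduced here).

```python
# A minimal sketch of the ratios in Notes 1-3. All NoTC counts below are
# hypothetical placeholders, NOT the actual values from Table 6.
notc_rm = 40    # NoTC_RM  (placeholder count)
notc_am = 25    # NoTC_AM  (placeholder count)
notc_hs = 300   # NoTC_HS  (placeholder count)
notc_ls = 700   # NoTC_L+S (placeholder count)

denominator = notc_ls + notc_hs          # NoTC_L+S + NoTC_HS

ratio_1 = notc_rm / denominator          # Note 1: NoTC_RM / (NoTC_L+S + NoTC_HS)
ratio_2 = notc_am / denominator          # Note 2: NoTC_AM / (NoTC_L+S + NoTC_HS)
ratio_3 = notc_hs / (notc_hs + notc_ls)  # Note 3: NoTC_HS / (NoTC_HS + NoTC_L+S)

print(f"Note 1 ratio: {ratio_1:.3f}")
print(f"Note 2 ratio: {ratio_2:.3f}")
print(f"Note 3 ratio: {ratio_3:.3f}")
```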


Acknowledgments

This research is jointly supported by the Technology Foundation Program (JSZL2014601B008) of the National Defense Technology Industry Ministry and the State Key Laboratory of the Software Development Environment (SKLSDE-2013ZX-12). This work was also supported by the MBT4CPS project funded by the Research Council of Norway (grant no. 240013/O70) under the category of Young Research Talents of the FRIPO funding scheme. Tao Yue and Shaukat Ali are also supported by the EU Horizon 2020 project U-Test (http://www.u-test.eu/) (grant no. 645463), the RFF Hovedstaden funded MBE-CR project (grant no. 239063), the Research Council of Norway funded Zen-Configurator project (grant no. 240024/F20), and the Research Council of Norway funded Certus SFI (grant no. 203461/O30) (http://certus-sfi.no/).

Author information

Correspondence to Ji Wu.

Additional information

Communicated by: Brian Robinson

This paper is an extended version of the conference paper published at the 24th IEEE International Symposium on Software Reliability Engineering (ISSRE 2013).


Cite this article

Wu, J., Ali, S., Yue, T. et al. Assessing the quality of industrial avionics software: an extensive empirical evaluation. Empir Software Eng 22, 1634–1683 (2017). https://doi.org/10.1007/s10664-016-9440-x
