Abstract
Over the last three decades, many quality models have been proposed for software systems. These models employ software metrics to assess different quality attributes. However, without adequate thresholds, it is very hard to associate plausible interpretations with software quality attributes. Many attempts are reported in the literature to identify meaningful thresholds for software metrics. However, these attempts fail to clearly map the proposed thresholds to the assessment of software quality attributes. This paper aims at bridging this gap and provides a methodology for quality assessment models based on software metric thresholds. By doing so, software products can be easily ranked according to specific quality levels. Our methodology defines software metric thresholds to generate ordinal data. Then, the ordinal data are combined with a weighting scheme based on the Pearson correlation coefficient. The resulting weights are assigned to data categories in each software metric. Thanks to these weights, project quality levels are straightforwardly estimated. To assess the effectiveness of our software metric thresholding framework, we carry out an empirical study. The reported results clearly show that the proposed framework has a significant impact on the assessment and evaluation of software product quality.






References
Jones, C.: Software Quality: The Key to Successful Software Engineering. In: Software engineering best practices: lessons from successful projects in the top companies, pp. 555–642. McGraw-Hill (2010a)
Fenton, N., Bieman, J.: Software Metrics: A Rigorous and Practical Approach. CRC Press (2014)
Chappell, D.: The three aspects of software quality: functional, structural, and process. Sponsored by Microsoft Corporation (2013)
SEI: Process maturity profile: Software CMM 2005 end-year. Software Engineering Institute, Carnegie Mellon University (2006)
Abdellatif, A., Alshayeb, M., Zahran, S., Niazi, M.: A measurement framework for software product maturity. J. Softw. Evol. Process 32 (2018)
Ferreira, K.A., Bigonha, M.A., Bigonha, R.S., Mendes, L.F., Almeida, H.C.: Identifying thresholds for object-oriented software metrics. J. Syst. Softw. 85(2), 244–257 (2012)
Chrissis, M.B., Konrad, M., Shrum, S.: CMMI: Guidelines for Process Integration and Product Improvement. Addison-Wesley Longman Publishing Co., Inc., USA (2003)
Loon, H.V.: Process Assessment and ISO/IEC 15504. Springer US (2007)
ISO/IEC 15939:2002: Software engineering—Software measurement process. International Organization for Standardization. [Online]. Available: https://www.iso.org/standard/29572.html. Accessed 23 April 2020
Software Engineering Standards Committee: IEEE Standard for a Software Quality Metrics Methodology, IEEE Std 1061-1998 (R2009). The Institute of Electrical and Electronics Engineers, Inc., New York, NY, USA (1998)
McCall, J., Richards, P., Walters, G.: Factors in software quality, volume I: Concepts and definitions of software quality. General Electric Co., Sunnyvale, CA (1977a)
McCall, J., Richards, P., Walters, G.: Factors in software quality, volume III: Preliminary handbook on software quality for an acquisition manager. General Electric Co., Sunnyvale, CA (1977b)
Boehm, B., Brown, J., Lipow, M.: Quantitative evaluation of software quality. In Proceedings of the 2nd international conference on software engineering (1976)
Boehm, B., Brown, J., Kaspar, H., Lipow, M., MacLeod, G.: Characteristics of Software Quality. Elsevier, North Holland (1978)
Grady, R., Caswell, D.: Software metrics: establishing a company-wide program. Prentice-Hall Inc (1987)
ISO/IEC 9126-1:2001: Software engineering—Product quality—Part 1: Quality model. International Organization for Standardization. [Online]. Available: https://www.iso.org/standard/22749.html. Accessed 23 April 2020
ISO/IEC 25010:2011: Systems and software engineering—Systems and software Quality Requirements and Evaluation (SQuaRE)—System and software quality models. International Organization for Standardization. [Online]. Available: https://www.iso.org/standard/35733.html. Accessed 23 April 2020
Tajima, D., Matsubara, T.: Special feature the computer software industry in Japan. Computer 14(05), 89–96 (1981)
Atos Origin: Method for qualification and selection of open source software (QSOS). [Online]. Available: http://www.qsos.org. Accessed 23 April 2020
Wasserman, A., Pal, M., Chan, C.: The business readiness rating model: an evaluation framework for open source. In Proceedings of the EFOSS Workshop, Como, Italy (2006)
Alvaro, A., de Almeida, E.S., Meira, S.L.: A software component maturity model (SCMM). In 33rd EUROMICRO conference on software engineering and advanced applications (EUROMICRO 2007) (2007)
Al-Qutaish, R.E., Abran, A.: A maturity model of software product quality. J. Res. Pract. Inf. Technol. 43(4), 307 (2011)
Gilb, T., Finzi, S.: Principles of Software Engineering Management. Addison-Wesley, Reading (1988)
Gilb, T.: Software Metrics. Winthrop Inc, Cambridge (1977)
Alshayeb, M., Abdellatif, A.K., Zahran, S., Niazi, M.: Towards a framework for software product maturity measurement. In The tenth international conference on software engineering advances (2015)
Kitchenham, B., Walker, J.: A quantitative approach to monitoring software development. Softw. Eng. J. 4(1), 2–14 (1989)
Jones, C.: Software Engineering Best Practices. In: Chapter 9 Software quality: the key to successful software engineering, pp. 555–643. McGraw-Hill, New York (2010)
Alves, T.L., Ypma, C., Visser, J.: Deriving metric thresholds from benchmark data. In 2010 IEEE international conference on software maintenance (2010)
Shatnawi, R., Li, W., Swain, J., Newman, T.: Finding software metrics threshold values using ROC curves. J. Softw. Maint. Evol. Res. Pract. 22, 1–16 (2010)
Alqmase, M., Alshayeb, M., Ghouti, L.: Threshold extraction framework for software metrics. J. Comput. Sci. Technol. 34(5), 1063–1078 (2019)
Veado, L., Vale, G., Fernandes, E., Figueiredo, E.: TDTool: threshold derivation tool. In Proceedings of the 20th international conference on evaluation and assessment in software engineering (2016)
Oliveira, P., Lima, F., Valente, M.T., Serebrenik, A.: RTTool: a tool for extracting relative thresholds for source code metrics. In 30th IEEE international conference on software maintenance and evolution, tool track, Victoria (2014)
Zhang, H.: An investigation of the relationships between lines of code and defects. In 2009 IEEE international conference on software maintenance (2009)
Lipow, M.: Number of faults per line of code. IEEE Trans. Software Eng. 4, 437–439 (1982)
Yamashita, K., Huang, C., Nagappan, M., Kamei, Y., Mockus, A., Hassan, A.E., Ubayashi, N.: Thresholds for size and complexity metrics: a case study from the perspective of defect density. In 2016 IEEE international conference on software quality, reliability and security (QRS) (2016)
Oliveira, P., Valente, M.T., Lima, F.P.: Extracting relative thresholds for source code metrics. In 2014 Software evolution week—IEEE conference on software maintenance, reengineering, and reverse engineering (CSMR-WCRE) (2014)
Do Vale, G.A., Figueiredo, E.M.L.: A method to derive metric thresholds for software product lines. In 2015 29th Brazilian symposium on software engineering (2015)
Masramon, G.P., Muñoz, L.A.B.: Toward better feature weighting algorithms: a focus on Relief. arXiv preprint, http://arxiv.org/abs/1509.03755 (2015)
Mori, T.: Information gain ratio as term weight: the case of summarization of ir results. In COLING 2002: The 19th international conference on computational linguistics (2002)
Kent, J.: Information gain and a general measure of correlation. Biometrika 70, 163–173 (1983)
Benesty, J., Chen, J., Huang, Y., Cohen, I.: Pearson Correlation Coefficient. In: Noise reduction in speech processing, pp. 1–4. Springer (2009)
Steuer, R., Kurths, J., Daub, C., Weise, J., Selbig, J.: The mutual information: detecting and evaluating dependencies between variables. Bioinformatics 18, S231–S240 (2002)
Machine Learning Group at the University of Waikato: Weka documentation (CorrelationAttributeEval). [Online]. Available: http://infochim.u-strasbg.fr/cgi-bin/weka-3-9-1/doc/weka/attributeSelection/CorrelationAttributeEval.html. Accessed 23 March 2020
Mountassir, A., Benbrahim, H., Berrada, I.: An empirical study to address the problem of unbalanced data sets in sentiment classification. In IEEE international conference on systems, man, and cybernetics (SMC), (2012)
Han, J., Pei, J., Kamber, M.: Data mining: concepts and techniques. Elsevier (2011)
Chmielewski, M., Grzymala-Busse, J.: Global discretization of continuous attributes as preprocessing for machine learning. In Third international workshop on rough sets and soft computing, (1994)
Liu, H., Hussain, F., Tan, C.L., Dash, M.: Discretization: an enabling technique. Data Min. Knowl. Disc. 6(4), 393–423 (2002)
Hall, M.A.: Correlation-based feature selection for machine learning. PhD thesis, Department of Computer Science, The University of Waikato, Hamilton, New Zealand (1999)
He, P., Li, B., Liu, X., Chen, J., Ma, Y.: An empirical study on software defect prediction with a simplified metric set. Inf. Softw. Technol. 59, 170–190 (2014)
Spinellis, D.: Tool writing: a forgotten art? IEEE Softw. 22(4), 9–11 (2005)
Lincke, R., Lundberg, J., Löwe, W.: Comparing software metrics tools. In: Proceedings of the 2008 international symposium on software testing and analysis, New York, NY, USA, (2008)
Jureczko, M., Madeyski, L., Spinellis, D.: PROMISE repository: software defect prediction dataset. (2010a). [Online]. Available: http://openscience.us/repo/. Accessed 2018
Jureczko, M., Madeyski, L.: Towards identifying software project clusters with regard to defect prediction. In Proceedings of the 6th international conference on predictive models in software engineering, Timișoara, Romania (2010)
Jureczko, M., Spinellis, D.: Using object-oriented design metrics to predict software defects. In Models and methods of system dependability. Oficyna Wydawnicza Politechniki Wroclawskiej (2010b)
Zou, K., Tuncali, K., Silverman, S.: Correlation and simple linear regression. Radiology 227(3), 617–622 (2003)
Bluman, A.G.: Probability and Counting Rules. In: Elementary statistics: a step by step approach, pp. 185–255. McGraw-Hill, New York (2009)
Kira, K., Rendell, L. A.: A practical approach to feature selection. In: Ninth international workshop on machine learning (1992).
Kononenko, I.: Estimating attributes: analysis and extensions of RELIEF. In: European conference on machine learning (1994).
Jureczko, M., Madeyski, L., Spinellis, D.: Software defect prediction dataset (2010b). [Online]. Available: http://purl.org/MarianJureczko/MetricsRepo. Accessed 27 February 2021
Acknowledgements
M. Alqmase and M. Alshayeb would like to acknowledge the support provided by the Deanship of Scientific Research at King Fahd University of Petroleum and Minerals. L. Ghouti acknowledges the support of Prince Sultan University.
Ethics declarations
Conflict of interest
The authors declare that they do not have any conflict of interest.
Appendix
1.1 Appendix 1
There are three basic interpretations of probability: classical probability, empirical probability, and subjective probability (Bluman 2009).
In the classical probability formula, the probability of any event E is
$$P(E) = \frac{n(E)}{n(S)} = \frac{\text{number of outcomes in } E}{\text{total number of outcomes in the sample space}}$$
This probability uses the sample space (S) and assumes that all outcomes in (S) are equally likely to occur.
In the empirical probability formula, given a frequency distribution, the probability of an event being in a given class is
$$P(E) = \frac{f}{n} = \frac{\text{frequency of the class}}{\text{total of the frequencies in the distribution}} \quad (14)$$
This probability is based on observation. The difference between classical and empirical probability is that empirical probability relies on actual experience to determine the likelihood of outcomes (Bluman 2009).
In this work, we propose a generalized version of the empirical probability formula that accommodates weights extracted by black-box models (e.g., AI models). These weights provide abstract semantics for the impact of many factors. Instead of considering all of these factors explicitly while building the probabilistic model, we abstract them inside the weights and then apply the proposed formula to those weights to build the probabilistic model. This abstraction hides the underlying complexity and allows one layer of modeling to build on another, which helps when modeling very complex concepts. The proposed formula can be defined as follows: given a weighted sum distribution, the probability of an event being in a given class is
$$P(E) = \frac{\text{weighted sum of the class}}{\text{total of the weighted sums}} = \frac{\sum_{m} w_{m} \, f_{m,E}}{\sum_{m} w_{m} \, n_{m}} \quad (15)$$
where \(w_{m}\) is the weight of metric m, \(f_{m,E}\) is the frequency of class E under metric m, and \(n_{m}\) is the number of entities measured by metric m.
This probability is also an empirical probability and is based on weights identified by observers. Observers are usually AI models that analyze huge amounts of data, study many relations, and abstract the impact of many factors as weights. The following examples give an interpretation of this formula.
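To make the proposed formula concrete, the computation can be sketched in a few lines of Python (a minimal illustration of Eq. (15); the function name and the dictionary layout are our own assumptions, not artifacts of this paper):

```python
from typing import Dict, List

def weighted_class_probability(
    measurements: Dict[str, List[str]],  # metric name -> class label per entity
    weights: Dict[str, float],           # metric name -> weight from a black-box model
    target: str,                         # class of interest, e.g. "good"
) -> float:
    """Eq. (15): weighted sum for the target class / total of the weighted sums."""
    target_sum = sum(
        w * sum(1 for label in measurements[m] if label == target)
        for m, w in weights.items()
    )
    total_sum = sum(w * len(measurements[m]) for m, w in weights.items())
    return target_sum / total_sum

# Example 2 below: the impact of m2 is twice that of m1.
measurements = {"m1": ["good", "good", "bad"], "m2": ["good", "bad", "bad"]}
weights = {"m1": 0.2, "m2": 0.4}
print(weighted_class_probability(measurements, weights, "good"))  # 0.444... = 4/9
```

When all weights are equal, the formula reduces to the empirical frequency formula of Eq. (14).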
Example 1
Given a software project with three entities E = {e1, e2, e3} described in Fig. 7:
- The metric m1 measures the three entities as (good, good, bad).
- The metric m2 measures the three entities as (good, bad, bad).
Table 14 shows the frequency distribution.
1.1.1 What is the probability that the given software project is at a good quality level?
This can be solved simply by the frequency formula presented in Eq. (14) as follows:
$$P(\text{good}) = \frac{f_{\text{good}}}{n} = \frac{3}{6} = 0.5$$
The next examples show cases where the weights are involved, and the proposed formula in Eq. (15) is applied.
Example 2
Given a software project with three entities E = {e1, e2, e3} described in Fig. 8:
1. The metric m1 measures the three entities as (good, good, bad).
2. The metric m2 measures the three entities as (good, bad, bad).
3. The impact of m2 on identifying the quality of the project is twice that of m1, where the impact is measured by a black-box model that gives m1 a weight of 0.2 and m2 a weight of 0.4. The weighted sum distribution and the frequency distribution are displayed in Table 15.
1.1.2 What is the probability that the given software project is at a good quality level?
As mentioned above, the impact of metric m2 on identifying the quality of the project is twice that of m1; this impact is given by the weights m1 = 0.2 and m2 = 0.4. Because a new factor is now involved, namely the impact of each software metric on identifying the quality of the software, the proposed formula presented in Eq. (15) is applied to compute the probability that the given software project is at a good quality level:
$$P(\text{good}) = \frac{0.2 \times 2 + 0.4 \times 1}{0.2 \times 3 + 0.4 \times 3} = \frac{0.8}{1.8} = \frac{4}{9} \approx 0.44$$
The following provides an interpretation of the proposed formula for calculating this probability.
The above example can also be modeled using the multiplication rule, the addition rule, and conditional probability. Since the impact of metric m2 on identifying the quality of the project is twice that of m1, it follows that \(p(m_{1}) = \frac{1}{3}\) and \(p(m_{2}) = \frac{2}{3}\).
With the use of a tree diagram, the sample space can be determined, as shown in Fig. 9. First, assign probabilities to each branch. Next, using the multiplication rule, multiply the probabilities along each branch. Finally, apply the addition rule, since a good quality level can be identified by metric m1 or m2.
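Written out (our reconstruction of the computation depicted in Fig. 9):
$$P(\text{good}) = p(m_{1})\,P(\text{good} \mid m_{1}) + p(m_{2})\,P(\text{good} \mid m_{2}) = \frac{1}{3} \cdot \frac{2}{3} + \frac{2}{3} \cdot \frac{1}{3} = \frac{4}{9}$$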
This yields the same result as the proposed formula, confirming its interpretation. The next example further illustrates the idea by modeling our problem as a box-color ball problem.
Example 3
In this example, the problem described in Example 2 is modeled as a box-color ball problem. In Fig. 10, box m1 contains small balls, one red and two green. Box m2 contains big balls, two red and one green. The preference for selecting the box with big balls is twice that of selecting the box with small balls. Assuming that the preference is measured by a black-box model that gives m1 a weight of 0.2 and m2 a weight of 0.4, this simply means \(p(m_{1}) = \frac{1}{3}\) and \(p(m_{2}) = \frac{2}{3}\).
1.1.3 What is the probability of selecting a green ball?
With the use of a tree diagram, the sample space can be determined, as shown in Fig. 11. First, assign probabilities to each branch. Next, using the multiplication rule, multiply the probabilities along each branch. Finally, apply the addition rule, since a green ball can be obtained from box m1 or box m2.
Example 4
Given a software project with three entities E = {e1, e2, e3} described in Fig. 12:
1. The metric m1 measures the three entities as (good, good, bad).
2. The metric m2 measures the three entities as (good, good, bad).
3. The metric m3 measures the three entities as (good, bad, bad).
4. The impact of m2 on identifying the quality of the project is twice that of m1, and the impact of m3 is twice that of m2, where the impact is measured by a black-box model that gives m1 a weight of 1.1, m2 a weight of 2.2, and m3 a weight of 4.4. The weighted sum distribution and the frequency distribution are described in Table 16.
1.1.4 What is the probability that the given software project is at a good quality level?
We can apply the proposed formula in Eq. (15) as follows:
$$P(\text{good}) = \frac{1.1 \times 2 + 2.2 \times 2 + 4.4 \times 1}{(1.1 + 2.2 + 4.4) \times 3} = \frac{11}{23.1} = \frac{10}{21} \approx 0.48$$
The following provides an interpretation of how this probability can be calculated using the multiplication rule, the addition rule, and conditional probability.
The above example can also be modeled as follows: since the impact of metric m2 on identifying the quality of the project is twice that of m1, and the impact of m3 is twice that of m2, it follows that \(p(m_{1}) = \frac{1}{7}\), \(p(m_{2}) = \frac{2}{7}\), and \(p(m_{3}) = \frac{4}{7}\).
With the use of a tree diagram, the sample space can be determined, as shown in Fig. 13. First, assign probabilities to each branch. Next, using the multiplication rule, multiply the probabilities along each branch. Finally, apply the addition rule, since a good quality level can be identified by metric m1, m2, or m3.
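Written out (our reconstruction of the computation depicted in Fig. 13):
$$P(\text{good}) = \frac{1}{7} \cdot \frac{2}{3} + \frac{2}{7} \cdot \frac{2}{3} + \frac{4}{7} \cdot \frac{1}{3} = \frac{2}{21} + \frac{4}{21} + \frac{4}{21} = \frac{10}{21}$$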
This yields the same result as the proposed formula, confirming its interpretation.
Example 5
Given a software project with three entities E = {e1, e2, e3} described in Fig. 14:
1. The metric m1 measures the three entities as (bad, good, good).
2. The metric m2 measures the three entities as (bad, good, good).
3. The metric m3 measures the three entities as (bad, good, good).
4. The impact of each quality level in each metric is different. Assume that this impact is measured by a black-box model, which gives the weights described in Table 17. The weighted sum distribution and the frequency distribution are described in Table 18.
1.1.5 What is the probability that the given software project is at a good quality level?
We can apply the proposed formula in Eq. (15) to the weighted sums in Table 18, which yields
$$P(\text{good}) = \frac{2}{3} \approx 0.67$$
The following provides an interpretation of how this probability can be calculated by modeling it as a box-color ball problem.
The above example can also be modeled as a box-color ball problem. In Fig. 15, box1 contains small balls, two red and four green. Box2 contains big balls, one red and two green. Assuming that the preference for selecting the box with big/small balls is measured by a black-box model that gives box1 a weight of 0.6 and box2 a weight of 2.2, this simply means \(p(\text{box1}) = \frac{0.6}{2.8}\) and \(p(\text{box2}) = \frac{2.2}{2.8}\).
1.1.6 What is the probability of selecting a green ball?
With the use of a tree diagram, the sample space can be determined, as shown in Fig. 16. First, assign probabilities to each branch. Next, using the multiplication rule, multiply the probabilities along each branch. Finally, apply the addition rule, since a green ball can be obtained from box1 or box2.
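Written out (our reconstruction of the computation depicted in Fig. 16):
$$P(\text{green}) = \frac{0.6}{2.8} \cdot \frac{4}{6} + \frac{2.2}{2.8} \cdot \frac{2}{3} = \frac{2}{3} \approx 0.67$$
which matches the value obtained from Eq. (15) above.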
1.2 Appendix 2
When we deal with feature evaluators, there are two categories: feature weighting techniques and feature selection techniques.
For the feature weighting schema, the concept is to assign weights to the given features based on two factors: relevance and/or redundancy. Relevance measures how relevant a given feature is to the class (target attribute) by examining the correlation between the feature and the class. Redundancy measures how redundant features are for predicting the class by examining the correlations among the features. A feature is said to be redundant if one or more of the other features are highly correlated with it (Hall 1999).
For feature selection techniques, the concept is to select a subset of the given features. The selection can be achieved with the help of a feature weighting schema. Some techniques remove irrelevant features, some remove redundant features, and some remove both. The purpose of feature selection techniques is to improve performance by removing redundant and/or irrelevant features (Hall 1999). In feature selection, a good subset of features is one that contains features highly correlated with the class and weakly correlated with each other (Hall 1999).
While the feature weighting schema provides techniques to assign a weight to each feature, feature selection techniques use these weights, along with a search strategy, to select the most useful subset of features.
For example, the correlation-based feature selection technique (CFS) proposed by Mark Hall (Hall 1999) uses a feature weighting algorithm called RELIEF. CFS is based on the feature subset evaluation function given by Eq. (16); it is a simple filter algorithm that ranks feature subsets according to correlations calculated by RELIEF. The evaluation function is, in fact, a form of Pearson's correlation coefficient (Hall 1999):
$$M_{s} = \frac{k \, \overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}} \quad (16)$$
where \(M_{s}\) is the heuristic "merit" of a feature subset s containing \(k\) features, \(\overline{r_{cf}}\) is the mean feature-class correlation \((f \in s)\), and \(\overline{r_{ff}}\) is the average feature-feature intercorrelation. The numerator of Eq. (16) can be thought of as indicating how well a set of features predicts the class; the denominator, how much redundancy there is among the features.
In the following example, taken from Hall's thesis (Hall 1999), Table 19 shows the feature weights calculated by RELIEF, both between each feature and the other features and between each feature and the class. Table 20 shows how these weights are utilized by CFS to select the best subset of features.
The search starts with an empty set of features. Applying the evaluation function, the subsets shown in bold mark a local improvement with respect to the previous best subset.
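For illustration, the merit function of Eq. (16) and the greedy forward search can be sketched in Python (a sketch under our own assumptions: the helper names, the data layout, and the simple stopping rule are ours; Hall's thesis discusses several search strategies):

```python
from typing import Dict, Tuple

def merit(subset: Tuple[str, ...],
          r_cf: Dict[str, float],                 # feature-class correlations
          r_ff: Dict[Tuple[str, str], float]) -> float:
    """Eq. (16): Ms = k * mean(r_cf) / sqrt(k + k*(k-1)*mean(r_ff))."""
    k = len(subset)
    mean_cf = sum(r_cf[f] for f in subset) / k
    pairs = [tuple(sorted((a, b))) for i, a in enumerate(subset) for b in subset[i + 1:]]
    mean_ff = sum(r_ff[p] for p in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / (k + k * (k - 1) * mean_ff) ** 0.5

def cfs_forward_search(features, r_cf, r_ff):
    """Greedy forward selection: grow the subset while the merit improves."""
    best, best_merit = (), 0.0
    while True:
        candidates = [best + (f,) for f in features if f not in best]
        if not candidates:
            return best, best_merit
        top_merit, top = max((merit(c, r_cf, r_ff), c) for c in candidates)
        if top_merit <= best_merit:  # no local improvement: stop
            return best, best_merit
        best, best_merit = top, top_merit

# Toy run: f1 and f2 predict the class and are weakly intercorrelated,
# while f3 is nearly irrelevant and redundant with f1.
r_cf = {"f1": 0.6, "f2": 0.5, "f3": 0.1}
r_ff = {("f1", "f2"): 0.2, ("f1", "f3"): 0.8, ("f2", "f3"): 0.1}
print(cfs_forward_search(["f1", "f2", "f3"], r_cf, r_ff))  # (('f1', 'f2'), ...)
```

Each pass adds the feature whose inclusion maximizes the merit and stops when no candidate improves on the previous best subset, mirroring the bold "local improvement" entries in Table 20.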
Relief is a feature weighting schema that can be utilized for feature selection problems. It is based on a statistical method to calculate the weight of a given feature (Kira and Rendell 1992) and is inspired by instance-based learning. Relief can detect features that are statistically relevant to the target attribute, and it handles nominal features (including Boolean) as well as numerical features (integer or real). The first version of the algorithm addresses two-class classification problems, where the target attribute (class or concept) has only two values (Kira and Rendell 1992); Relief was later extended by Kononenko (Kononenko 1994) to make it applicable to multi-class classification problems. According to the Relief algorithm (Kira and Rendell 1992), the training dataset S is separated into two sets, \(\{S^{+}, S^{-}\}\): instances with a positive class value and instances with a negative class value (assuming the class attribute has two values, positive and negative). Relief picks a sample of M instances and forms a triplet for each sampled instance X, where X is represented by a vector of p feature values \(\{x_{1}, x_{2}, \ldots, x_{p}\}\) and the given feature set is \(\{f_{1}, f_{2}, \ldots, f_{p}\}\). For each instance X, the algorithm picks the positive instance in \(S^{+}\) that is closest to X and the negative instance in \(S^{-}\) that is closest to X, using the p-dimensional Euclidean distance to select the closest instances. It then determines which one is the near-hit and which one is the near-miss for X: the near-hit is the closest instance to X with the same class value, and the near-miss is the closest instance to X with a different class value. After that, Relief calls a routine to update the feature weight vector W for every sampled triplet and determines the average feature weight vector (the relevance of all the features to the target concept). W is initialized with zeros, \((0, 0, \ldots, 0)\), and updated using the following formula:
$$W_{i} = W_{i} - \mathrm{diff}(x_{i}, \mathrm{nearHit}_{i})^{2} + \mathrm{diff}(x_{i}, \mathrm{nearMiss}_{i})^{2}$$
where diff is a function that, for nominal feature values, can be described as
$$\mathrm{diff}(a, b) = \begin{cases} 0 & \text{if } a = b \\ 1 & \text{otherwise} \end{cases}$$
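The update routine above can likewise be sketched in Python (our illustration of the two-class algorithm, not code from this paper; for numerical features, diff is commonly normalized by the feature's value range):

```python
import random
from typing import List

def relief(X: List[List[float]], y: List[int], num_samples: int) -> List[float]:
    """Two-class Relief (Kira and Rendell 1992): returns one weight per feature."""
    p = len(X[0])
    # Normalize numeric differences by each feature's value range (avoid /0).
    ranges = [max(col) - min(col) or 1.0 for col in zip(*X)]

    def diff(i: int, a: List[float], b: List[float]) -> float:
        return abs(a[i] - b[i]) / ranges[i]

    def dist(a: List[float], b: List[float]) -> float:
        return sum(diff(i, a, b) ** 2 for i in range(p)) ** 0.5

    W = [0.0] * p
    for _ in range(num_samples):
        idx = random.randrange(len(X))
        x, cls = X[idx], y[idx]
        # Nearest neighbor with the same class (near-hit) and a different class (near-miss).
        hit = min((z for z, c in zip(X, y) if c == cls and z is not x), key=lambda z: dist(x, z))
        miss = min((z for z, c in zip(X, y) if c != cls), key=lambda z: dist(x, z))
        for i in range(p):
            W[i] += diff(i, x, miss) ** 2 - diff(i, x, hit) ** 2
    return [w / num_samples for w in W]

# Feature 0 separates the classes; feature 1 is noise and gets a lower weight.
X = [[1.0, 0.0], [0.9, 1.0], [0.1, 0.0], [0.2, 1.0]]
y = [1, 1, 0, 0]
print(relief(X, y, num_samples=20))
```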
Cite this article
Alqmase, M., Alshayeb, M. & Ghouti, L. Quality assessment framework to rank software projects. Autom Softw Eng 29, 41 (2022). https://doi.org/10.1007/s10515-022-00342-0