Abstract
Over the last three decades, many quality models have been proposed for software systems. These models employ software metrics to assess different quality attributes. However, without adequate thresholds, it is very hard to associate plausible interpretations with software quality attributes. Many attempts are reported in the literature to identify meaningful thresholds for software metrics. However, these attempts fail to clearly map the proposed thresholds to the assessment of software quality attributes. This paper aims at bridging this gap and provides a methodology for quality assessment models based on software metric thresholds. By doing so, software products can be easily ranked according to specific quality levels. Our methodology defines software metric thresholds to generate ordinal data. Then, the ordinal data are combined with a weighting scheme based on the Pearson correlation coefficient. The resulting weights are assigned to data categories in each software metric. Thanks to these weights, project quality levels are straightforwardly estimated. To assess the effectiveness of our software metric thresholding framework, we carry out an empirical study. The reported results clearly show that the proposed framework has a significant impact on the assessment and evaluation of software product quality.






References
Jones, C.: Software Quality: The Key to Successful Software Engineering. In: Software engineering best practices: lessons from successful projects in the top companies, pp. 555–642. McGraw-Hill (2010a)
Fenton, N., Bieman, J.: Software Metrics: A Rigorous and Practical Approach. CRC Press (2014)
Chappell, D.: The three aspects of software quality: functional, structural, and process. Sponsored by Microsoft Corporation (2013)
SEI: Process maturity profile: Software CMM 2005 end-year. Software Engineering Institute, Carnegie Mellon University (2006)
Abdellatif, A., Alshayeb, M., Zahran, S., Niazi, M.: A measurement framework for software product maturity. J. Softw. Evol. Process 32 (2018)
Ferreira, K.A., Bigonha, M.A., Bigonha, R.S., Mendes, L.F., Almeida, H.C.: Identifying thresholds for object-oriented software metrics. J. Syst. Softw. 85(2), 244–257 (2012)
Chrissis, M.B., Konrad, M., Shrum, S.: CMMI: Guidelines for Process Integration and Product Improvement. Addison-Wesley Longman Publishing Co., Inc., USA (2003)
Loon, H.V.: Process Assessment and ISO/IEC 15504. Springer US (2007)
ISO/IEC 15939:2002: Software engineering—Software measurement process. International Organization for Standardization. [Online]. Available: https://www.iso.org/standard/29572.html. Accessed 23 April 2020
Software Engineering Standards Committee: IEEE Standard for a Software Quality Metrics Methodology, IEEE Std 1061-1998 (R2009). The Institute of Electrical and Electronics Engineers, Inc., New York, NY, USA (1998)
McCall, J., Richards, P., Walters, G.: Factors in software quality, volume I: Concepts and definitions of software quality. General Electric Co., Sunnyvale, CA (1977a)
McCall, J., Richards, P., Walters, G.: Factors in software quality, volume III: Preliminary handbook on software quality for an acquisition manager. General Electric Co., Sunnyvale, CA (1977b)
Boehm, B., Brown, J., Lipow, M.: Quantitative evaluation of software quality. In Proceedings of the 2nd international conference on software engineering (1976)
Boehm, B., Brown, J., Kaspar, H., Lipow, M., MacLeod, G.: Characteristics of Software Quality. Elsevier, North Holland (1978)
Grady, R., Caswell, D.: Software metrics: establishing a company-wide program. Prentice-Hall Inc (1987)
ISO/IEC 9126-1:2001: Software engineering—Product quality—Part 1: Quality model. International Organization for Standardization. [Online]. Available: https://www.iso.org/standard/22749.html. Accessed 23 April 2020
ISO/IEC 25010:2011: Systems and software engineering—Systems and software Quality Requirements and Evaluation (SQuaRE)—System and software quality models. International Organization for Standardization. [Online]. Available: https://www.iso.org/standard/35733.html. Accessed 23 April 2020
Tajima, D., Matsubara, T.: Special feature the computer software industry in Japan. Computer 14(05), 89–96 (1981)
Atos Origin: Method for qualification and selection of open source software (QSOS). [Online]. Available: http://www.qsos.org. Accessed 23 April 2020
Wasserman, A., Pal, M., Chan, C.: The business readiness rating model: an evaluation framework for open source. In Proceedings of the EFOSS Workshop, Como, Italy (2006)
Alvaro, A., de Almeida, E.S., Meira, S.L.: A software component maturity model (SCMM). In 33rd EUROMICRO conference on software engineering and advanced applications (EUROMICRO 2007) (2007)
Al-Qutaish, R.E., Abran, A.: A maturity model of software product quality. J. Res. Pract. Inf. Technol. 43(4), 307 (2011)
Gilb, T., Finzi, S.: Principles of Software Engineering Management. Addison-Wesley, Reading (1988)
Gilb, T.: Software Metrics. Winthrop Inc, Cambridge (1977)
Alshayeb, M., Abdellatif, A.K., Zahran, S., Niazi, M.: Towards a framework for software product maturity measurement. In The tenth international conference on software engineering advances (2015)
Kitchenham, B., Walker, J.: A quantitative approach to monitoring software development. Softw. Eng. J. 4(1), 2–14 (1989)
Jones, C.: Software Engineering Best Practices. In: Chapter 9 Software quality: the key to successful software engineering, pp. 555–643. McGraw-Hill, New York (2010)
Alves, T.L., Ypma, C., Visser, J.: Deriving metric thresholds from benchmark data. In 2010 IEEE international conference on software maintenance (2010)
Shatnawi, R., Li, W., Swain, J., Newman, T.: Finding software metrics threshold values using ROC curves. J. Softw. Maint. Evol. Res. Pract. 22, 1–16 (2010)
Alqmase, M., Alshayeb, M., Ghouti, L.: Threshold extraction framework for software metrics. J. Comput. Sci. Technol. 34(5), 1063–1078 (2019)
Veado, L., Vale, G., Fernandes, E., Figueiredo, E.: TDTool: threshold derivation tool. In Proceedings of the 20th international conference on evaluation and assessment in software engineering (2016)
Oliveira, P., Lima, F., Valente, M.T., Serebrenik, A.: RTTool: a tool for extracting relative thresholds for source code metrics. In 30th IEEE international conference on software maintenance and evolution, tool track, Victoria (2014)
Zhang, H.: An investigation of the relationships between lines of code and defects. In 2009 IEEE international conference on software maintenance (2009)
Lipow, M.: Number of faults per line of code. IEEE Trans. Software Eng. 4, 437–439 (1982)
Yamashita, K., Huang, C., Nagappan, M., Kamei, Y., Mockus, A., Hassan, A.E., Ubayashi, N.: Thresholds for size and complexity metrics: a case study from the perspective of defect density. In 2016 IEEE international conference on software quality, reliability and security (QRS) (2016)
Oliveira, P., Valente, M.T., Lima, F.P.: Extracting relative thresholds for source code metrics. In 2014 Software evolution week—IEEE conference on software maintenance, reengineering, and reverse engineering (CSMR-WCRE) (2014)
Do Vale, G.A., Figueiredo, E.M.L.: A method to derive metric thresholds for software product lines. In 2015 29th Brazilian symposium on software engineering (2015)
Masramon, G.P., Muñoz, L.A.B.: Toward better feature weighting algorithms: a focus on Relief. arXiv preprint, http://arxiv.org/abs/1509.03755 (2015)
Mori, T.: Information gain ratio as term weight: the case of summarization of ir results. In COLING 2002: The 19th international conference on computational linguistics (2002)
Kent, J.: Information gain and a general measure of correlation. Biometrika 70, 163–173 (1983)
Benesty, J., Chen, J., Huang, Y., Cohen, I.: Pearson Correlation Coefficient. In: Noise reduction in speech processing, pp. 1–4. Springer (2009)
Steuer, R., Kurths, J., Daub, C., Weise, J., Selbig, J.: The mutual information: detecting and evaluating dependencies between variables. Bioinformatics 18, S231–S240 (2002)
Machine Learning Group at the University of Waikato: Weka documentation (CorrelationAttributeEval). [Online]. Available: http://infochim.u-strasbg.fr/cgi-bin/weka-3-9-1/doc/weka/attributeSelection/CorrelationAttributeEval.html. Accessed 23 March 2020
Mountassir, A., Benbrahim, H., Berrada, I.: An empirical study to address the problem of unbalanced data sets in sentiment classification. In IEEE international conference on systems, man, and cybernetics (SMC), (2012)
Han, J., Pei, J., Kamber, M.: Data mining: concepts and techniques. Elsevier (2011)
Chmielewski, M., Grzymala-Busse, J.: Global discretization of continuous attributes as preprocessing for machine learning. In Third international workshop on rough sets and soft computing, (1994)
Liu, H., Hussain, F., Tan, C.L., Dash, M.: Discretization: an enabling technique. Data Min. Knowl. Disc. 6(4), 393–423 (2002)
Hall, M.A.: Correlation-based feature selection for machine learning. PhD thesis, Department of Computer Science, The University of Waikato, Hamilton, New Zealand (1999)
He, P., Li, B., Liu, X., Chen, J., Ma, Y.: An empirical study on software defect prediction with a simplified metric set. Inf. Softw. Technol. 59, 170–190 (2014)
Spinellis, D.: Tool writing: a forgotten art? IEEE Softw. 22(4), 9–11 (2005)
Lincke, R., Lundberg, J., Löwe, W.: Comparing software metrics tools. In: Proceedings of the 2008 international symposium on software testing and analysis, New York, NY, USA, (2008)
Jureczko, M., Madeyski, L., Spinellis, D.: PROMISE repository: software defect prediction dataset. (2010a). [Online]. Available: http://openscience.us/repo/. Accessed 2018
Jureczko, M., Madeyski, L.: Towards identifying software project clusters with regard to defect prediction. In Proceedings of the 6th international conference on predictive models in software engineering, Timișoara, Romania (2010)
Jureczko, M., Spinellis, D.: Using object-oriented design metrics to predict software defects. In Models and methods of system dependability. Oficyna Wydawnicza Politechniki Wroclawskiej (2010b)
Zou, K., Tuncali, K., Silverman, S.: Correlation and simple linear regression. Radiology 227(3), 617–622 (2003)
Bluman, A.G.: Probability and Counting Rules. In: Elementary statistics: a step by step approach, pp. 185–255. McGraw-Hill, New York (2009)
Kira, K., Rendell, L. A.: A practical approach to feature selection. In: Ninth international workshop on machine learning (1992).
Kononenko, I.: Estimating attributes: analysis and extensions of RELIEF. In: European conference on machine learning (1994).
Jureczko, M., Madeyski, L., Spinellis, D.: Software defect prediction dataset (2010b). [Online]. Available: http://purl.org/MarianJureczko/MetricsRepo. Accessed 27 February 2021
Acknowledgements
M. Alqmase and M. Alshayeb would like to acknowledge the support provided by the Deanship of Scientific Research at King Fahd University of Petroleum and Minerals. L. Ghouti acknowledges the support of Prince Sultan University.
Ethics declarations
Conflict of interest
The authors declare that they do not have any conflict of interest.
Appendix
1.1 Appendix 1
There are three basic interpretations of probability: classical probability, empirical probability, and subjective probability (Bluman 2009).
In the classical probability formula, the probability of any event E is
$$P(E) = \frac{n(E)}{n(S)} = \frac{\text{number of outcomes in } E}{\text{total number of outcomes in the sample space}}$$
This probability uses the sample space (S) and assumes that all outcomes in (S) are equally likely to occur.
In the empirical probability formula, given a frequency distribution, the probability of an event being in a given class is
$$P(E) = \frac{f}{n} = \frac{\text{frequency of the class}}{\text{total of the frequencies in the distribution}} \quad (14)$$
This probability is based on observation. The difference between classical and empirical probability is that empirical probability relies on actual experience to determine the likelihood of outcomes (Bluman 2009).
In this work, we propose a generalized version of the empirical probability formula that accommodates weights extracted by black-box models (e.g., AI models). These weights provide abstract semantics for the impact of many factors. Instead of considering all of these factors explicitly while building the probabilistic model, we abstract them inside the weights and then apply the proposed formula to those weights to build the probabilistic model. This abstraction hides the underlying complexity and allows one layer of modeling to build on another, which helps when modeling very complex concepts. The proposed formula can be defined as follows: given a weighted sum distribution, the probability of an event being in a given class is
$$P(E) = \frac{\text{weighted sum of the class}}{\text{total of the weighted sums}} = \frac{\sum_{m} w_{m} \, f_{m,E}}{\sum_{m} w_{m} \, n_{m}} \quad (15)$$
where \(w_{m}\) is the weight of metric m, \(f_{m,E}\) is the frequency of class E under metric m, and \(n_{m}\) is the number of entities measured by metric m.
This probability is also an empirical probability and is based on weights identified by observers. Observers are usually AI models that analyze huge amounts of data, study many relations, and abstract the impact of many factors as weights. The following examples give an interpretation of this formula.
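To make the proposed formula concrete, the computation can be sketched in a few lines of Python (a minimal illustration of Eq. (15); the function name and the dictionary layout are our own assumptions, not artifacts of this paper):

```python
from typing import Dict, List

def weighted_class_probability(
    measurements: Dict[str, List[str]],  # metric name -> class label per entity
    weights: Dict[str, float],           # metric name -> weight from a black-box model
    target: str,                         # class of interest, e.g. "good"
) -> float:
    """Eq. (15): weighted sum for the target class / total of the weighted sums."""
    target_sum = sum(
        w * sum(1 for label in measurements[m] if label == target)
        for m, w in weights.items()
    )
    total_sum = sum(w * len(measurements[m]) for m, w in weights.items())
    return target_sum / total_sum

# Example 2 below: the impact of m2 is twice that of m1.
measurements = {"m1": ["good", "good", "bad"], "m2": ["good", "bad", "bad"]}
weights = {"m1": 0.2, "m2": 0.4}
print(weighted_class_probability(measurements, weights, "good"))  # 0.444... = 4/9
```

When all weights are equal, the formula reduces to the empirical frequency formula of Eq. (14).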
Example 1
Given a software project with three entities E = {e1, e2, e3} described in Fig. 7:
- The metric m1 measures the three entities as (good, good, bad).
- The metric m2 measures the three entities as (good, bad, bad).
Table 14 shows the frequency distribution.
1.1.1 What is the probability that the given software project is at a good quality level?
This can be solved simply by the frequency formula presented in Eq. (14) as follows:
$$P(\text{good}) = \frac{f_{\text{good}}}{n} = \frac{3}{6} = 0.5$$
The next examples show cases where the weights are involved, and the proposed formula in Eq. (15) is applied.
Example 2
Given a software project with three entities E = {e1, e2, e3} described in Fig. 8:
1. The metric m1 measures the three entities as (good, good, bad).
2. The metric m2 measures the three entities as (good, bad, bad).
3. The impact of m2 on identifying the quality of the project is twice that of m1, where the impact is measured by a black-box model that gives m1 a weight of 0.2 and m2 a weight of 0.4. The weighted sum distribution and the frequency distribution are displayed in Table 15.
1.1.2 What is the probability that the given software project is at a good quality level?
As mentioned above, the impact of metric m2 on identifying the quality of the project is twice that of m1; this impact is given by the weights m1 = 0.2 and m2 = 0.4. Because a new factor is now involved, namely the impact of each software metric on identifying the quality of the software, the proposed formula presented in Eq. (15) is applied to compute the probability that the given software project is at a good quality level:
$$P(\text{good}) = \frac{0.2 \times 2 + 0.4 \times 1}{0.2 \times 3 + 0.4 \times 3} = \frac{0.8}{1.8} = \frac{4}{9} \approx 0.44$$
The following provides an interpretation of the proposed formula for calculating this probability.
The above example can also be modeled using the multiplication rule, the addition rule, and conditional probability. Since the impact of metric m2 on identifying the quality of the project is twice that of m1, it follows that \(p(m_{1}) = \frac{1}{3}\) and \(p(m_{2}) = \frac{2}{3}\).
With the use of a tree diagram, the sample space can be determined, as shown in Fig. 9. First, assign probabilities to each branch. Next, using the multiplication rule, multiply the probabilities along each branch. Finally, apply the addition rule, since a good quality level can be identified by metric m1 or m2.
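Written out (our reconstruction of the computation depicted in Fig. 9):
$$P(\text{good}) = p(m_{1})\,P(\text{good} \mid m_{1}) + p(m_{2})\,P(\text{good} \mid m_{2}) = \frac{1}{3} \cdot \frac{2}{3} + \frac{2}{3} \cdot \frac{1}{3} = \frac{4}{9}$$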
This yields the same result as the proposed formula, confirming its interpretation. The next example further illustrates the idea by modeling our problem as a box-color ball problem.
Example 3
In this example, the problem described in Example 2 is modeled as a box-color ball problem. In Fig. 10, box m1 contains small balls, one red and two green. Box m2 contains big balls, two red and one green. The preference for selecting the box with big balls is twice that of selecting the box with small balls. Assuming that the preference is measured by a black-box model that gives m1 a weight of 0.2 and m2 a weight of 0.4, this simply means \(p(m_{1}) = \frac{1}{3}\) and \(p(m_{2}) = \frac{2}{3}\).
1.1.3 What is the probability of selecting a green ball?
With the use of a tree diagram, the sample space can be determined, as shown in Fig. 11. First, assign probabilities to each branch. Next, using the multiplication rule, multiply the probabilities along each branch. Finally, apply the addition rule, since a green ball can be obtained from box m1 or box m2.
Example 4
Given a software project with three entities E = {e1, e2, e3} described in Fig. 12:
1. The metric m1 measures the three entities as (good, good, bad).
2. The metric m2 measures the three entities as (good, good, bad).
3. The metric m3 measures the three entities as (good, bad, bad).
4. The impact of m2 on identifying the quality of the project is twice that of m1, and the impact of m3 is twice that of m2, where the impact is measured by a black-box model that gives m1 a weight of 1.1, m2 a weight of 2.2, and m3 a weight of 4.4. The weighted sum distribution and the frequency distribution are described in Table 16.
1.1.4 What is the probability that the given software project is at a good quality level?
We can apply the proposed formula in Eq. (15) as follows:
$$P(\text{good}) = \frac{1.1 \times 2 + 2.2 \times 2 + 4.4 \times 1}{(1.1 + 2.2 + 4.4) \times 3} = \frac{11}{23.1} = \frac{10}{21} \approx 0.48$$
The following provides an interpretation of how this probability can be calculated using the multiplication rule, the addition rule, and conditional probability.
The above example can also be modeled as follows: since the impact of metric m2 on identifying the quality of the project is twice that of m1, and the impact of m3 is twice that of m2, it follows that \(p(m_{1}) = \frac{1}{7}\), \(p(m_{2}) = \frac{2}{7}\), and \(p(m_{3}) = \frac{4}{7}\).
With the use of a tree diagram, the sample space can be determined, as shown in Fig. 13. First, assign probabilities to each branch. Next, using the multiplication rule, multiply the probabilities along each branch. Finally, apply the addition rule, since a good quality level can be identified by metric m1, m2, or m3.
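Written out (our reconstruction of the computation depicted in Fig. 13):
$$P(\text{good}) = \frac{1}{7} \cdot \frac{2}{3} + \frac{2}{7} \cdot \frac{2}{3} + \frac{4}{7} \cdot \frac{1}{3} = \frac{2}{21} + \frac{4}{21} + \frac{4}{21} = \frac{10}{21}$$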
This yields the same result as the proposed formula, confirming its interpretation.
Example 5
Given a software project with three entities E = {e1, e2, e3} described in Fig. 14:
1. The metric m1 measures the three entities as (bad, good, good).
2. The metric m2 measures the three entities as (bad, good, good).
3. The metric m3 measures the three entities as (bad, good, good).
4. The impact of each quality level in each metric is different. Assume that this impact is measured by a black-box model, which gives the weights described in Table 17. The weighted sum distribution and the frequency distribution are described in Table 18.
1.1.5 What is the probability that the given software project is at a good quality level?
We can apply the proposed formula in Eq. (15) to the weighted sums in Table 18, which yields
$$P(\text{good}) = \frac{2}{3} \approx 0.67$$
The following provides an interpretation of how this probability can be calculated by modeling it as a box-color ball problem.
The above example can also be modeled as a box-color ball problem. In Fig. 15, box1 contains small balls, two red and four green. Box2 contains big balls, one red and two green. Assuming that the preference for selecting the box with big/small balls is measured by a black-box model that gives box1 a weight of 0.6 and box2 a weight of 2.2, this simply means \(p(\text{box1}) = \frac{0.6}{2.8}\) and \(p(\text{box2}) = \frac{2.2}{2.8}\).
1.1.6 What is the probability of selecting a green ball?
With the use of a tree diagram, the sample space can be determined, as shown in Fig. 16. First, assign probabilities to each branch. Next, using the multiplication rule, multiply the probabilities along each branch. Finally, apply the addition rule, since a green ball can be obtained from box1 or box2.
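Written out (our reconstruction of the computation depicted in Fig. 16):
$$P(\text{green}) = \frac{0.6}{2.8} \cdot \frac{4}{6} + \frac{2.2}{2.8} \cdot \frac{2}{3} = \frac{2}{3} \approx 0.67$$
which matches the value obtained from Eq. (15) above.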
1.2 Appendix 2
When we deal with feature evaluators, there are two categories: feature weighting techniques and feature selection techniques.
For the feature weighting schema, the concept is to assign weights to the given features based on two factors: relevance and/or redundancy. Relevance measures how relevant a given feature is to the class (target attribute) by examining the correlation between the feature and the class. Redundancy measures how redundant features are for predicting the class by examining the correlations among the features. A feature is said to be redundant if one or more of the other features are highly correlated with it (Hall 1999).
For feature selection techniques, the concept is to select a subset of the given features. The selection can be achieved with the help of a feature weighting schema. Some techniques remove irrelevant features, some remove redundant features, and some remove both. The purpose of feature selection techniques is to improve performance by removing redundant and/or irrelevant features (Hall 1999). In feature selection, a good subset of features is one that contains features highly correlated with the class and weakly correlated with each other (Hall 1999).
While the feature weighting schema provides techniques to assign a weight to each feature, feature selection techniques use these weights, along with a search strategy, to select the most useful subset of features.
For example, the correlation-based feature selection technique (CFS) proposed by Mark Hall (Hall 1999) uses a feature weighting algorithm called RELIEF. CFS is based on the feature subset evaluation function given by Eq. (16); it is a simple filter algorithm that ranks feature subsets according to correlations calculated by RELIEF. The evaluation function is, in fact, a form of Pearson's correlation coefficient (Hall 1999):
$$M_{s} = \frac{k \, \overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}} \quad (16)$$
where \(M_{s}\) is the heuristic "merit" of a feature subset s containing \(k\) features, \(\overline{r_{cf}}\) is the mean feature-class correlation \((f \in s)\), and \(\overline{r_{ff}}\) is the average feature-feature intercorrelation. The numerator of Eq. (16) can be thought of as indicating how well a set of features predicts the class; the denominator, how much redundancy there is among the features.
In the following example, taken from Hall's thesis (Hall 1999), Table 19 shows the feature weights calculated by RELIEF, both between each feature and the other features and between each feature and the class. Table 20 shows how these weights are utilized by CFS to select the best subset of features.
The search starts with an empty set of features. Applying the evaluation function, the subsets shown in bold mark a local improvement with respect to the previous best subset.
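For illustration, the merit function of Eq. (16) and the greedy forward search can be sketched in Python (a sketch under our own assumptions: the helper names, the data layout, and the simple stopping rule are ours; Hall's thesis discusses several search strategies):

```python
from typing import Dict, Tuple

def merit(subset: Tuple[str, ...],
          r_cf: Dict[str, float],                 # feature-class correlations
          r_ff: Dict[Tuple[str, str], float]) -> float:
    """Eq. (16): Ms = k * mean(r_cf) / sqrt(k + k*(k-1)*mean(r_ff))."""
    k = len(subset)
    mean_cf = sum(r_cf[f] for f in subset) / k
    pairs = [tuple(sorted((a, b))) for i, a in enumerate(subset) for b in subset[i + 1:]]
    mean_ff = sum(r_ff[p] for p in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / (k + k * (k - 1) * mean_ff) ** 0.5

def cfs_forward_search(features, r_cf, r_ff):
    """Greedy forward selection: grow the subset while the merit improves."""
    best, best_merit = (), 0.0
    while True:
        candidates = [best + (f,) for f in features if f not in best]
        if not candidates:
            return best, best_merit
        top_merit, top = max((merit(c, r_cf, r_ff), c) for c in candidates)
        if top_merit <= best_merit:  # no local improvement: stop
            return best, best_merit
        best, best_merit = top, top_merit

# Toy run: f1 and f2 predict the class and are weakly intercorrelated,
# while f3 is nearly irrelevant and redundant with f1.
r_cf = {"f1": 0.6, "f2": 0.5, "f3": 0.1}
r_ff = {("f1", "f2"): 0.2, ("f1", "f3"): 0.8, ("f2", "f3"): 0.1}
print(cfs_forward_search(["f1", "f2", "f3"], r_cf, r_ff))  # (('f1', 'f2'), ...)
```

Each pass adds the feature whose inclusion maximizes the merit and stops when no candidate improves on the previous best subset, mirroring the bold "local improvement" entries in Table 20.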
Relief is a feature weighting schema that can be utilized for feature selection problems. It is based on a statistical method to calculate the weight of a given feature (Kira and Rendell 1992) and is inspired by instance-based learning. Relief can detect features that are statistically relevant to the target attribute, and it handles nominal features (including Boolean) as well as numerical features (integer or real). The first version of the algorithm addresses two-class classification problems, where the target attribute (class or concept) has only two values (Kira and Rendell 1992); Relief was later extended by Kononenko (Kononenko 1994) to make it applicable to multi-class classification problems. According to the Relief algorithm (Kira and Rendell 1992), the training dataset S is separated into two sets, \(\{S^{+}, S^{-}\}\): instances with a positive class value and instances with a negative class value (assuming the class attribute has two values, positive and negative). Relief picks a sample of M instances and forms a triplet for each sampled instance X, where X is represented by a vector of p feature values \(\{x_{1}, x_{2}, \ldots, x_{p}\}\) and the given feature set is \(\{f_{1}, f_{2}, \ldots, f_{p}\}\). For each instance X, the algorithm picks the positive instance in \(S^{+}\) that is closest to X and the negative instance in \(S^{-}\) that is closest to X, using the p-dimensional Euclidean distance to select the closest instances. It then determines which one is the near-hit and which one is the near-miss for X: the near-hit is the closest instance to X with the same class value, and the near-miss is the closest instance to X with a different class value. After that, Relief calls a routine to update the feature weight vector W for every sampled triplet and determines the average feature weight vector (the relevance of all the features to the target concept). W is initialized with zeros, \((0, 0, \ldots, 0)\), and updated using the following formula:
$$W_{i} = W_{i} - \mathrm{diff}(x_{i}, \mathrm{nearHit}_{i})^{2} + \mathrm{diff}(x_{i}, \mathrm{nearMiss}_{i})^{2}$$
where diff is a function that, for nominal feature values, can be described as
$$\mathrm{diff}(a, b) = \begin{cases} 0 & \text{if } a = b \\ 1 & \text{otherwise} \end{cases}$$
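The update routine above can likewise be sketched in Python (our illustration of the two-class algorithm, not code from this paper; for numerical features, diff is commonly normalized by the feature's value range):

```python
import random
from typing import List

def relief(X: List[List[float]], y: List[int], num_samples: int) -> List[float]:
    """Two-class Relief (Kira and Rendell 1992): returns one weight per feature."""
    p = len(X[0])
    # Normalize numeric differences by each feature's value range (avoid /0).
    ranges = [max(col) - min(col) or 1.0 for col in zip(*X)]

    def diff(i: int, a: List[float], b: List[float]) -> float:
        return abs(a[i] - b[i]) / ranges[i]

    def dist(a: List[float], b: List[float]) -> float:
        return sum(diff(i, a, b) ** 2 for i in range(p)) ** 0.5

    W = [0.0] * p
    for _ in range(num_samples):
        idx = random.randrange(len(X))
        x, cls = X[idx], y[idx]
        # Nearest neighbor with the same class (near-hit) and a different class (near-miss).
        hit = min((z for z, c in zip(X, y) if c == cls and z is not x), key=lambda z: dist(x, z))
        miss = min((z for z, c in zip(X, y) if c != cls), key=lambda z: dist(x, z))
        for i in range(p):
            W[i] += diff(i, x, miss) ** 2 - diff(i, x, hit) ** 2
    return [w / num_samples for w in W]

# Feature 0 separates the classes; feature 1 is noise and gets a lower weight.
X = [[1.0, 0.0], [0.9, 1.0], [0.1, 0.0], [0.2, 1.0]]
y = [1, 1, 0, 0]
print(relief(X, y, num_samples=20))
```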
Cite this article
Alqmase, M., Alshayeb, M. & Ghouti, L. Quality assessment framework to rank software projects. Autom Softw Eng 29, 41 (2022). https://doi.org/10.1007/s10515-022-00342-0