Abstract
The basic contribution of this paper is the presentation of two methods for designing a practical software change classification system based on data mining methods from rough set theory. These methods incorporate recent advances in rough set theory for coping with the uncertainty in change decisions made either during software development or after a software system has been deployed. Two well-known software engineering data sets are used to benchmark the proposed classification methods and to facilitate comparison with other published studies on the same data sets. The classification systems described in this paper are designed with two technologies from computational intelligence (CI), namely rough sets (a granular computing technology) and genetic algorithms. Using a 10-fold cross-validated paired t-test, this paper also compares the rough set classification learning method with the Waikato Environment for Knowledge Analysis (WEKA) classification learning method.
Cite this article
Peters, J.F., Ramanna, S. Towards a Software Change Classification System: A Rough Set Approach. Software Quality Journal 11, 121–147 (2003). https://doi.org/10.1023/A:1023764510838
DOI: https://doi.org/10.1023/A:1023764510838