research-article

Application of K-Medoids with Kd-Tree for Software Fault Prediction

Published: 05 May 2011

Abstract

Software fault prediction suffers from problems such as the unavailability of fault data, which makes supervised techniques difficult to apply. In such cases, unsupervised approaches such as clustering are helpful. In this paper, the K-Medoids clustering approach is applied to software fault prediction. To overcome the inherent computational complexity of the K-Medoids algorithm, a data structure called a Kd-Tree is used to identify data agents in the datasets. Partitioning Around Medoids (PAM) is applied to these data agents, yielding a set of medoids. All remaining data points are then assigned to their nearest medoid to obtain the final clusters. Software fault prediction error analysis shows that our approach outperforms all other unsupervised approaches on one of the real datasets, giving the best values for the evaluation parameters. For the other real datasets, our results are comparable to those of other techniques. We also compare the performance of our technique with that of other techniques. The results show that our technique drastically reduces the total number of distance calculations, since the number of data agents is much smaller than the number of data points.
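The three-stage pipeline described in the abstract (Kd-Tree to pick data agents, PAM on the agents, nearest-medoid assignment for all points) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how a data agent is chosen, so the sketch assumes one agent per Kd-Tree leaf, taken as the leaf point closest to the leaf centroid. The function names, the `leaf_size` parameter, the widest-axis median split, and the naive medoid initialisation are all hypothetical choices made for the example.

```python
import math


def kd_leaf_agents(points, leaf_size):
    """Split points Kd-Tree style and return one representative per leaf.

    Assumption: the "data agent" of a leaf is the leaf point closest to
    the leaf centroid. The abstract does not specify this choice.
    """
    dim = len(points[0])
    if len(points) <= leaf_size:
        centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        return [min(points, key=lambda p: math.dist(p, centroid))]
    # Split at the median of the axis with the largest spread.
    axis = max(range(dim),
               key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (kd_leaf_agents(ordered[:mid], leaf_size)
            + kd_leaf_agents(ordered[mid:], leaf_size))


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam(points, k):
    """Simplified PAM: greedy swaps until no swap lowers the total cost.

    Assumption: medoids start as the first k points; the BUILD phase of
    the original PAM algorithm is omitted for brevity.
    """
    medoids = list(points[:k])
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids


def assign(points, medoids):
    """Label every point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: math.dist(p, medoids[j]))
            for p in points]


def kd_medoids(points, k, leaf_size):
    """Run PAM on the data agents only, then assign all points."""
    agents = kd_leaf_agents(points, leaf_size)
    medoids = pam(agents, k)
    return medoids, assign(points, medoids), len(agents)


# Toy data: two well-separated groups of four 2-D points each.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11)]
medoids, labels, n_agents = kd_medoids(data, k=2, leaf_size=2)
```

In this sketch the swap loop in `pam` iterates over the agents rather than over all points, which is where the reduction in distance calculations mentioned in the abstract would come from; only the final `assign` step touches every data point.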

Published in

ACM SIGSOFT Software Engineering Notes, Volume 36, Issue 2
March 2011
116 pages
ISSN: 0163-5948
DOI: 10.1145/1943371

Copyright © 2011 Authors

Publisher

Association for Computing Machinery
New York, NY, United States
