research-article

Application of K-Medoids with Kd-Tree for Software Fault Prediction

Published: 05 May 2011

Abstract

Software fault prediction is hampered by problems such as the non-availability of fault data, which makes supervised techniques difficult to apply. In such cases, unsupervised approaches such as clustering are helpful. In this paper, the K-Medoids clustering approach is applied to software fault prediction. To overcome the inherent computational complexity of the K-Medoids algorithm, a data structure called a Kd-Tree is used to identify data agents in the datasets. Partitioning Around Medoids (PAM) is applied to these data agents, which yields a set of medoids. All remaining data points are then assigned to the nearest of these medoids to obtain the final clusters. Software fault prediction error analysis shows that, on one of the real datasets, our approach outperforms all the unsupervised approaches it is compared against and gives the best values for the evaluation parameters. On the other real datasets, our results are comparable to those of the other techniques. We also compare the computational performance of our technique with that of other techniques: our technique drastically reduces the total number of distance calculations, because the number of data agents is much smaller than the number of data points.
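
The three-stage pipeline described in the abstract can be sketched in Python as follows. The sketch rests on assumptions the abstract does not settle: the Kd-tree splits each node at the median of one coordinate (cycling through the dimensions) until a bucket holds at most `leaf_size` points; each leaf bucket contributes one data agent, taken here to be the bucket member closest to the bucket mean; and PAM is a plain, unoptimized greedy BUILD followed by SWAP. All names (`DistanceCounter`, `kd_leaves`, `data_agents`, `pam`, `assign`, `kd_kmedoids`) are hypothetical, and the code is an illustration, not the authors' implementation. The counter tallies every distance evaluation so that PAM on all points can be compared with PAM on the agents.

```python
import math
import random


class DistanceCounter:
    """Euclidean distance that tallies how many times it is evaluated."""

    def __init__(self):
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return math.dist(a, b)


def kd_leaves(points, leaf_size, depth=0):
    """Split at the median of one coordinate (cycling through dimensions)
    until every bucket holds at most leaf_size points (leaf_size >= 1)."""
    if len(points) <= leaf_size:
        return [points]
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (kd_leaves(pts[:mid], leaf_size, depth + 1)
            + kd_leaves(pts[mid:], leaf_size, depth + 1))


def data_agents(points, leaf_size, d):
    """ASSUMPTION: one agent per leaf bucket, namely the bucket member
    closest to the bucket mean, so every agent is a real data point."""
    agents = []
    for bucket in kd_leaves(points, leaf_size):
        mean = tuple(sum(col) / len(bucket) for col in zip(*bucket))
        agents.append(min(bucket, key=lambda p: d(p, mean)))
    return agents


def total_cost(points, medoids, d):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(d(p, m) for m in medoids) for p in points)


def pam(points, k, d):
    """Unoptimized PAM: greedy BUILD, then first-improvement SWAP until
    no medoid/non-medoid exchange lowers the total cost."""
    medoids = []
    for _ in range(k):
        medoids.append(min((c for c in points if c not in medoids),
                           key=lambda c: total_cost(points, medoids + [c], d)))
    cost = total_cost(points, medoids, d)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in points:
                if h in medoids:
                    continue
                trial = medoids[:i] + [h] + medoids[i + 1:]
                trial_cost = total_cost(points, trial, d)
                if trial_cost < cost:
                    medoids, cost, improved = trial, trial_cost, True
    return medoids


def assign(points, medoids, d):
    """Label every data point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: d(p, medoids[j]))
            for p in points]


def kd_kmedoids(points, k, leaf_size):
    """Hypothetical driver: Kd-tree agents -> PAM on agents -> assign all
    points. Returns (medoids, labels, number of distance evaluations)."""
    d = DistanceCounter()
    medoids = pam(data_agents(points, leaf_size, d), k, d)
    return medoids, assign(points, medoids, d), d.calls


if __name__ == "__main__":
    rng = random.Random(0)
    # Two synthetic, well-separated 2-D blobs (illustrative data only).
    data = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in [(0.0, 0.0), (8.0, 8.0)] for _ in range(60)]
    full = DistanceCounter()
    pam(data, 2, full)
    medoids, labels, calls = kd_kmedoids(data, 2, leaf_size=10)
    print("PAM on all points:", full.calls, "distances")
    print("Kd-tree agents + PAM:", calls, "distances")
```

The textbook PAM SWAP step costs on the order of k(n-k)^2 distance evaluations per iteration for n points and k medoids. Running it on m agents, with m much smaller than n, shrinks that term to roughly k(m-k)^2, at the price of one nearest-medoid pass over all n points (kn distances, plus n distances to bucket means in this sketch). The naive `total_cost` recomputation makes both runs more expensive than the textbook bound, but the ratio between them still illustrates the saving the abstract reports.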

Information

Published In

ACM SIGSOFT Software Engineering Notes, Volume 36, Issue 2
March 2011
116 pages
ISSN: 0163-5948
DOI: 10.1145/1943371

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in SIGSOFT Volume 36, Issue 2

Author Tags

  1. K-Medoids
  2. Kd-Tree
  3. Software fault prediction

Article Metrics

  • Downloads (last 12 months): 10
  • Downloads (last 6 weeks): 0
Reflects downloads up to 07 Mar 2025.

Cited By

  • (2021) A hybrid shape-based image clustering using time-series analysis. Multimedia Tools and Applications 80:3, pp. 3793-3808. DOI: 10.1007/s11042-020-09765-x. Online publication date: 1-Jan-2021.
  • (2020) A Novel Approach for Software Defect prediction Based on the Power Law Function. Applied Sciences 10:5, 1892. DOI: 10.3390/app10051892. Online publication date: 10-Mar-2020.
  • (2019) Intelligent maintenance frameworks of large-scale grid using genetic algorithm and K-Mediods clustering methods. World Wide Web. DOI: 10.1007/s11280-019-00705-w. Online publication date: 13-Jul-2019.
  • (2014) A nature-inspired hybrid Fuzzy C-means algorithm for better clustering of biological data sets. 2014 International Conference on Data Science & Engineering (ICDSE), pp. 76-82. DOI: 10.1109/ICDSE.2014.6974615. Online publication date: Aug-2014.
  • (2013) Hyper-Quadtree-Based K-Means Algorithm for Software Fault Prediction. Computational Intelligence, Cyber Security and Computational Models, pp. 107-118. DOI: 10.1007/978-81-322-1680-3_12. Online publication date: 27-Nov-2013.
  • (2012) Software Fault Prediction Using Quad Tree-Based K-Means Clustering Algorithm. IEEE Transactions on Knowledge and Data Engineering 24:6, pp. 1146-1150. DOI: 10.1109/TKDE.2011.163. Online publication date: 1-Jun-2012.
