research-article

Application of K-Medoids with Kd-Tree for Software Fault Prediction

Authors:

V. Bhattacherjee

ACM SIGSOFT Software Engineering Notes, Volume 36, Issue 2

Pages 1 - 6

https://doi.org/10.1145/1943371.1943381

Published: 05 May 2011

Abstract

Software fault prediction often suffers from the non-availability of fault data, which makes supervised techniques difficult to apply. In such cases, unsupervised approaches such as clustering are helpful. In this paper, the K-Medoids clustering approach is applied to software fault prediction. To overcome the inherent computational complexity of the K-Medoids algorithm, a data structure called a Kd-Tree is used to identify data agents in the datasets. Partitioning Around Medoids is applied to these data agents, which yields a set of medoids. All remaining data points are then assigned to their nearest medoids to form the final clusters. Error analysis of the fault predictions shows that, on one real dataset, our approach outperforms all other unsupervised approaches and gives the best values for the evaluation parameters. On the other real datasets, our results are comparable to those of other techniques. The performance of our technique has also been evaluated against other techniques. The results show that it drastically reduces the total number of distance calculations, since the number of data agents is much smaller than the number of data points.
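The pipeline outlined in the abstract (Kd-Tree, data agents, Partitioning Around Medoids, final assignment) can be sketched roughly as below. This is an illustrative sketch, not the paper's implementation. The abstract does not say how a data agent is chosen, so the sketch assumes one agent per Kd-Tree leaf bucket, taken as the bucket member closest to the bucket mean. The function names, the `leaf_size` parameter, and the toy data are all hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kd_leaves(points, leaf_size, depth=0):
    """Median-split the points on one axis per level (cycling through the
    axes) and return the leaf buckets of the resulting Kd-Tree."""
    if len(points) <= leaf_size:
        return [points]
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (kd_leaves(ordered[:mid], leaf_size, depth + 1)
            + kd_leaves(ordered[mid:], leaf_size, depth + 1))


def pick_agent(bucket):
    """ASSUMPTION (not from the paper): the agent of a bucket is the
    member closest to the bucket mean."""
    dim = len(bucket[0])
    mean = tuple(sum(p[i] for p in bucket) / len(bucket) for i in range(dim))
    return min(bucket, key=lambda p: dist(p, mean))


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def pam(points, k, seed=0):
    """Basic PAM swap loop: swap a medoid for a non-medoid while cost drops."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in points:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    return medoids


def kd_medoids(points, k, leaf_size=4):
    """Run PAM on the agents only, then assign every point to its nearest medoid."""
    agents = [pick_agent(b) for b in kd_leaves(points, leaf_size)]
    medoids = pam(agents, k)
    labels = [min(range(k), key=lambda j: dist(p, medoids[j])) for p in points]
    return medoids, labels, agents


# Toy data: two well-separated blobs of 40 points each (hypothetical).
rng = random.Random(1)
low = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(40)]
high = [(rng.uniform(10, 11), rng.uniform(10, 11)) for _ in range(40)]
points = low + high

medoids, labels, agents = kd_medoids(points, k=2)
print(len(points), "points reduced to", len(agents), "agents")
```

The expensive swap search in `pam` touches only the agents, which is where the reduction in distance calculations claimed in the abstract would come from. The full dataset is visited once, in the final assignment step.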

Published In

ACM SIGSOFT Software Engineering Notes Volume 36, Issue 2

March 2011

116 pages

ISSN:0163-5948

DOI:10.1145/1943371

Copyright © 2011 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States

Cited By

Mondal A, Dey N, Fong S, Ashour A (2021). A hybrid shape-based image clustering using time-series analysis. Multimedia Tools and Applications 80:3 (3793-3808). Online publication date: 1-Jan-2021.
https://dl.acm.org/doi/10.1007/s11042-020-09765-x
Ren J, Liu F (2020). A Novel Approach for Software Defect prediction Based on the Power Law Function. Applied Sciences 10:5 (1892). Online publication date: 10-Mar-2020.
https://doi.org/10.3390/app10051892
Wang W, Lou B, Li X, Lou X, Jin N, Yan K (2019). Intelligent maintenance frameworks of large-scale grid using genetic algorithm and K-Mediods clustering methods. World Wide Web. Online publication date: 13-Jul-2019.
https://doi.org/10.1007/s11280-019-00705-w
Arunanand T, Nazeer K, Palakal M, Pradhan M (2014). A nature-inspired hybrid Fuzzy C-means algorithm for better clustering of biological data sets. 2014 International Conference on Data Science & Engineering (ICDSE) (76-82). Online publication date: Aug-2014.
https://doi.org/10.1109/ICDSE.2014.6974615
Sasidharan R, Sriram P (2013). Hyper-Quadtree-Based K-Means Algorithm for Software Fault Prediction. Computational Intelligence, Cyber Security and Computational Models (107-118). Online publication date: 27-Nov-2013.
https://doi.org/10.1007/978-81-322-1680-3_12
Bishnu P, Bhattacherjee V (2012). Software Fault Prediction Using Quad Tree-Based K-Means Clustering Algorithm. IEEE Transactions on Knowledge and Data Engineering 24:6 (1146-1150). Online publication date: 1-Jun-2012.
https://dl.acm.org/doi/10.1109/TKDE.2011.163
