Abstract
Software fault prediction often suffers from the unavailability of fault data, which makes supervised techniques difficult to apply. In such cases, unsupervised approaches such as clustering are helpful. In this paper, the K-Medoids clustering approach is applied to software fault prediction. To overcome the inherent computational complexity of the K-Medoids algorithm, a data structure called a Kd-Tree is used to identify data agents in the datasets. Partitioning Around Medoids (PAM) is applied to these data agents, which yields a set of medoids. All remaining data points are then assigned to the nearest of these medoids to obtain the final clusters. Error analysis of the fault predictions shows that, on one of the real datasets, our approach outperforms all other unsupervised approaches and gives the best values for the evaluation parameters. On the other real datasets, our results are comparable to those of other techniques. We also compare the performance of our technique with that of other techniques. The results show that our technique drastically reduces the total number of distance calculations, since the number of data agents is much smaller than the number of data points.
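The three steps described above (Kd-Tree data agents, PAM on the agents, nearest-medoid assignment) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. All function names and the `leaf_size` parameter are hypothetical. The sketch assumes that a data agent is the bucket member closest to the bucket mean, that the tree splits at the median of the widest dimension, and that PAM starts from the first k agents; the abstract specifies none of these choices.

```python
import math

def kd_buckets(points, leaf_size):
    """Split points kd-tree style until each bucket holds at most leaf_size points.
    Assumption: split at the median of the dimension with the largest spread."""
    if len(points) <= leaf_size:
        return [points]
    dims = range(len(points[0]))
    axis = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return kd_buckets(ordered[:mid], leaf_size) + kd_buckets(ordered[mid:], leaf_size)

def data_agent(bucket):
    """Assumption: the agent of a bucket is its member closest to the bucket mean."""
    dim = len(bucket[0])
    mean = tuple(sum(p[d] for p in bucket) / len(bucket) for d in range(dim))
    return min(bucket, key=lambda p: math.dist(p, mean))

def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam(points, k):
    """Swap phase of Partitioning Around Medoids: replace a medoid with a
    non-medoid whenever that lowers the total cost, until no swap helps."""
    medoids = list(points[:k])  # naive initialisation, for illustration only
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    return medoids

def cluster(points, k, leaf_size=4):
    """Run PAM on the data agents only, then label every point by nearest medoid."""
    agents = [data_agent(b) for b in kd_buckets(points, leaf_size)]
    medoids = pam(agents, k)  # requires at least k agents
    labels = [min(range(k), key=lambda j: math.dist(p, medoids[j])) for p in points]
    return medoids, labels

if __name__ == "__main__":
    demo = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(cluster(demo, k=2, leaf_size=2))
```

Because the swap loop evaluates costs only over the agents, the number of distance calculations in the expensive PAM phase depends on the number of buckets rather than on the number of data points. That is the saving the abstract refers to.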
Index Terms
- Application of K-Medoids with Kd-Tree for Software Fault Prediction