Abstract
Identification of reusable components is an essential activity in the software development process. Data mining techniques can be applied to identify sets of software components that depend on one another. In this paper, an attempt has been made to identify groups of mutually dependent classes that exist in the same repository. We explore a document clustering technique based on tf-idf weighting to cluster classes from a vast collection of class coupling data for a particular Java project/program. First, dynamic analysis of the Java application is performed using UML diagrams to collect class import coupling data. Second, the coupling data of each class is treated as a document and represented in the vector space model (VSM) using TF and IDF. Finally, the basic K-means clustering technique is applied to find clusters of classes, and each cluster is ranked according to its goodness.
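The pipeline outlined in the abstract treats coupling data as documents, weights them with TF-IDF in a vector space model, clusters them with K-means and then ranks the clusters. The following minimal Python sketch illustrates that flow under stated assumptions; it is not the authors' implementation.

- The coupling data arrives as a hypothetical dictionary that maps each client class to the classes it was observed importing at run time. Repeated entries count as term frequency.
- IDF is taken as log(N/df).
- K-means assigns each class to the centroid with the highest cosine similarity and is seeded with the first k classes in sorted order. Whether the paper uses cosine or Euclidean distance is an assumption here.
- The "goodness" score is a hypothetical stand-in, because the abstract does not define the ranking measure. It is the mean cosine similarity of the members to their centroid.
- All class names are invented.

```python
import math
from collections import Counter

# Hypothetical dynamic import-coupling data: client class -> classes it was
# observed importing at run time (a repeated entry means repeated coupling).
coupling = {
    "Cart": ["Item", "Item", "Price"],
    "Checkout": ["Item", "Price", "Price"],
    "Login": ["User", "Session"],
    "Logout": ["Session", "User", "User"],
}


def tfidf_vectors(docs):
    """Treat each class's coupling list as a document; return sparse TF-IDF vectors."""
    n_docs = len(docs)
    df = Counter()  # number of documents (classes) each imported class appears in
    for terms in docs.values():
        df.update(set(terms))
    return {
        cls: {t: tf * math.log(n_docs / df[t]) for t, tf in Counter(terms).items()}
        for cls, terms in docs.items()
    }


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans(vectors, k, max_iter=20):
    """K-means with cosine-similarity assignment and mean-vector centroids."""
    names = sorted(vectors)
    k = min(k, len(names))
    # Simplistic deterministic seeding (assumption): the first k classes in sorted order.
    centroids = [dict(vectors[n]) for n in names[:k]]
    assignment = {}
    for _ in range(max_iter):
        new_assignment = {
            n: max(range(k), key=lambda c: cosine(vectors[n], centroids[c]))
            for n in names
        }
        if new_assignment == assignment:
            break  # converged: no class changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [vectors[n] for n in names if assignment[n] == c]
            if not members:
                continue  # an empty cluster keeps its previous centroid
            total = Counter()
            for v in members:
                total.update(v)  # sums the weights term by term
            centroids[c] = {t: w / len(members) for t, w in total.items()}
    clusters = [[n for n in names if assignment[n] == c] for c in range(k)]
    return clusters, centroids


def rank_clusters(clusters, centroids, vectors):
    """Hypothetical goodness: mean cosine similarity of members to their centroid."""
    scored = [
        (sum(cosine(vectors[m], cen) for m in members) / len(members), members)
        for members, cen in zip(clusters, centroids)
        if members
    ]
    return sorted(scored, key=lambda s: s[0], reverse=True)


vectors = tfidf_vectors(coupling)
clusters, centroids = kmeans(vectors, k=2)
for score, members in rank_clusters(clusters, centroids, vectors):
    print(f"{score:.3f}  {members}")
```

With this toy input, the sketch separates the two invented groups, {Cart, Checkout} and {Login, Logout}, and ranks the tighter {Login, Logout} cluster first. Seeding with the first k classes is deliberately simple. A practical run would more likely use k-means++ seeding or several random restarts.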
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Parashar, A., Chhabra, J.K. (2011). Clustering Dynamic Class Coupling Data to Measure Class Reusability Pattern. In: Mantri, A., Nandi, S., Kumar, G., Kumar, S. (eds) High Performance Architecture and Grid Computing. HPAGC 2011. Communications in Computer and Information Science, vol 169. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22577-2_17
DOI: https://doi.org/10.1007/978-3-642-22577-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22576-5
Online ISBN: 978-3-642-22577-2
eBook Packages: Computer Science, Computer Science (R0)