
Clustering Dynamic Class Coupling Data to Measure Class Reusability Pattern

  • Conference paper
High Performance Architecture and Grid Computing (HPAGC 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 169)

Abstract

Identifying reusable components is an essential activity during software development. Data mining techniques can be applied to identify sets of software components that depend on one another. This paper attempts to identify groups of mutually dependent classes within the same repository. We explore a document clustering technique based on tf-idf weighting to cluster classes from the large collection of class coupling data for a particular Java project or program. First, dynamic analysis of the Java application is performed using UML diagrams to collect class import coupling data. Second, the coupling data of each class is treated as a document and represented in the vector space model (VSM) using term frequency (TF) and inverse document frequency (IDF). Third, the basic K-means clustering technique is applied to find clusters of classes. Finally, each cluster is ranked for its goodness.
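The steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the coupling data, the class names, the deterministic centroid seeding, and the `cohesion` score used for ranking are all assumptions made for illustration, since the abstract does not specify the goodness measure or the tooling used.

```python
import math
from collections import Counter

# Hypothetical coupling data: each class is a "document" whose terms are
# the classes it was observed to import or use during dynamic analysis.
coupling = {
    "OrderService": ["Order", "Customer", "Order", "Invoice"],
    "InvoiceService": ["Invoice", "Order", "Customer"],
    "ReportView": ["Chart", "Table", "Chart"],
    "DashboardView": ["Chart", "Table", "Widget"],
}

def tfidf_vectors(docs):
    """Return (vocabulary, {name: unit-length tf-idf vector})."""
    vocab = sorted({t for terms in docs.values() for t in terms})
    n = len(docs)
    df = Counter(t for terms in docs.values() for t in set(terms))
    vectors = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        total = len(terms) or 1  # guard against an empty document
        vec = [(tf[t] / total) * math.log(n / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors[name] = [x / norm for x in vec]
    return vocab, vectors

def kmeans(vectors, k, iterations=20):
    """Basic k-means; seeds centroids with the first k names (an assumption)."""
    names = sorted(vectors)
    centroids = [vectors[name] for name in names[:k]]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for name in names:
            dists = [sum((a - b) ** 2 for a, b in zip(vectors[name], c))
                     for c in centroids]
            clusters[dists.index(min(dists))].append(name)
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if not members:  # keep the old centroid for an empty cluster
                new_centroids.append(old)
                continue
            new_centroids.append(
                [sum(vectors[m][i] for m in members) / len(members)
                 for i in range(len(old))])
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return clusters

def cohesion(members, vectors):
    """Average pairwise cosine similarity; a stand-in 'goodness' score."""
    pairs = [(a, b) for i, a in enumerate(members) for b in members[i + 1:]]
    if not pairs:
        return 0.0
    return sum(sum(x * y for x, y in zip(vectors[a], vectors[b]))
               for a, b in pairs) / len(pairs)

_, vecs = tfidf_vectors(coupling)
ranked = sorted(kmeans(vecs, k=2), key=lambda c: cohesion(c, vecs), reverse=True)
for cluster in ranked:
    print(round(cohesion(cluster, vecs), 3), cluster)
```

Because the vectors are normalised to unit length, ordering by squared Euclidean distance in `kmeans` agrees with ordering by cosine similarity, which is why the same vectors can be reused for the cohesion score.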




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Parashar, A., Chhabra, J.K. (2011). Clustering Dynamic Class Coupling Data to Measure Class Reusability Pattern. In: Mantri, A., Nandi, S., Kumar, G., Kumar, S. (eds) High Performance Architecture and Grid Computing. HPAGC 2011. Communications in Computer and Information Science, vol 169. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22577-2_17


  • DOI: https://doi.org/10.1007/978-3-642-22577-2_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22576-5

  • Online ISBN: 978-3-642-22577-2

  • eBook Packages: Computer Science, Computer Science (R0)
