Euclidean space based hierarchical clusterers combinations: an application to software clustering

Cluster Computing

Abstract

Hierarchical clustering groups similar entities on the basis of a similarity (or distance) measure and produces a tree-like structure called a dendrogram. A dendrogram represents clusters in a nested manner: at each step, an entity either forms a new cluster or merges into an existing one. Because hierarchical clustering has many applications, researchers have sought improved hierarchical clustering approaches. One approach that has received attention combines clustering results: different hierarchical clustering algorithms produce different dendrograms, and their combination has produced more promising results than individual hierarchical clusterers. This paper proposes a hierarchical clustering combination (HCC) approach that uses the different types of structural features present in the dendrogram. First, the dendrograms are represented in a (4+N)-dimensional Euclidean space (4+NDES), where 4 is the number of extracted features and N is the number of additional features to which the representation can be extended; this yields vector matrices. 4+NDES is a structural representation of the dendrogram that contains both the relative and the absolute features of the entities in the dendrogram. The vector matrices are then aggregated, and the distance between each pair of vectors is calculated using the Euclidean distance measure. The final hierarchy is obtained using a recovery tool such as an individual hierarchical clustering algorithm. 4+NDES-HCC utilizes the structural content of the dendrogram and has the flexibility to handle an increasing number of features. The proposed approach is tested on software clustering, which plays an important role in the maintenance of software systems. The experimental results and a comparative analysis with existing approaches show the effectiveness of HCC for software clustering.
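
The pipeline outlined in the abstract (represent each dendrogram as per-entity vectors, aggregate the vector matrices, compute pairwise Euclidean distances, recover a final hierarchy) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the four extracted features, so the two features used here (leaf depth and size of the enclosing cluster) are hypothetical stand-ins. The element-wise sum used for aggregation, the single-linkage recovery step, and all function names are also assumptions made for illustration.

```python
import math
from itertools import combinations

def count_leaves(tree):
    """Number of entities under a dendrogram given as nested tuples."""
    return sum(count_leaves(c) if isinstance(c, tuple) else 1 for c in tree)

def leaf_features(tree, depth=0, out=None):
    """Map each entity to [leaf depth, size of its enclosing cluster].
    Both features are hypothetical; the paper's own features are not
    listed in the abstract."""
    if out is None:
        out = {}
    size = count_leaves(tree)
    for child in tree:
        if isinstance(child, tuple):
            leaf_features(child, depth + 1, out)
        else:
            out[child] = [float(depth + 1), float(size)]
    return out

def aggregate(matrices):
    """Element-wise sum of per-entity vectors (an assumed aggregation)."""
    return {e: [sum(vals) for vals in zip(*(m[e] for m in matrices))]
            for e in sorted(matrices[0])}

def pairwise_distances(vectors):
    """Euclidean distance between every pair of entity vectors."""
    return {frozenset((a, b)): math.dist(vectors[a], vectors[b])
            for a, b in combinations(sorted(vectors), 2)}

def single_linkage(vectors):
    """Recover a hierarchy as a list of (cluster_a, cluster_b, distance)."""
    dist = pairwise_distances(vectors)

    def linkage(pair):
        a, b = pair
        return min(dist[frozenset((x, y))] for x in a for y in b)

    clusters = [(e,) for e in sorted(vectors)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return merges

# Two toy dendrograms over the same four entities.
d1 = ((("t", "u"), "v"), "w")
d2 = (("t", ("u", "v")), "w")
agg = aggregate([leaf_features(d1), leaf_features(d2)])
merges = single_linkage(agg)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

In this toy run, the aggregated vectors of `t` and `v` coincide, so they are merged first, and `w` joins last. Other features, aggregation rules, or linkage methods would generally give a different final hierarchy.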

Notes

  1. http://www.cse.yorku.ca/~bil/downloads/.

    • First Cluster: (t,u)

    • Second Cluster: ((t,u)v)

    • Third Cluster: (((t,u)v)w)

    • Fourth Cluster: ((((t,u)v)w)x)

    • Fifth Cluster: (((((t,u)v)w)x)(y,z))

  2. http://www.junit.org/.

  3. http://www.grepcode.com/.

  4. http://perun.pmf.uns.ac.rs/radovanovic/dmsem/cd/install/Weka/doc/html/Weka%203.4.5.htm.

  5. The number of combinations can be calculated using the following formula: \(\binom{n}{k} = {}^{n}C_{k} = \frac{n!}{k!\,(n-k)!}\), where n is the number of clusterers and k is the number of clusterers chosen.

  6. https://docs.google.com/spreadsheets/d/1HUbXsaGxrODRQ7NB2mdMAZXHRr2GJsX-gdnWcqClBGM/edit?usp=sharing.
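
As a quick check of the combination formula in note 5, the sketch below counts and enumerates the ways of choosing k clusterers out of n. The clusterer names are placeholders invented for the example and are not taken from the article.

```python
from itertools import combinations
from math import comb, factorial

# Placeholder clusterer names (hypothetical, for illustration only).
clusterers = ["A", "B", "C", "D"]
n, k = len(clusterers), 2

# Direct evaluation of n! / (k! (n - k)!).
by_formula = factorial(n) // (factorial(k) * factorial(n - k))

# Explicit enumeration of every k-subset of clusterers.
subsets = list(combinations(clusterers, k))

print(by_formula, comb(n, k), len(subsets))
```

All three values agree, so the formula gives the number of distinct clusterer subsets that a combination experiment of size k would have to cover.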

Author information

Corresponding author

Correspondence to Rashid Naseem.

About this article

Cite this article

Naseem, R., Deris, M.M., Maqbool, O. et al. Euclidean space based hierarchical clusterers combinations: an application to software clustering. Cluster Comput 22 (Suppl 3), 7287–7311 (2019). https://doi.org/10.1007/s10586-017-1408-0

  • DOI: https://doi.org/10.1007/s10586-017-1408-0
