Abstract
Hierarchical clustering groups similar entities on the basis of a similarity (or distance) measure and produces a tree-like structure called a dendrogram. A dendrogram represents clusters in a nested manner: at each step an entity either forms a new cluster or merges into an existing one. Because hierarchical clustering has many applications, researchers have made efforts to improve on existing hierarchical clustering approaches. One approach that has received attention combines clustering results, since different hierarchical clustering algorithms produce different dendrograms and their combination has produced more promising results than individual hierarchical clustering. This paper proposes a hierarchical clustering combination (HCC) approach that uses the different types of structural features present in the dendrogram. First, the dendrograms are represented in a (4+N)-dimensional Euclidean space (4+NDES), where 4 is the number of extracted features and N is the number of further features by which the representation can be extended; this yields one vector matrix per dendrogram. 4+NDES is a structural representation of the dendrogram that captures not only the relative features but also the absolute features of the entities in the dendrogram. The vector matrices are then aggregated, and the distance between each pair of vectors is calculated using the Euclidean distance measure. The final hierarchy is obtained using a recovery tool such as an individual hierarchical clustering algorithm. 4+NDES-HCC exploits the structural content of the dendrogram and is flexible enough to handle an increasing number of features. The proposed approach is tested on software clustering, which plays an important role in the maintenance of software systems. The experimental results and a comparative analysis with existing approaches reveal the effectiveness of the HCC for software clustering.
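The pipeline described in the abstract (describe each dendrogram as a matrix of vectors, aggregate the matrices, take Euclidean distances between vectors, recover a final hierarchy) can be illustrated with a minimal sketch. The abstract does not name the four structural features, so the sketch below substitutes a single hypothetical descriptor, the merge step at which two entities first share a cluster. It also assumes summation as the aggregation rule and single linkage as the recovery tool. The function names and the input format (a list of merges, each given by one representative entity per cluster) are likewise assumptions for illustration, not the authors' method.

```python
from itertools import combinations
from math import dist


def merge_levels(merges, entities):
    """Hypothetical descriptor: level[i][j] is the merge step at which
    entities i and j first end up in the same cluster."""
    index = {e: i for i, e in enumerate(entities)}
    n = len(entities)
    level = [[0.0] * n for _ in range(n)]
    cluster_of = {e: {e} for e in entities}  # entity -> its current cluster
    for step, (a, b) in enumerate(merges, start=1):
        left, right = cluster_of[a], cluster_of[b]
        for x in left:
            for y in right:
                level[index[x]][index[y]] = float(step)
                level[index[y]][index[x]] = float(step)
        merged = left | right
        for e in merged:
            cluster_of[e] = merged
    return level


def aggregate(matrices):
    """Assumed aggregation rule: element-wise sum of the vector matrices."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]


def pairwise_euclidean(rows):
    """Euclidean distance between every pair of row vectors."""
    n = len(rows)
    return [[dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def single_linkage(d, entities):
    """Assumed recovery tool: single-linkage agglomeration on distances d."""
    clusters = [frozenset([i]) for i in range(len(entities))]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda p: min(d[i][j] for i in p[0] for j in p[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(entities[i] for i in a),
                       sorted(entities[i] for i in b)))
    return merges


def combine(dendrograms, entities):
    """Combine several dendrograms into one hierarchy (illustrative only)."""
    matrices = [merge_levels(m, entities) for m in dendrograms]
    return single_linkage(pairwise_euclidean(aggregate(matrices)), entities)
```

For example, `combine([[("t", "u"), ("t", "v"), ("t", "w")], [("t", "u"), ("v", "w"), ("t", "v")]], ["t", "u", "v", "w"])` combines two small dendrograms over the same four entities. Adding further descriptors would amount to widening each row vector, which is the sense in which a Euclidean representation can accommodate more features.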
Notes
- First Cluster: (t,u)
- Second Cluster: ((t,u)v)
- Third Cluster: (((t,u)v)w)
- Fourth Cluster: ((((t,u)v)w)x)
- Fifth Cluster: (((((t,u)v)w)x)(y,z))
- The number of combinations can be calculated using the following formula: \(\binom{n}{k} = {}^{n}C_{k} = \frac{n!}{k!(n-k)!}\), where n is the number of clusterers and k is the number of clusterers chosen for each combination.
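As a small worked illustration of the formula above, the following sketch counts and enumerates the ways of choosing k clusterers out of n. The clusterer labels "A" to "D" and the helper name `clusterer_subsets` are hypothetical placeholders, not names taken from the paper.

```python
from itertools import combinations
from math import comb, factorial


def n_choose_k(n, k):
    """Direct evaluation of n! / (k! (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def clusterer_subsets(clusterers, k):
    """All ways of picking k clusterers to combine, ignoring order."""
    return list(combinations(clusterers, k))


clusterers = ["A", "B", "C", "D"]  # hypothetical labels for four clusterers
pairs = clusterer_subsets(clusterers, 2)
print(len(pairs), n_choose_k(4, 2), comb(4, 2))  # all three counts agree: 6
```

With four clusterers there are 6 possible pairs, 4 possible triples and 1 combination of all four.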
Cite this article
Naseem, R., Deris, M.M., Maqbool, O. et al. Euclidean space based hierarchical clusterers combinations: an application to software clustering. Cluster Comput 22 (Suppl 3), 7287–7311 (2019). https://doi.org/10.1007/s10586-017-1408-0
DOI: https://doi.org/10.1007/s10586-017-1408-0