Abstract
The representation of objects is crucial for the learning process, often having a large impact on application performance. The dissimilarity space (DS) is one such representation, built by applying a dissimilarity measure between objects (e.g., Euclidean distance). However, other measures can be applied to generate more informative data representations. This paper focuses on the application of second-order dissimilarity measures, namely the Shared Nearest Neighbor (SNN) and the Dissimilarity Increments (Dinc), to produce new DSs that lead to a better description of the data by reducing the overlap of the classes and by increasing the discriminative power of features. Experimental results show that the proposed DSs provide significant benefits for unsupervised learning tasks. When compared with the feature and Euclidean spaces, the proposed SNN and Dinc spaces improve the performance of traditional hierarchical clustering algorithms and also help in the visualization task, leading to higher area under the precision/recall curve values.
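To make the idea of a dissimilarity space concrete, the following is a minimal sketch rather than the paper's implementation. It builds a Euclidean DS, in which each object is described by its distances to all objects, and then a second-order DS based on shared nearest neighbors. The abstract does not give the exact SNN definition, so the mapping used here (one minus the fraction of shared k-nearest neighbors) is an assumption, as are the function names and the parameter `k`.

```python
import numpy as np


def euclidean_dissimilarity_space(X):
    """Return the n x n matrix of pairwise Euclidean distances.

    Row i is the representation of object i in the dissimilarity space.
    """
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def snn_dissimilarity_space(X, k=3):
    """Second-order DS from shared k-nearest neighbors (illustrative only).

    Assumption: dissimilarity = 1 - (number of shared neighbors) / k.
    The paper may define the SNN measure differently.
    """
    D = euclidean_dissimilarity_space(X)
    n = D.shape[0]

    # Exclude each object from its own neighbor list.
    D_no_self = D.copy()
    np.fill_diagonal(D_no_self, np.inf)
    neighbors = np.argsort(D_no_self, axis=1)[:, :k]

    # member[i, j] = 1 if j is among the k nearest neighbors of i.
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], neighbors] = 1

    # shared[i, j] counts neighbors common to i and j; shared[i, i] == k.
    shared = member @ member.T
    return 1.0 - shared / k
```

Each row of the returned matrix can then be passed as a feature vector to a standard hierarchical clustering routine, which is the kind of use the abstract describes.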
Notes
Similarity-based Pattern Analysis and Recognition project: http://simbad-fp7.eu.
From the package Python gower: https://pypi.org/project/gower/.
Acknowledgements
This work was partially supported by Fundação para a Ciência e a Tecnologia (FCT) through project AIpALS, ref. PTDC/CCI-CIF/4613/2020, and the LASIGE Research Unit, ref. UIDB/00408/2020 and ref. UIDP/00408/2020.
Additional information
Responsible editor: Johannes Fürnkranz, Ian Davidson.
About this article
Cite this article
Aidos, H. Exploiting second-order dissimilarity representations for hierarchical clustering and visualization. Data Min Knowl Disc 36, 1371–1400 (2022). https://doi.org/10.1007/s10618-022-00836-1
DOI: https://doi.org/10.1007/s10618-022-00836-1