Exploiting second-order dissimilarity representations for hierarchical clustering and visualization

Data Mining and Knowledge Discovery

Abstract

The representation of objects is crucial for the learning process and often has a large impact on application performance. The dissimilarity space (DS) is one such representation, built by applying a dissimilarity measure between objects (e.g., the Euclidean distance). However, other measures can be applied to generate more informative data representations. This paper focuses on the application of second-order dissimilarity measures, namely the Shared Nearest Neighbor (SNN) and the Dissimilarity Increments (Dinc), to produce new DSs that lead to a better description of the data, by reducing the overlap of the classes and by increasing the discriminative power of the features. Experimental results show that the proposed DSs provide significant benefits for unsupervised learning tasks. Compared with the feature and Euclidean spaces, the proposed SNN and Dinc spaces improve the performance of traditional hierarchical clustering algorithms and also help in the visualization task, yielding higher values of the area under the precision/recall curve.
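As a rough illustration of the idea, the sketch below builds a Euclidean dissimilarity space and a second-order space from shared nearest neighbors. It is not the paper's implementation: the function names, the choice of all objects as the representation set, the neighborhood size `k`, and the form `1 - shared / k` for the SNN dissimilarity are assumptions made here for illustration only.

```python
import numpy as np


def euclidean_space(X):
    """Euclidean dissimilarity space: row i holds the distances from
    object i to every object in the representation set (here, all of X)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def snn_space(X, k=5):
    """Second-order space from shared nearest neighbors.
    Assumed form (illustrative): 1 - |kNN(i) & kNN(j)| / k."""
    D = euclidean_space(X)
    np.fill_diagonal(D, np.inf)          # an object is not its own neighbor
    knn = np.argsort(D, axis=1)[:, :k]   # indices of the k nearest neighbors
    n = X.shape[0]
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], knn] = 1   # member[i, j] = 1 if j in kNN(i)
    shared = member @ member.T               # counts of shared neighbors
    return 1.0 - shared / k


# Toy data (hypothetical): two well-separated Gaussian blobs of 10 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(10, 2)),
               rng.normal(5.0, 0.3, size=(10, 2))])

DS_euc = euclidean_space(X)   # first-order representation
DS_snn = snn_space(X, k=5)    # second-order representation
```

Each row of `DS_euc` or `DS_snn` can then be treated as a feature vector and passed to a hierarchical clustering routine; for example, `scipy.cluster.hierarchy.linkage` accepts such an array of observation vectors.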

Notes

  1. Similarity-based Pattern Analysis and Recognition project: http://simbad-fp7.eu.

  2. http://archive.ics.uci.edu/ml.

  3. From the Python package gower: https://pypi.org/project/gower/.

Acknowledgements

This work was partially supported by Fundação para a Ciência e a Tecnologia (FCT) through project AIpALS, ref. PTDC/CCI-CIF/4613/2020, and the LASIGE Research Unit, ref. UIDB/00408/2020 and ref. UIDP/00408/2020.

Author information

Corresponding author

Correspondence to Helena Aidos.

Additional information

Responsible editor: Johannes Fürnkranz, Ian Davidson.

About this article

Cite this article

Aidos, H. Exploiting second-order dissimilarity representations for hierarchical clustering and visualization. Data Min Knowl Disc 36, 1371–1400 (2022). https://doi.org/10.1007/s10618-022-00836-1

  • DOI: https://doi.org/10.1007/s10618-022-00836-1
