Chain-detection Between Clusters

  • Focus article (Schwerpunktbeitrag)
Datenbank-Spektrum

Abstract

Chains connecting two or more different clusters are a well-known problem of clustering algorithms like DBSCAN or Single Linkage Clustering. Since even a small number of points resulting from, e.g., noise can form such a chain and build a bridge between different clusters, the results of the clustering algorithm can be distorted: several disparate clusters get merged into one. This single-link effect is well known, but to the best of our knowledge there are no satisfactory solutions yet that extract those chains. We present a new algorithm that detects not only straight chains between clusters, but also bent and noisy ones. Users can choose between eliminating one-dimensional and higher-dimensional chains connecting clusters to recover the underlying cluster structure. The desired straightness can also be set by the user. As this paper is an extension of [6], we apply our technique not only in combination with DBSCAN but also with single-link hierarchical clustering. On a real-world dataset containing traffic accidents in Great Britain, we were able to detect chains emerging from streets between cities and villages, which had led to clusters composed of several different villages. Additionally, we analyzed the robustness with respect to the variance of chains in synthetic experiments.
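
The single-link effect described above can be reproduced on a toy dataset. The following is a minimal sketch and not the chain-detection algorithm of the paper: it only illustrates the problem, using a plain distance-threshold single-link grouping (a union-find over all pairs within `eps`). The helper names `single_link_clusters` and `grid_blob`, the grid-shaped blobs, the straight bridge, and the value `eps = 0.6` are all assumptions chosen for illustration.

```python
import math


def single_link_clusters(points, eps):
    """Label points so that any two points within eps share a cluster (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    return [find(i) for i in range(len(points))]


def grid_blob(cx, cy, half=2, step=0.5):
    """A dense (2*half+1) x (2*half+1) grid of points centered at (cx, cy)."""
    return [
        (cx + i * step, cy + j * step)
        for i in range(-half, half + 1)
        for j in range(-half, half + 1)
    ]


# Two well-separated blobs (hypothetical data): x in [-1, 1] and x in [9, 11].
left = grid_blob(0.0, 0.0)
right = grid_blob(10.0, 0.0)

# A thin straight bridge of 15 points at x = 1.5, 2.0, ..., 8.5 on the x-axis.
chain = [(1.5 + 0.5 * k, 0.0) for k in range(15)]

eps = 0.6  # slightly larger than the grid and chain spacing of 0.5

labels_without = single_link_clusters(left + right, eps)
labels_with = single_link_clusters(left + right + chain, eps)

n_without = len(set(labels_without))
n_with = len(set(labels_with))
print("clusters without chain:", n_without)
print("clusters with chain:", n_with)
```

In this sketch, 15 bridge points are enough to merge two 25-point blobs into a single cluster, which is the distortion that chain detection aims to undo. Real chains, as the abstract notes, may also be bent or noisy, which this toy bridge does not model.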

Notes

  1. https://www.kaggle.com/daveianhickey/2000-16-traffic-flow-england-scotland-wales/data

  2. https://github.com/deric/clustering-benchmark/blob/master/src/main/resources/datasets/artificial/cure-t2-4k.arff

References

  1. Balcan MF, Liang Y, Gupta P (2014) Robust hierarchical clustering. J Mach Learn Res 15(1):3831–3871

  2. Birant D, Kut A (2007) ST-DBSCAN: an algorithm for clustering spatial-temporal data. Data Knowl Eng 60(1):208–221

  3. Day WH, Edelsbrunner H (1984) Efficient algorithms for agglomerative hierarchical clustering methods. J Classif 1(1):7–24

  4. Ester M, Kriegel HP, Sander J, Xu X et al (1996a) A density-based algorithm for discovering clusters in large spatial databases with noise. KDD 96:226–231

  5. Glasbey C (1987) Complete linkage as a multiple stopping rule for single linkage clustering. J Classif 4(1):103–109

  6. Held J, Beer A, Seidl T (2019) Chain-detection for DBSCAN. In: BTW 2019–Workshopband

  7. He Y, Tan H, Luo W, Mao H, Ma D, Feng S, Fan J (2011) MR-DBSCAN: an efficient parallel density-based clustering algorithm using MapReduce. In: 2011 IEEE 17th International Conference on Parallel and Distributed Systems. IEEE, 2011. pp 473–480

  8. Hyvärinen A, Karhunen J, Oja E (2004) Independent component analysis vol 46. John Wiley & Sons, Hoboken

  9. Jolliffe IT, Cadima J (2016) Principal component analysis: a review and recent developments. Philos Trans Royal Soc A 374(2065):20150202

  10. Murtagh F (1983) A survey of recent advances in hierarchical clustering algorithms. Comput J 26(4):354–359

  11. Ruiz C, Spiliopoulou M, Menasalvas E (2007) C-DBSCAN: Density-based clustering with constraints. In: International Workshop on Rough Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing. Springer, Berlin, Heidelberg, 2007. pp 216–223

  12. Sibson R (1973) SLINK: an optimally efficient algorithm for the single-link cluster method. Comput J 16(1):30–34

Acknowledgments

This work has been funded by the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A. The authors of this work take full responsibility for its content.

Author information

Corresponding author

Correspondence to Anna Beer.

Cite this article

Held, J., Beer, A. & Seidl, T. Chain-detection Between Clusters. Datenbank Spektrum 19, 219–230 (2019). https://doi.org/10.1007/s13222-019-00324-9

  • DOI: https://doi.org/10.1007/s13222-019-00324-9