Abstract
This paper presents two new algorithms, revdbscan and flexscan, that are conceptually close to density-based clustering. Both algorithms handle clustering problems no worse than the dbscan algorithm does, and flexscan additionally deals with nonuniform distributions of data. The complexity of both algorithms is \(O(n \log n)\), in contrast to the well-known dbscan algorithm, whose complexity is \(O(n^2)\). Additionally, we show that the complexity of dbscan cannot be reduced to \(O(n \log n)\) merely by using locality-sensitive hashing trees (or r-trees or kd-trees).
In the final part of the paper, we present results on benchmark datasets. These results show the superiority of the proposed algorithms.
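To make the \(O(n^2)\) figure quoted for dbscan concrete, the following is a minimal sketch of the textbook dbscan procedure with brute-force neighborhood queries. It is not the revdbscan or flexscan algorithm from the paper, and the names `region_query`, `dbscan`, `eps`, and `min_pts` are illustrative assumptions. Each point triggers at most one linear-scan neighborhood query, which gives roughly \(n\) queries of \(O(n)\) cost each.

```python
from math import dist

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (linear scan, O(n))."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    Textbook formulation, assumed here for illustration only.
    """
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

In this sketch, replacing `region_query` with an index lookup (a kd-tree, for example) changes the cost of each individual query but not the number of queries.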
Notes
1. The \(\log^*\) denotes the iterated logarithm.
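As a small illustration of the iterated logarithm, the sketch below counts how many times the logarithm must be applied before the value drops to 1 or below. Base 2 is an assumption here (the note does not state a base), and the function name `iterated_log` is hypothetical.

```python
from math import log2


def iterated_log(n):
    """Number of times log2 must be applied to n until the result is <= 1."""
    count = 0
    x = float(n)
    while x > 1.0:
        x = log2(x)
        count += 1
    return count
```

The function grows extremely slowly: under the base-2 assumption, `iterated_log(65536)` is 4.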
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Jankowski, N. (2021). Revdbscan and Flexscan—\(O(n\log n)\) Clustering Algorithms. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1516. Springer, Cham. https://doi.org/10.1007/978-3-030-92307-5_75
DOI: https://doi.org/10.1007/978-3-030-92307-5_75
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92306-8
Online ISBN: 978-3-030-92307-5
eBook Packages: Computer Science, Computer Science (R0)