Abstract
Text documents are usually represented as vectors in n-dimensional Euclidean space. One of the main problems in the typology of texts by cluster analysis is determining the number of clusters. This article studies the agglomerative clustering algorithm in Euclidean space. A statistical criterion for terminating the clustering process is derived as a Markov moment. The problem of cluster stability is also considered. As an example, the retrieval of harmful content is considered.
Supported by St. Petersburg State University.
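The abstract does not spell out the stopping criterion itself. As a rough illustration only, the following Python sketch does three things:

- It represents texts as term-frequency vectors (a bag-of-words assumption).
- At each step, it merges the two clusters whose centroids are closest.
- It stops when the next merge distance exceeds the mean plus `k_sigma` standard deviations of all earlier merge distances.

Several choices here are hypothetical and belong to this example, not to the paper: the threshold rule, the centroid linkage, and the names `vectorize`, `agglomerate`, `k_sigma` and `warmup`. The rule is not the criterion derived in the paper. It shares one property with it: the decision to stop depends only on the merge history observed so far, which is what makes such a rule a Markov moment (stopping time).

```python
import math
import statistics


def vectorize(texts):
    """Map each text to a term-frequency vector over a shared, sorted vocabulary."""
    tokenized = [text.lower().split() for text in texts]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for tokens in tokenized:
        vec = [0.0] * len(vocab)
        for word in tokens:
            vec[index[word]] += 1.0
        vectors.append(vec)
    return vocab, vectors


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def agglomerate(vectors, k_sigma=3.0, warmup=3):
    """Centroid-linkage agglomerative clustering with a hypothetical stopping rule.

    Before each merge, the candidate distance d is compared with the mean and
    population standard deviation of all earlier merge distances. Once at least
    `warmup` merges have been recorded, clustering stops if
    d > mean + k_sigma * std. The decision uses only past observations.
    """
    clusters = [[i] for i in range(len(vectors))]  # each cluster: list of indices
    history = []  # distances of the merges performed so far
    while len(clusters) > 1:
        centers = [centroid([vectors[m] for m in c]) for c in clusters]
        # Find the pair of clusters with the closest centroids (first one wins ties).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centers[i], centers[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if len(history) >= warmup:
            mean = statistics.fmean(history)
            spread = statistics.pstdev(history)
            if d > mean + k_sigma * spread:
                break  # abnormal jump: stop before merging distinct groups
        history.append(d)
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters, history


if __name__ == "__main__":
    # Two well-separated groups of 2-D points stand in for document vectors.
    points = [[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]]
    groups, merges = agglomerate(points)
    print(groups)  # prints [[0, 1, 2, 3], [4, 5, 6, 7]]
```

The output of `vectorize` is meant to be passed to `agglomerate`. The toy rule has a weakness with many identical documents. Their early merge distances are all zero, so the spread is zero and any positive distance stops the process. A practical criterion needs a more careful estimate of the spread.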
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Orekhov, A.V. (2019). Agglomerative Method for Texts Clustering. In: Bodrunova, S., et al. Internet Science. INSCI 2018. Lecture Notes in Computer Science, vol 11551. Springer, Cham. https://doi.org/10.1007/978-3-030-17705-8_2
DOI: https://doi.org/10.1007/978-3-030-17705-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17704-1
Online ISBN: 978-3-030-17705-8
eBook Packages: Computer Science, Computer Science (R0)