Agglomerative Method for Texts Clustering

Orekhov, Andrey V.

doi:10.1007/978-3-030-17705-8_2

Andrey V. Orekhov, ORCID: orcid.org/0000-0001-7641-956X

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11551)

Included in the following conference series:

International Conference on Internet Science

Abstract

Text documents are usually represented as vectors in an n-dimensional Euclidean space. One of the main problems in the typology of texts by means of cluster analysis is determining the number of clusters. This article studies the agglomerative clustering algorithm in Euclidean space. A statistical criterion for terminating the clustering process is derived as a Markov moment (a stopping time). The problem of cluster stability is also considered. As an example, the retrieval of harmful content is examined.
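The abstract does not spell out the algorithm or the stopping criterion derived in the paper. As a rough illustration only, the sketch below represents texts as term-frequency vectors in Euclidean space and merges clusters agglomeratively, using centroid linkage (an assumption; the source does not name a linkage). The stopping rule, which halts before any merge whose distance exceeds the mean plus `k_sigma` standard deviations of the earlier merge distances, is a hypothetical stand-in and not the author's criterion. It behaves like a stopping time only in the loose sense that each decision depends on the merges seen so far. The names `bag_of_words`, `agglomerate`, `k_sigma` and `min_history` are illustrative.

```python
import math
from collections import Counter


def bag_of_words(texts):
    """Map each text to a term-frequency vector over a shared, sorted vocabulary."""
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def agglomerate(vectors, k_sigma=3.0, min_history=2):
    """Centroid-linkage agglomerative clustering with a HYPOTHETICAL stopping rule.

    At each step the two clusters with the closest centroids are candidates
    for merging. Once at least `min_history` merges have been made, the
    process stops before any merge whose distance exceeds
    mean + k_sigma * std of the earlier merge distances. This rule is an
    illustrative assumption, not the criterion derived in the paper.
    """
    clusters = [[i] for i in range(len(vectors))]
    history = []  # distances of the merges performed so far
    while len(clusters) > 1:
        cents = [centroid([vectors[i] for i in c]) for c in clusters]
        best = None
        for a in range(len(cents)):
            for b in range(a + 1, len(cents)):
                d = euclidean(cents[a], cents[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if len(history) >= min_history:
            mean = sum(history) / len(history)
            std = math.sqrt(sum((h - mean) ** 2 for h in history) / len(history))
            if d > mean + k_sigma * std:
                break  # stop: this merge would be abnormally long
        history.append(d)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, history
```

Typical use would be `vocab, vecs = bag_of_words(texts)` followed by `clusters, history = agglomerate(vecs)`, where each cluster is a list of indices into `texts` and the number of clusters falls out of the stopping rule rather than being fixed in advance. Raw term counts are used only for brevity; weighting schemes such as TF-IDF are a common alternative.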

Supported by St. Petersburg State University.



Author information

Authors and Affiliations

St. Petersburg State University, 7/9, Universitetskaya embankment, St. Petersburg, 199034, Russian Federation
Andrey V. Orekhov

Corresponding author

Correspondence to Andrey V. Orekhov.

Editor information

Editors and Affiliations

St. Petersburg State University, St. Petersburg, Russia
Svetlana S. Bodrunova
National Research University Higher School of Economics, St. Petersburg, Russia
Olessia Koltsova
SINTEF, Trondheim, Norway
Asbjørn Følstad
Inria, Le Chesnay, France
Harry Halpin
National Research University Higher School of Economics, Moscow, Russia
Polina Kolozaridi
National Research University Higher School of Economics, Moscow, Russia
Leonid Yuldashev
St. Petersburg State University, St. Petersburg, Russia
Anna Smoliarova
TU München, Munich, Germany
Heiko Niedermayer

About this paper

Cite this paper

Orekhov, A.V. (2019). Agglomerative Method for Texts Clustering. In: Bodrunova, S., et al. Internet Science. INSCI 2018. Lecture Notes in Computer Science, vol 11551. Springer, Cham. https://doi.org/10.1007/978-3-030-17705-8_2

DOI: https://doi.org/10.1007/978-3-030-17705-8_2
Published: 17 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17704-1
Online ISBN: 978-3-030-17705-8
eBook Packages: Computer Science, Computer Science (R0)
