
Sequentially Grouping Items into Clusters of Unspecified Number

  • Conference paper
Recent Advances in Information and Communication Technology 2017 (IC2IT 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 566)

Abstract

Most traditional clustering algorithms require the number of clusters sought to be specified beforehand, and all items to be clustered to be present when the algorithm is run. These two shortcomings, which are very serious in practical applications, are overcome by a straightforward sequential clustering algorithm. Its most crucial constituent is a distance measure, whose suitable choice is discussed. It is shown how sequentially obtained cluster sets can be improved by reclustering, and how items considered outliers can be removed. The method's applicability to text analysis is demonstrated.
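To make the sequential idea concrete, here is a minimal threshold-based (leader-style) clustering sketch in Python. It is an illustration under stated assumptions, not the authors' algorithm. It assumes that items are sets of terms and that the distance measure is one minus the Dice coefficient. It also assumes that an item's distance to a cluster is the average distance to the cluster's members, and that a fixed `threshold` decides when a new cluster is opened. Reclustering is modelled as a second sequential pass over all items, and outliers are taken to be members of clusters smaller than `min_size`. All function and parameter names (`dice_distance`, `sequential_cluster`, `recluster`, `remove_outliers`, `threshold`, `min_size`) are hypothetical.

```python
from typing import FrozenSet, List, Optional

# Hypothetical item representation (assumption): a document as a set of terms.
Item = FrozenSet[str]


def dice_distance(a: Item, b: Item) -> float:
    """Return 1 - Dice coefficient, i.e. 1 - 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def cluster_distance(item: Item, cluster: List[Item]) -> float:
    """Average distance from an item to all members of a cluster (assumed linkage)."""
    return sum(dice_distance(item, m) for m in cluster) / len(cluster)


def sequential_cluster(items: List[Item], threshold: float,
                       clusters: Optional[List[List[Item]]] = None) -> List[List[Item]]:
    """Assign items one at a time to the nearest cluster, or open a new cluster
    if none lies within `threshold`. An existing cluster list may be passed in;
    it is extended in place, so items can be added as they arrive."""
    clusters = [] if clusters is None else clusters
    for item in items:
        best: Optional[List[Item]] = None
        best_d = float("inf")
        for c in clusters:
            d = cluster_distance(item, c)
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= threshold:
            best.append(item)
        else:
            clusters.append([item])  # no cluster is close enough: start a new one
    return clusters


def recluster(clusters: List[List[Item]], threshold: float) -> List[List[Item]]:
    """Hypothetical refinement: rerun the sequential pass over all items,
    ordered cluster by cluster, starting from an empty cluster set."""
    items = [m for c in clusters for m in c]
    return sequential_cluster(items, threshold)


def remove_outliers(clusters: List[List[Item]], min_size: int = 2) -> List[List[Item]]:
    """Drop clusters with fewer than `min_size` members, treating them as outliers."""
    return [c for c in clusters if len(c) >= min_size]


docs = [
    frozenset({"cat", "dog", "pet"}),
    frozenset({"dog", "pet", "leash"}),
    frozenset({"stock", "bond", "market"}),
    frozenset({"market", "stock", "trade"}),
    frozenset({"volcano"}),
]
clusters = recluster(sequential_cluster(docs, threshold=0.6), threshold=0.6)
print([len(c) for c in clusters])     # [2, 2, 1]
print(len(remove_outliers(clusters)))  # 2
```

In this sketch the number of clusters is not fixed in advance. It emerges from the threshold, and because each item is placed on arrival, not all items need to be present when clustering starts.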

Notes

  1. Interested readers may download these datasets (1.3 MB) from http://www.docanalyser.de/cd-clustering-corpora.zip.

Acknowledgement

This work was supported by Rajamangala University of Technology Phra Nakhon.

Author information

Correspondence to Wolfgang A. Halang.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Komkhao, M., Kubek, M., Halang, W.A. (2018). Sequentially Grouping Items into Clusters of Unspecified Number. In: Meesad, P., Sodsee, S., Unger, H. (eds) Recent Advances in Information and Communication Technology 2017. IC2IT 2017. Advances in Intelligent Systems and Computing, vol 566. Springer, Cham. https://doi.org/10.1007/978-3-319-60663-7_28

  • DOI: https://doi.org/10.1007/978-3-319-60663-7_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60662-0

  • Online ISBN: 978-3-319-60663-7

  • eBook Packages: Engineering, Engineering (R0)
