Document Clustering

Zhao, Ying; Karypis, George

doi:10.1007/978-1-4614-8265-9_1479

Reference work entry
First Online: 01 January 2018

Synonyms

High-dimensional clustering; Text clustering; Unsupervised learning on document datasets

Definition

At a high level, the problem of document clustering is defined as follows. Given a set S of n documents, we would like to partition them into a predetermined number k of subsets S₁, S₂, …, S_k, such that the documents assigned to each subset are more similar to each other than to the documents assigned to different subsets. Document clustering is an essential part of text mining and has many applications in information retrieval and knowledge management. It faces two major challenges: the dimensionality of the feature space tends to be high (i.e., a document collection often contains thousands or tens of thousands of unique words), and the size of a document collection tends to be large.
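
As an illustration of the definition above, the following is a minimal sketch of one possible way to partition n documents into k subsets. It is not taken from this entry. It assumes that documents are already tokenized, that each document is represented as a unit-length TF-IDF vector, that similarity is measured by the cosine, and that the partition is refined by a k-means-style loop; the function names (tfidf_vectors, farthest_first_seeds, cluster_documents) and the toy corpus are hypothetical. Sparse dictionaries are used so that only the words actually present in a document are stored, which is one common way to cope with a high-dimensional feature space.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map tokenized documents to unit-length TF-IDF vectors stored as sparse dicts."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Sparse dot product; equals the cosine similarity for unit-length vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def centroid(members):
    """Unit-length sum of a non-empty group of sparse vectors."""
    total = Counter()
    for vec in members:
        for t, w in vec.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def farthest_first_seeds(vectors, k):
    """Deterministic seeding: start from document 0, then repeatedly add the
    document that is least similar to all seeds chosen so far."""
    centers = [vectors[0]]
    while len(centers) < k:
        idx = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centers))
        centers.append(vectors[idx])
    return centers


def cluster_documents(docs, k, max_iter=20):
    """Partition docs into k subsets; returns one label in 0..k-1 per document.
    Assumes 1 <= k <= len(docs)."""
    vectors = tfidf_vectors(docs)
    centers = farthest_first_seeds(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assign each document to its most similar center.
        new_labels = [max(range(k), key=lambda j: cosine(v, centers[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each center from its current members (empty clusters keep theirs).
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels


# Toy corpus (hypothetical): three documents about text mining, three about soccer.
docs = [
    ["cluster", "document", "text"],
    ["document", "text", "mining"],
    ["text", "cluster", "retrieval"],
    ["soccer", "goal", "match"],
    ["match", "goal", "league"],
    ["soccer", "league", "goal"],
]
labels = cluster_documents(docs, k=2)
print(labels)
```

On this toy corpus the first three documents share one label and the last three share the other, because the two topics have no words in common. The per-iteration cost grows with n times k, so for large collections a practical implementation would typically rely on optimized sparse-matrix routines rather than Python dictionaries.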

Historical Background

Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms as well as in facilitating...

Author information

Authors and Affiliations

Tsinghua University, Beijing, China
Ying Zhao
University of Minnesota, Minneapolis, MN, USA
George Karypis

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, GA, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, ON, Canada
M. Tamer Özsu

Section Editor information

Department of Computer Science and Engineering, The University of California at Riverside, Bourns College of Engineering, Riverside, CA, USA
Dimitrios Gunopulos

About this entry

Cite this entry

Zhao, Y., Karypis, G. (2018). Document Clustering. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1479

DOI: https://doi.org/10.1007/978-1-4614-8265-9_1479
Published: 07 December 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
