Principal Direction Divisive Partitioning

Boley, Daniel

doi:10.1023/A:1009740529316

Published: December 1998

Data Mining and Knowledge Discovery, Volume 2, pages 325–344 (1998)

Daniel Boley¹

Abstract

We propose a new algorithm capable of partitioning a set of documents or other samples based on an embedding in a high-dimensional Euclidean space (i.e., in which every document is a vector of real numbers). The method is unusual in that it is divisive, as opposed to agglomerative, and operates by repeatedly splitting clusters into smaller clusters. The documents are assembled into a matrix that is very sparse. It is this sparsity that permits the algorithm to be very efficient. The performance of the method is illustrated with a set of text documents obtained from the World Wide Web. Some possible extensions are proposed for further investigation.
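The abstract does not spell out the splitting rule, so the following is only a minimal sketch of one way a principal-direction divisive scheme can be written. It assumes that each cluster is centered on its mean, that the split direction is the leading singular vector of the centered data, that samples are separated by the sign of their projection onto that direction, and that the cluster with the largest total scatter is split next. The function names `split_by_principal_direction` and `pddp` are hypothetical, and the sketch uses dense NumPy arrays, so it does not demonstrate the sparsity-based efficiency the abstract refers to.

```python
import numpy as np


def split_by_principal_direction(X):
    """Split the rows of X (one sample per row) into two groups.

    Returns a boolean mask: True marks samples whose projection onto
    the principal direction is <= 0, False marks the rest.
    """
    centered = X - X.mean(axis=0)
    # The first right singular vector of the centered data is the
    # direction of largest variance (the principal direction).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projections = centered @ vt[0]
    return projections <= 0


def pddp(X, n_clusters):
    """Return one integer label per row after repeated binary splits."""
    labels = np.zeros(X.shape[0], dtype=int)
    for new_label in range(1, n_clusters):
        # Assumption: the next cluster to split is the one with the
        # largest total scatter (sum of squared distances to its mean).
        scatter = {}
        for c in np.unique(labels):
            members = X[labels == c]
            if members.shape[0] > 1:
                scatter[c] = np.sum((members - members.mean(axis=0)) ** 2)
        if not scatter:
            break  # nothing left that can be split
        target = max(scatter, key=scatter.get)
        idx = np.flatnonzero(labels == target)
        left = split_by_principal_direction(X[idx])
        # Samples on the positive side become a new cluster.
        labels[idx[~left]] = new_label
    return labels
```

For example, `pddp(X, 4)` performs three successive binary splits on the rows of `X`. For a large sparse document matrix, one would likely replace the dense SVD with an iterative solver that only needs matrix-vector products, so that the centered matrix is never formed explicitly; that variant is not shown here.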

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Minnesota, 200 Union Street S.E., Rm 4-192, Minneapolis, MN, 55455, USA
Daniel Boley

About this article

Cite this article

Boley, D. Principal Direction Divisive Partitioning. Data Mining and Knowledge Discovery 2, 325–344 (1998). https://doi.org/10.1023/A:1009740529316

Issue Date: December 1998
DOI: https://doi.org/10.1023/A:1009740529316
