A Probabilistic Framework for the Hierarchic Organisation and Classification of Document Collections


Abstract

This paper presents a probabilistic mixture modeling framework for the hierarchic organisation of document collections. It is demonstrated that the probabilistic corpus model which emerges from the automatic, unsupervised hierarchic organisation of a document collection can be further exploited to create a kernel that boosts the performance of state-of-the-art support vector machine document classifiers. It is shown that classification performance is further enhanced when this kernel is derived from an appropriate hierarchic mixture model used to partition the corpus, rather than from a flat, non-hierarchic mixture model. This has important implications for document classification when a hierarchic ordering of topics exists. The approach can thus be viewed as effectively combining unlabeled documents (those with no topic or class labels), labeled documents, and prior domain knowledge (the known hierarchic structure) to enhance document classification performance.
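
The pipeline described in the abstract (fit an unsupervised mixture model to the corpus, derive a document kernel from it, and train a support vector machine on that kernel) can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes a flat multinomial mixture fitted by EM rather than the hierarchic mixture model of the paper, uses each document's posterior component responsibilities as the kernel features, and runs on a synthetic term-count corpus; the component count, smoothing constant, and the use of scikit-learn's precomputed-kernel SVC are illustrative choices.

```python
# Minimal sketch (not the authors' implementation): derive a document kernel
# from a flat multinomial mixture fitted by EM, then train an SVM on it.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy corpus: a documents x vocabulary term-count matrix drawn from two topics.
n_docs, vocab = 200, 50
true_topic = rng.integers(0, 2, size=n_docs)
topic_dists = rng.dirichlet(np.ones(vocab), size=2)
X = np.stack([rng.multinomial(80, topic_dists[t]) for t in true_topic])


def fit_multinomial_mixture(X, k=2, n_iter=50, alpha=1e-2):
    """EM for a k-component multinomial mixture over term counts.

    Returns mixing weights, component term distributions, and the
    per-document posteriors P(component | document)."""
    n, v = X.shape
    pi = np.full(k, 1.0 / k)
    theta = rng.dirichlet(np.ones(v), size=k)           # k x v term probabilities
    post = np.full((n, k), 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each document
        # (the multinomial coefficient cancels in the normalisation).
        log_post = np.log(pi) + X @ np.log(theta).T     # n x k
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights and smoothed term distributions.
        pi = post.mean(axis=0)
        counts = post.T @ X + alpha                     # k x v expected counts
        theta = counts / counts.sum(axis=1, keepdims=True)
    return pi, theta, post


_, _, resp = fit_multinomial_mixture(X, k=2)

# Kernel derived from the generative model: inner products of the documents'
# posterior responsibilities (a stand-in for the hierarchy-derived kernel).
K = resp @ resp.T

# Train and evaluate an SVM on the precomputed kernel; labels are the toy topics.
train = np.arange(n_docs) < 150
clf = SVC(kernel="precomputed").fit(K[np.ix_(train, train)], true_topic[train])
pred = clf.predict(K[np.ix_(~train, train)])
print("held-out accuracy:", (pred == true_topic[~train]).mean())
```

Substituting a hierarchic mixture model for the flat one, and a richer model-derived representation for the raw responsibilities, would follow the same precomputed-kernel pattern.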




Cite this article

Vinokourov, A., Girolami, M. A Probabilistic Framework for the Hierarchic Organisation and Classification of Document Collections. Journal of Intelligent Information Systems 18, 153–172 (2002). https://doi.org/10.1023/A:1013677411002
