Abstract
This paper presents an approach to topic extraction from text documents using probabilistic graphical models. Multiple-cause networks with latent variables are used, and Helmholtz machines are employed to ease learning and inference. Learning in this model is purely data-driven and does not require prespecified categories for the given documents. Topic-word extraction experiments on the TDT-2 collection are presented. In particular, document clustering results on a subset of the TREC-8 ad-hoc task data show a substantial reduction in inference time without significant deterioration of performance.
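To make the idea of a Helmholtz machine concrete, the following is a minimal sketch of wake-sleep learning for a single layer of binary latent "topic" units over binary word-occurrence vectors. It is an illustration under stated assumptions, not the model used in the paper: the abstract does not specify the network architecture, the word encoding, or the training schedule, so the sigmoid units, the learning rate, the toy documents, and all names (`TinyHelmholtzMachine`, `recognize`, `wake`, `sleep`, `fit`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample(p, rng):
    """Draw a binary value that is 1 with probability p."""
    return 1 if rng.random() < p else 0


class TinyHelmholtzMachine:
    """Hypothetical one-layer machine: binary topic units over binary word indicators."""

    def __init__(self, n_words, n_topics, seed=0):
        self.rng = random.Random(seed)
        self.n_words, self.n_topics = n_words, n_topics
        # Generative parameters: topic prior biases, topic-to-word weights, word biases.
        self.g_topic_bias = [0.0] * n_topics
        self.g_weights = [[0.0] * n_words for _ in range(n_topics)]
        self.g_word_bias = [0.0] * n_words
        # Recognition parameters: word-to-topic weights and topic biases.
        self.r_weights = [[0.0] * n_topics for _ in range(n_words)]
        self.r_topic_bias = [0.0] * n_topics

    def recognize(self, doc):
        """q(topic_j = 1 | doc): one bottom-up pass, no iterative inference."""
        return [
            sigmoid(self.r_topic_bias[j]
                    + sum(doc[i] * self.r_weights[i][j] for i in range(self.n_words)))
            for j in range(self.n_topics)
        ]

    def word_probs(self, topics):
        """p(word_i = 1 | topics) under the generative weights."""
        return [
            sigmoid(self.g_word_bias[i]
                    + sum(topics[j] * self.g_weights[j][i] for j in range(self.n_topics)))
            for i in range(self.n_words)
        ]

    def wake(self, doc, lr):
        """Wake phase: infer topics for a real document, then adjust the generative model."""
        topics = [sample(q, self.rng) for q in self.recognize(doc)]
        probs = self.word_probs(topics)
        for j in range(self.n_topics):
            self.g_topic_bias[j] += lr * (topics[j] - sigmoid(self.g_topic_bias[j]))
            for i in range(self.n_words):
                self.g_weights[j][i] += lr * topics[j] * (doc[i] - probs[i])
        for i in range(self.n_words):
            self.g_word_bias[i] += lr * (doc[i] - probs[i])

    def sleep(self, lr):
        """Sleep phase: sample a document from the generative model, then adjust recognition."""
        topics = [sample(sigmoid(b), self.rng) for b in self.g_topic_bias]
        doc = [sample(p, self.rng) for p in self.word_probs(topics)]
        q = self.recognize(doc)
        for j in range(self.n_topics):
            self.r_topic_bias[j] += lr * (topics[j] - q[j])
            for i in range(self.n_words):
                self.r_weights[i][j] += lr * doc[i] * (topics[j] - q[j])

    def fit(self, docs, epochs=50, lr=0.1):
        for _ in range(epochs):
            for doc in docs:
                self.wake(doc, lr)
                self.sleep(lr)


# Toy corpus (made up for illustration): four documents over a four-word vocabulary.
docs = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
model = TinyHelmholtzMachine(n_words=4, n_topics=2)
model.fit(docs)
topic_probs = model.recognize(docs[0])  # one probability per latent topic unit
```

In this sketch, inferring the latent topics for a document is a single pass through the recognition weights rather than an iterative search over hidden causes. A recognition model of this kind is one way the faster inference mentioned in the abstract can come about.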
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chang, JH., Won Lee, J., Kim, Y., Zhang, BT. (2002). Topic Extraction from Text Documents Using Multiple-Cause Networks. In: Ishizuka, M., Sattar, A. (eds) PRICAI 2002: Trends in Artificial Intelligence. PRICAI 2002. Lecture Notes in Computer Science, vol 2417. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45683-X_47
DOI: https://doi.org/10.1007/3-540-45683-X_47
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44038-3
Online ISBN: 978-3-540-45683-4
eBook Packages: Springer Book Archive