Abstract
Conveying information about who, what, when and where is a primary purpose of some genres of documents, typically news articles. To handle such information, statistical models that capture dependencies between named entities and topics can play an important role. Although such documents are expected to mention some relationships between who and where, no existing statistical topic model has explicitly addressed the textual interactions between a who-entity and a where-entity. This paper presents a statistical model that directly captures dependencies among an arbitrary number of word types, such as who-entities, where-entities and topics, mentioned in each document. Through experiments on predicting who-entities and the links between them, we show that this multitype topic model performs better at making predictions on entity networks, in which each vertex represents an entity and each edge weight represents how closely the pair of entities at its incident vertices are related.
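The abstract does not give the model's equations. As a minimal sketch, assuming an LDA-style generative process in which a single per-document topic mixture θ is shared by every word type and each topic k holds a separate token distribution φ_k^(t) for each type t (ordinary words, who-entities, where-entities), the Python below shows how such a model could generate a document and how edge weights of an entity network could then be scored. The scoring rule w(a, b) = Σ_k p(z = k) φ_k^(type a)(a) φ_k^(type b)(b), the function names and the toy parameters are hypothetical illustrations, not the authors' formulation.

```python
import itertools
import random
from typing import Dict, List, Sequence, Tuple

# phi[word_type][k] maps each token of that type to p(token | topic k).
# The types ("word", "who", "where") and all numbers are hypothetical.
Phi = Dict[str, List[Dict[str, float]]]


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Dirichlet draw via normalized Gamma(alpha_k, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(phi: Phi, alpha: Sequence[float],
                      lengths: Dict[str, int],
                      rng: random.Random) -> Dict[str, List[str]]:
    """Sample one document: one shared topic mixture theta drives every word type."""
    theta = sample_dirichlet(alpha, rng)
    topics = range(len(theta))
    doc: Dict[str, List[str]] = {}
    for word_type, n_tokens in lengths.items():
        tokens = []
        for _ in range(n_tokens):
            k = rng.choices(topics, weights=theta)[0]  # topic for this token
            dist = phi[word_type][k]                    # type-specific distribution
            tokens.append(rng.choices(list(dist), weights=list(dist.values()))[0])
        doc[word_type] = tokens
    return doc


def edge_weight(phi: Phi, topic_prior: Sequence[float],
                type_a: str, a: str, type_b: str, b: str) -> float:
    """Assumed link strength: sum over topics of p(k) * p(a | k) * p(b | k)."""
    return sum(p_k * phi[type_a][k].get(a, 0.0) * phi[type_b][k].get(b, 0.0)
               for k, p_k in enumerate(topic_prior))


def entity_network(phi: Phi, topic_prior: Sequence[float],
                   word_type: str) -> Dict[Tuple[str, str], float]:
    """Weighted edges between distinct entities of one type (e.g. who-who links)."""
    vocab = sorted({e for dist in phi[word_type] for e in dist})
    edges: Dict[Tuple[str, str], float] = {}
    for a, b in itertools.combinations(vocab, 2):
        w = edge_weight(phi, topic_prior, word_type, a, word_type, b)
        if w > 0.0:  # keep only pairs that share at least one topic
            edges[(a, b)] = w
    return edges


if __name__ == "__main__":
    toy_phi: Phi = {
        "word":  [{"election": 0.6, "vote": 0.4}, {"match": 0.5, "goal": 0.5}],
        "who":   [{"alice": 0.7, "bob": 0.3}, {"carol": 0.9, "alice": 0.1}],
        "where": [{"tokyo": 1.0}, {"paris": 0.8, "tokyo": 0.2}],
    }
    prior = [0.5, 0.5]
    # alice-bob ~ 0.105, alice-carol ~ 0.045; bob and carol share no topic, so no edge
    print(entity_network(toy_phi, prior, "who"))
    # cross-type who/where weight for alice and tokyo ~ 0.36
    print(edge_weight(toy_phi, prior, "who", "alice", "where", "tokyo"))
    print(generate_document(toy_phi, [1.0, 1.0],
                            {"word": 5, "who": 2, "where": 1}, random.Random(0)))
```

Because every entity type is tied to the same set of topics in this sketch, a who-entity and a where-entity that are both probable under a common topic receive a positive weight; this is one simple way to express the cross-type dependencies the abstract describes, and the paper's own estimator may differ.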
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shiozaki, H., Eguchi, K., Ohkawa, T. (2008). Entity Network Prediction Using Multitype Topic Models. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_67
DOI: https://doi.org/10.1007/978-3-540-68125-0_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0
eBook Packages: Computer Science, Computer Science (R0)