Abstract
Knowledge of entity histories is often necessary for a comprehensive understanding and characterization of entities. In this paper, we introduce the novel task of history-based entity categorization. Taking a set of entity-related documents as input, we detect latent entity categories whose members share similar histories, effectively grouping entities by the similarity of their historical development. Next, we generate comparative timelines for each detected group, allowing users to spot similarities and differences in entity histories. We evaluate our approach on several datasets of different entity types, demonstrating its effectiveness against competitive baselines.
Notes
1. We experimentally set the value of \(\lambda\) to be 0.4.
2. For example, https://en.wikipedia.org/wiki/1939.
3. Note that the standard deviations of event occurrence times are 0 here as the total number of used events is quite small.
Acknowledgements
This research has been supported by JSPS KAKENHI grants (#17H01828, #18K19841, #18H03243).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Duan, Y., Jatowt, A., Tanaka, K. (2019). History-Driven Entity Categorization. In: Shao, J., Yiu, M., Toyoda, M., Zhang, D., Wang, W., Cui, B. (eds) Web and Big Data. APWeb-WAIM 2019. Lecture Notes in Computer Science(), vol 11642. Springer, Cham. https://doi.org/10.1007/978-3-030-26075-0_27
DOI: https://doi.org/10.1007/978-3-030-26075-0_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26074-3
Online ISBN: 978-3-030-26075-0
eBook Packages: Computer Science, Computer Science (R0)