History-Driven Entity Categorization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11642)

Abstract

Knowledge of entity histories is often necessary for a comprehensive understanding and characterization of entities. In this paper, we introduce the novel task of history-based entity categorization. Taking a set of entity-related documents as input, we detect latent entity categories whose members share similar histories, effectively grouping entities by the similarity of their historical development. We then generate a comparative timeline for each detected group, allowing users to spot similarities and differences in entity histories. We evaluate our approach on several datasets of different entity types, demonstrating its effectiveness against competitive baselines.

Notes

  1. We experimentally set the value of \(\lambda\) to 0.4.

  2. For example, https://en.wikipedia.org/wiki/1939.

  3. Note that the standard deviations of event occurrence times are 0 here, as the total number of events used is quite small.

Acknowledgements

This research has been supported by JSPS KAKENHI grants (#17H01828, #18K19841, #18H03243).

Author information

Corresponding author

Correspondence to Yijun Duan.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Duan, Y., Jatowt, A., Tanaka, K. (2019). History-Driven Entity Categorization. In: Shao, J., Yiu, M., Toyoda, M., Zhang, D., Wang, W., Cui, B. (eds) Web and Big Data. APWeb-WAIM 2019. Lecture Notes in Computer Science, vol 11642. Springer, Cham. https://doi.org/10.1007/978-3-030-26075-0_27

  • DOI: https://doi.org/10.1007/978-3-030-26075-0_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26074-3

  • Online ISBN: 978-3-030-26075-0

  • eBook Packages: Computer Science, Computer Science (R0)
