Placing (Historical) Facts on a Timeline: A Classification Cum Coref Resolution Approach

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Abstract

A timeline provides one of the most effective ways to visualize the important historical facts that occurred over a period of time, presenting insights that may not be apparent from reading the equivalent information in textual form. By leveraging generative adversarial learning for important-sentence classification and by assimilating knowledge-based tags to improve the performance of event coreference resolution, we introduce a two-stage system for event timeline generation from multiple (historical) text documents. We demonstrate our results on two manually annotated historical text documents. Our results can be extremely helpful for historians, both in advancing research in history and in understanding the socio-political landscape of a country as reflected in the writings of famous personas. The dataset and the code are available at https://github.com/sayantan11995/Event-Timeline-Generation-from-Documents.

Notes

  1. https://www.ldc.upenn.edu/collaborations/past-projects/tac-kbp.

  2. https://quod.lib.umich.edu/l/lincoln/.

  3. https://en.wikipedia.org/wiki/COVID-19_pandemic_in_India.

  4. https://www.who.int/india/emergencies/coronavirus-disease-(covid-19)/india-situation-report.

  5. Such sentences would typically consist of participants and locations.

  6. https://www.gandhiheritageportal.org/.

  7. https://github.com/MarioVilas/googlesearch.

  8. https://github.com/DerwenAI/pytextrank.

  9. https://pypi.org/project/rake-nltk/.

  10. https://www.nltk.org/howto/collocations.html.

  11. https://scikit-learn.org/stable/modules/mixture.html.

  12. We consider the root verb as the action of a sentence.

  13. https://visjs.github.io/vis-timeline/docs/timeline/.

  14. Statistical significance tests were performed using the Mann-Whitney U test [23].
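
Note 14 names the Mann-Whitney U test [23] without describing it. As a minimal sketch of what the test statistic measures, the snippet below counts, over all cross-sample pairs, how often a value from one sample exceeds a value from the other (ties count as one half). The function name `mann_whitney_u` and the score lists are hypothetical and purely illustrative; they are not taken from the paper or its results, and the paper does not state which implementation was used.

```python
from itertools import product


def mann_whitney_u(xs, ys):
    """Return the U statistic of xs versus ys.

    U counts the pairs (x, y) with x > y; a tie contributes 0.5.
    """
    u = 0.0
    for x, y in product(xs, ys):
        if x > y:
            u += 1.0
        elif x == y:
            u += 0.5
    return u


# Made-up scores for two hypothetical systems (illustration only).
system_a = [0.71, 0.68, 0.74, 0.70]
system_b = [0.62, 0.66, 0.69, 0.60]

u_a = mann_whitney_u(system_a, system_b)
u_b = mann_whitney_u(system_b, system_a)

# The two statistics always sum to the number of cross-sample pairs.
print(u_a, u_b, len(system_a) * len(system_b))
```

A U value close to the total number of pairs (or close to zero) suggests that one sample tends to be larger than the other. Turning U into a p-value requires its null distribution, for which a statistics library such as SciPy (`scipy.stats.mannwhitneyu`) would normally be used.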

References

  1. Abadi, M., Agarwal, A., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). https://www.tensorflow.org/

  2. Adak, S., et al.: Gandhipedia: A one-stop AI-enabled portal for browsing Gandhian literature, life-events and his social network. In: JCDL, pp. 539–540, New York, NY, USA (2020)

  3. Aprosio, A., Tonelli, S.: Recognizing biographical sections in Wikipedia, pp. 811–816, January 2015

  4. Bagga, A., Baldwin, B.: Entity-based cross-document coreferencing using the vector space model. In: Coling, vol. 1, p. 79 (2000)

  5. Bamman, D., Smith, N.A.: Unsupervised discovery of biographical structure from text. Trans. Assoc. Comput. Linguist. 2, 363–376 (2014)

  6. Barhom, S., Shwartz, V., Eirew, A., Bugert, M., Reimers, N., Dagan, I.: Revisiting joint modeling of cross-document entity and event coreference resolution (2019)

  7. Bedi, H., Patil, S., Hingmire, S., Palshikar, G.: Event timeline generation from history textbooks. In: Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017), pp. 69–77. Asian Federation of Natural Language Processing, Taipei, Taiwan, December 2017

  8. Born, L., Bacher, M., Markert, K.: Dataset reproducibility and IR methods in timeline summarization. In: LREC 2020 (2020)

  9. Chen, Z., Ji, H., Haralick, R.: A pairwise event coreference model, feature impact and evaluation for event coreference resolution. In: Proceedings of the Workshop on Events in Emerging Text Types, pp. 17–22. Association for Computational Linguistics, Borovets, Bulgaria, September 2009

  10. Choubey, P.K., Huang, R.: Event coreference resolution by iteratively unfolding inter-dependencies among events. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2124–2133. Association for Computational Linguistics, Copenhagen, Denmark, September 2017

  11. Croce, D., Castellucci, G., Basili, R.: GAN-BERT: generative adversarial learning for robust text classification with a bunch of labeled examples. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 2114–2119. Association for Computational Linguistics, July 2020

  12. Cybulska, A., Vossen, P.: Using a sledgehammer to crack a nut? Lexical diversity and event coreference resolution. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), pp. 4545–4552. European Language Resources Association (ELRA), Reykjavik, Iceland, May 2014

  13. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding (2019)

  14. Ghaddar, A., Langlais, P.: Wikicoref: an English coreference-annotated corpus of Wikipedia articles. In: Chair, N.C.C., et al. (eds.) Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association (ELRA), Paris, France, May 2016

  15. Gholipour Ghalandari, D., Ifrim, G.: Examining the state-of-the-art in news timeline summarization. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1322–1334. Association for Computational Linguistics, July 2020

  16. Hearst, M.A.: Support vector machines. IEEE Intell. Syst. 13(4), 18–28 (1998)

  17. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

  18. Kenyon-Dean, K., Cheung, J.C.K., Precup, D.: Resolving event coreference with supervised representation learning and clustering-oriented regularization (2018)

  19. Kibriya, A.M., Frank, E., Pfahringer, B., Holmes, G.: Multinomial naive Bayes for text categorization revisited. In: Webb, G.I., Yu, X. (eds.) AI 2004. LNCS (LNAI), vol. 3339, pp. 488–499. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30549-1_43

  20. La Quatra, M., Cagliero, L., Baralis, E., Messina, A., Montagnuolo, M.: Summarize dates first: a paradigm shift in timeline summarization, pp. 418–427. Association for Computing Machinery, New York, NY, USA (2021)

  21. Lu, Y., Lin, H., Tang, J., Han, X., Sun, L.: End-to-end neural event coreference resolution. Artif. Intell. 303, 103632 (2020)

  22. Luo, X.: On coreference resolution performance metrics, January 2005

  23. Mann, H.B., Whitney, D.R.: On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18(1), 50–60 (1947)

  24. Martschat, S., Markert, K.: Improving ROUGE for timeline summarization. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, vol. 2, Short Papers, pp. 285–290. Association for Computational Linguistics, Valencia, Spain, April 2017

  25. Martschat, S., Markert, K.: A temporally sensitive submodularity framework for timeline summarization. In: Proceedings of the 22nd Conference on Computational Natural Language Learning, pp. 230–240. Association for Computational Linguistics, Brussels, Belgium, October 2018

  26. Miller, D.: Leveraging BERT for extractive text summarization on lectures (2019)

  27. Miyato, T., Dai, A.M., Goodfellow, I.: Adversarial training methods for semi-supervised text classification (2017)

  28. Moosavi, N.S., Strube, M.: Which coreference evaluation metric do you trust? A proposal for a link-based entity aware metric. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, Long Papers, pp. 632–642. Association for Computational Linguistics, Berlin, Germany, August 2016

  29. Palshikar, G., Pawar, S., Patil, et al.: Extraction of message sequence charts from narrative history text. In: Proceedings of the First Workshop on Narrative Understanding, pp. 28–36. Association for Computational Linguistics, Minneapolis, Minnesota, June 2019

  30. Paszke, A., Gross, S., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32, pp. 8024–8035. Curran Associates, Inc. (2019)

  31. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)

  32. Preservation, S.A., Trust, M.: The Collected Works of Mahatma Gandhi (2013). https://www.gandhiheritageportal.org/the-collected-works-of-mahatma-gandhi. Accessed 22 Feb 2020

  33. Pustejovsky, J., et al.: TimeML: robust specification of event and temporal expressions in text, pp. 28–34, January 2003

  34. Recasens, M., Hovy, E.: BLANC: implementing the Rand index for coreference evaluation. Nat. Lang. Eng. 17, 485–510 (2011)

  35. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks (2019)

  36. Strötgen, J., Gertz, M.: HeidelTime: high quality rule-based extraction and normalization of temporal expressions. In: Proceedings of the 5th International Workshop on Semantic Evaluation, pp. 321–324. Association for Computational Linguistics, Uppsala, Sweden, July 2010

  37. Vilain, M., Burger, J., Aberdeen, J., Connolly, D., Hirschman, L.: A model-theoretic coreference scoring scheme, pp. 45–52, January 1995

  38. Zhang, W., Chen, Q., Chen, Y.: Deep learning based robust text classification method via virtual adversarial training. IEEE Access 8, 61174–61182 (2020)

  39. Zhang, Y., Wallace, B.: A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification (2016)


Author information

Corresponding author

Correspondence to Sayantan Adak.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 740 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Adak, S., Ahmad, A., Basu, A., Mukherjee, A. (2023). Placing (Historical) Facts on a Timeline: A Classification Cum Coref Resolution Approach. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13718. Springer, Cham. https://doi.org/10.1007/978-3-031-26422-1_21

  • DOI: https://doi.org/10.1007/978-3-031-26422-1_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26421-4

  • Online ISBN: 978-3-031-26422-1

  • eBook Packages: Computer Science, Computer Science (R0)
