One Step Beyond: Keyword Extraction in German Utilising Surprisal from Topic Contexts

  • Conference paper
  • First Online: Intelligent Computing (SAI 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 507)


Abstract

This paper describes a study on keyword extraction in German with a model that utilises Shannon information as a lexical feature. Lexical information content was derived from large, extra-sentential semantic contexts of words within the framework of the novel Topic Context Model. We observed that lexical information content increased the performance of a Recurrent Neural Network in keyword extraction, outperforming TextRank as well as the two other models used for comparison in this study, Named Entity Recognition and Latent Dirichlet Allocation.
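The notion underlying the abstract's "lexical information content" is Shannon information, or surprisal: the less probable a word is in its context, the more information it carries. A minimal sketch of the quantity itself (the probability values below are hypothetical illustrations, not output of the authors' Topic Context Model):

```python
import math

def surprisal(p: float) -> float:
    """Shannon information content of an event with probability p, in bits."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)

# A rare, topic-specific word carries more information than a frequent one,
# which is the intuition behind using surprisal as a keyword feature.
p_rare, p_frequent = 0.01, 0.25
print(surprisal(p_rare))      # high information content
print(surprisal(p_frequent))  # lower information content
```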


Notes

  1. https://heise.de.

  2. https://spacy.io.

  3. https://github.com/jnphilipp/TextRank.

  4. model.components_ / model.components_.sum(axis=1)[:, np.newaxis] as suggested by https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html.
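The expression in note 4 row-normalises the raw `components_` matrix of scikit-learn's `LatentDirichletAllocation` so that each topic's word weights form a probability distribution. A self-contained sketch of the same normalisation, using a hypothetical 2×3 count matrix in place of a fitted model:

```python
import numpy as np

# Stand-in for model.components_ of a fitted
# sklearn.decomposition.LatentDirichletAllocation (shape: topics x vocabulary).
components = np.array([[2.0, 1.0, 1.0],
                       [1.0, 3.0, 4.0]])

# Divide each row by its sum (broadcast via [:, np.newaxis]) so that
# every topic's word weights sum to 1, i.e. form P(word | topic).
topic_word_dist = components / components.sum(axis=1)[:, np.newaxis]
print(topic_word_dist.sum(axis=1))  # each row now sums to 1.0
```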


Acknowledgments

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number: 357550571.

The training of the LDA and neural networks was done on the High Performance Computing (HPC) Cluster of the Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH) of the Technische Universität Dresden.

Author information

Correspondence to J. Nathanael Philipp.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Philipp, J.N., Kölbl, M., Kyogoku, Y., Yousef, T., Richter, M. (2022). One Step Beyond: Keyword Extraction in German Utilising Surprisal from Topic Contexts. In: Arai, K. (eds) Intelligent Computing. SAI 2022. Lecture Notes in Networks and Systems, vol 507. Springer, Cham. https://doi.org/10.1007/978-3-031-10464-0_53
