Automatic Extraction of Relevant Keyphrases for the Study of Issue Competition

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2019)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13452)

Abstract

The use of keyphrases is a common oratory technique in political discourse: politicians often structure their statements around recurring keyphrases. We propose a statistical method for extracting keyphrases at the document level, combining simple heuristic rules. We show that our approach can compete with state-of-the-art systems. The method is particularly useful for the study of policy preferences and issue competition, which relies primarily on the analysis of political statements contained in party manifestos and speeches. As a case study, we present an analysis of Portuguese parliamentary debates. We extract the most frequently used keyphrases from each parliamentary group's speech collection to detect political issue emphasis. We additionally show how keyphrase clouds can be used as visualization aids to summarize the main political issues addressed.

Notes

  1. https://manifesto-project.wzb.eu.

  2. We use the POS pattern: <ADJ>* <NOUN>+.

  3. In our experiments we used the NLTK chunker: https://www.nltk.org/api/nltk.chunk.html.

  4. We use the chunker POS pattern: (<NOUN>+ <ADJ>* <PREP>*)? <NOUN>+ <ADJ>*.

  5. https://www.nltk.org/_modules/nltk/stem/porter.html.

  6. https://www.nltk.org/_modules/nltk/stem/rslp.html.

  7. We measure document size as the total number of tokens.

  8. https://www.nltk.org/.

  9. For the English datasets, keyphrase extraction was performed with the pre-trained unigram embeddings available at https://github.com/epfml/sent2vec. For Portuguese, we generated fastText [4] embeddings using a compilation of all sentences present in the Geringonça dataset and the parliamentary speeches used in the case study shown in Sect. 5.

  10. http://debates.parlamento.pt.

  11. We manually annotated a list of approximately 100 terms.

  12. All keyphrases were translated from Portuguese.
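The POS patterns quoted in notes 2 and 4 can be illustrated with a small sketch. The authors report using the NLTK chunker; the code below deliberately avoids that dependency and instead runs a standard-library regular expression over the tag sequence, so it is an approximation of the idea rather than the authors' implementation. The names `TAG_CODES` and `extract_candidates`, the coarse tag labels, and the two example sentences are assumptions made for illustration.

```python
import re

# One character per coarse POS tag, so regex offsets equal token indices.
# Any tag not listed here is encoded as "O" (other).
TAG_CODES = {"ADJ": "A", "NOUN": "N", "PREP": "P"}

# The two patterns from notes 2 and 4, rewritten over the tag codes.
PATTERN_NOTE_2 = re.compile(r"A*N+")             # <ADJ>* <NOUN>+
PATTERN_NOTE_4 = re.compile(r"(?:N+A*P*)?N+A*")  # (<NOUN>+ <ADJ>* <PREP>*)? <NOUN>+ <ADJ>*


def extract_candidates(tagged_tokens, pattern):
    """Return the token spans whose tag sequence matches `pattern`."""
    codes = "".join(TAG_CODES.get(tag, "O") for _, tag in tagged_tokens)
    phrases = []
    for match in pattern.finditer(codes):
        words = [word for word, _ in tagged_tokens[match.start():match.end()]]
        phrases.append(" ".join(words))
    return phrases


# Hypothetical, hand-tagged example sentences.
english = [("the", "DET"), ("public", "ADJ"), ("health", "NOUN"),
           ("system", "NOUN"), ("needs", "VERB"), ("urgent", "ADJ"),
           ("reform", "NOUN")]
portuguese = [("o", "DET"), ("serviço", "NOUN"), ("nacional", "ADJ"),
              ("de", "PREP"), ("saúde", "NOUN")]

print(extract_candidates(english, PATTERN_NOTE_2))
# ['public health system', 'urgent reform']
print(extract_candidates(portuguese, PATTERN_NOTE_4))
# ['serviço nacional de saúde']
```

Encoding each tag as a single character keeps the mapping from regex match offsets back to token positions trivial. In a real pipeline the tags would come from a POS tagger rather than being written by hand.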

References

  1. Barker, K., Cornacchia, N.: Using noun phrase heads to extract document keyphrases. In: Hamilton, H.J. (ed.) AI 2000. LNCS (LNAI), vol. 1822, pp. 40–52. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45486-1_4

  2. Bateman, S., Gutwin, C., Nacenta, M.: Seeing things in the clouds: the effect of visual features on tag cloud selections. In: Proceedings of the ACM Conference on Hypertext and Hypermedia (2008)

  3. Bennani-Smires, K., Musat, C., Jaggi, M., Hossmann, A., Baeriswyl, M.: EmbedRank: unsupervised keyphrase extraction using sentence embeddings. arXiv preprint arXiv:1801.04470 (2018)

  4. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)

  5. Boudin, F.: pke: an open source python-based keyphrase extraction toolkit. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, pp. 69–73. The COLING 2016 Organizing Committee, Osaka, Japan, December 2016. http://aclweb.org/anthology/C16-2015

  6. Boudin, F.: Unsupervised keyphrase extraction with multipartite graphs. arXiv preprint arXiv:1803.08721 (2018)

  7. Boudin, F., Mougard, H., Cram, D.: How document pre-processing affects keyphrase extraction performance. arXiv preprint arXiv:1610.07809 (2016)

  8. Bougouin, A., Boudin, F., Daille, B.: TopicRank: graph-based topic ranking for keyphrase extraction. In: International Joint Conference on Natural Language Processing (IJCNLP), pp. 543–551 (2013)

  9. Branco, A., Silva, J.R.: Evaluating solutions for the rapid development of state-of-the-art PoS taggers for Portuguese. In: LREC (2004)

  10. Chen, J., Zhang, X., Wu, Y., Yan, Z., Li, Z.: Keyphrase generation with correlation constraints. arXiv preprint arXiv:1808.07185 (2018)

  11. Chuang, J., Manning, C.D., Heer, J.: “Without the clutter of unimportant words”: descriptive keyphrases for text visualization. ACM Trans. Comput.-Hum. Interact. 19 (2012)

  12. Danesh, S., Sumner, T., Martin, J.H.: SGRank: combining statistical and graphical methods to improve the state of the art in unsupervised keyphrase extraction. In: Lexical and Computational Semantics (*SEM 2015), p. 117 (2015)

  13. El-Beltagy, S.R., Rafea, A.: KP-Miner: a keyphrase extraction system for English and Arabic documents. Inf. Syst. 34(1), 132–144 (2009)

  14. Florescu, C., Caragea, C.: PositionRank: an unsupervised approach to keyphrase extraction from scholarly documents. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 1105–1115 (2017)

  15. Frank, E., Paynter, G.W., Witten, I.H., Gutwin, C., Nevill-Manning, C.G.: Domain-specific keyphrase extraction. In: 16th International Joint Conference on Artificial Intelligence (IJCAI 1999), vol. 2, pp. 668–673. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1999)

  16. Green-Pedersen, C.: The growing importance of issue competition: the changing nature of party competition in Western Europe. Polit. Stud. 55(3), 607–628 (2007)

  17. Green-Pedersen, C., Mortensen, P.B.: Who sets the agenda and who responds to it in the Danish parliament? A new model of issue competition and agenda-setting. Eur. J. Polit. Res. 49(2), 257–281 (2010)

  18. Grimmer, J., Stewart, B.M.: Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Polit. Anal. 267–297 (2013)

  19. Grineva, M., Grinev, M., Lizorkin, D.: Extracting key terms from noisy and multitheme documents. In: Proceedings of the 18th International Conference on World Wide Web, pp. 661–670. ACM (2009)

  20. Hasan, K.S., Ng, V.: Conundrums in unsupervised keyphrase extraction: making sense of the state-of-the-art. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 365–373. Association for Computational Linguistics (2010)

  21. Hasan, K.S., Ng, V.: Automatic keyphrase extraction: a survey of the state of the art. In: ACL, vol. 1, pp. 1262–1273 (2014)

  22. Hobolt, S.B., De Vries, C.E.: Issue entrepreneurship and multiparty competition. Comp. Pol. Stud. 48(9), 1159–1185 (2015)

  23. Hulth, A.: Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 216–223. Association for Computational Linguistics (2003)

  24. Kim, S.N., Medelyan, O., Kan, M.Y., Baldwin, T.: SemEval-2010 task 5: automatic keyphrase extraction from scientific articles. In: Proceedings of the 5th International Workshop on Semantic Evaluation, pp. 21–26. Association for Computational Linguistics (2010)

  25. Kim, S.N., Medelyan, O., Kan, M.Y., Baldwin, T.: Automatic keyphrase extraction from scientific articles. Lang. Resour. Eval. 47(3), 723–742 (2013)

  26. Klingemann, H.D.: Mapping Policy Preferences. Oxford University Press, Oxford (2001)

  27. Klingemann, H.D.: Mapping policy preferences II: estimates for parties, electors, and governments in Eastern Europe, European Union, and OECD 1990–2003, vol. 2. Oxford University Press on Demand (2006)

  28. Liu, F., Pennell, D., Liu, F., Liu, Y.: Unsupervised approaches for automatic keyword extraction using meeting transcripts. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 620–628. Association for Computational Linguistics (2009)

  29. Liu, Z., Li, P., Zheng, Y., Sun, M.: Clustering to find exemplar terms for keyphrase extraction. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, vol. 1, pp. 257–266. Association for Computational Linguistics (2009)

  30. Lopez, P., Romary, L.: HUMB: automatic key term extraction from scientific articles in GROBID. In: Proceedings of the 5th International Workshop on Semantic Evaluation, pp. 248–251. Association for Computational Linguistics (2010)

  31. Medelyan, O., Frank, E., Witten, I.H.: Human-competitive tagging using automatic keyphrase extraction. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, vol. 3, pp. 1318–1327. Association for Computational Linguistics (2009)

  32. Meng, R., Zhao, S., Han, S., He, D., Brusilovsky, P., Chi, Y.: Deep keyphrase generation. arXiv preprint arXiv:1704.06879 (2017)

  33. Mihalcea, R., Tarau, P.: TextRank: bringing order into texts. In: Proceedings of Conference on Empirical Methods on Natural Language Processing (2004)

  34. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)

  35. Nguyen, T.D., Kan, M.-Y.: Keyphrase extraction in scientific publications. In: Goh, D.H.-L., Cao, T.H., Sølvberg, I.T., Rasmussen, E. (eds.) ICADL 2007. LNCS, vol. 4822, pp. 317–326. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-77094-7_41

  36. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web. Technical report, Stanford InfoLab (1999)

  37. Pagliardini, M., Gupta, P., Jaggi, M.: Unsupervised learning of sentence embeddings using compositional n-gram features. In: NAACL 2018 - Conference of the North American Chapter of the Association for Computational Linguistics (2018)

  38. Petrocik, J.R.: Issue ownership in presidential elections, with a 1980 case study. Am. J. Polit. Sci. 825–850 (1996)

  39. Rivadeneira, A.W., Gruen, D.M., Muller, M.J., Millen, D.R.: Getting our head in the clouds: Toward evaluation studies of tagclouds. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2007)

  40. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24(5), 513–523 (1988)

  41. Toutanova, K., Manning, C.D.: Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In: Proceedings of the 2000 Joint SIGDAT conference on Empirical Methods in Natural Language Processing and Very Large Corpora: Held in Conjunction with the 38th Annual Meeting of the Association for Computational Linguistics, vol. 13, pp. 63–70. Association for Computational Linguistics (2000)

  42. Vliegenthart, R., Walgrave, S.: Content matters: The dynamics of parliamentary questioning in Belgium and Denmark. Comp. Pol. Stud. 44(8), 1031–1059 (2011)

  43. Wagner, M., Meyer, T.M.: Which issues do parties emphasise? Salience strategies and party organisation in multiparty systems. West Eur. Polit. 37(5), 1019–1045 (2014)

  44. Wan, X., Xiao, J.: Single document keyphrase extraction using neighborhood knowledge. In: Proceedings of the AAAI Conference on Artificial Intelligence (2008)

  45. Wang, R., Liu, W., McDonald, C.: Corpus-independent generic keyphrase extraction using word embedding vectors. In: Software Engineering Research Conference, vol. 39 (2014)

  46. Yih, W.t., Goodman, J., Carvalho, V.R.: Finding advertising keywords on web pages. In: Proceedings of the 15th International Conference on World Wide Web, pp. 213–222. ACM (2006)

Acknowledgement

This work was supported by Fundação para a Ciência e Tecnologia (FCT), SFRH/BPD/104176/2014 to M.W., SFRH/BPD/86702/2012 to F.R., and FCT funding POCI/01/0145/FEDER/031460 and UID/CEC/50021/2019. We also thank the Geringonça team (http://geringonca.com/) for making their data available.

Author information

Corresponding author

Correspondence to Miguel Won.

A Appendix

Table 4. Precision, recall and F-scores of KCRank on the English datasets, compared with state-of-the-art systems.
Table 5. Precision, recall and F-scores of KCRank on the Geringonça datasets, compared with state-of-the-art systems.
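
As a reminder of how the scores named in Tables 4 and 5 are commonly computed for keyphrase extraction, the sketch below compares a set of extracted keyphrases with a gold-standard set. It assumes exact string matching; the matching criterion actually used in the paper is not stated in this excerpt. The function name `precision_recall_f1` and the example phrases are hypothetical.

```python
def precision_recall_f1(extracted, gold):
    """Score extracted keyphrases against gold ones using exact matching."""
    extracted_set = set(extracted)
    gold_set = set(gold)
    correct = len(extracted_set & gold_set)

    # Guard against empty inputs to avoid division by zero.
    precision = correct / len(extracted_set) if extracted_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 2 of 4 extracted phrases appear among 3 gold phrases.
extracted = ["public health", "state budget", "minimum wage", "tax reform"]
gold = ["public health", "minimum wage", "housing policy"]

p, r, f = precision_recall_f1(extracted, gold)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
# P=0.500 R=0.667 F1=0.571
```

Evaluations of this kind often normalize phrases first, for example by lowercasing or stemming, before matching. That step is omitted here to keep the arithmetic visible.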
Fig. 4. The 40 most relevant keyphrases extracted from the PEV speeches collection.

Fig. 5. The 40 most relevant keyphrases extracted from the BE speeches collection.

Fig. 6. The 40 most relevant keyphrases extracted from the PCP speeches collection.

Fig. 7. The 40 most relevant keyphrases extracted from the CDS-PP speeches collection.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Won, M., Martins, B., Raimundo, F. (2023). Automatic Extraction of Relevant Keyphrases for the Study of Issue Competition. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13452. Springer, Cham. https://doi.org/10.1007/978-3-031-24340-0_48

  • DOI: https://doi.org/10.1007/978-3-031-24340-0_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24339-4

  • Online ISBN: 978-3-031-24340-0

  • eBook Packages: Computer Science, Computer Science (R0)
