Automatic Extraction of Relevant Keyphrases for the Study of Issue Competition

Won, Miguel; Martins, Bruno; Raimundo, Filipa

doi:10.1007/978-3-031-24340-0_48

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13452)

Included in the following conference series:

International Conference on Computational Linguistics and Intelligent Text Processing

Abstract

Keyphrases are a common oratory device in political discourse: politicians often structure their statements around recurring keyphrases. We propose a statistical method for document-level keyphrase extraction that combines simple heuristic rules, and we show that it is competitive with state-of-the-art systems. The method is particularly useful for the study of policy preferences and issue competition, which relies primarily on the analysis of political statements contained in party manifestos and speeches. As a case study, we analyze Portuguese parliamentary debates: we extract the most frequently used keyphrases from each parliamentary group's speech collection to detect which political issues each group emphasizes. We also show how keyphrase clouds can serve as visualization aids that summarize the main political issues addressed.
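
As a rough illustration of the case study described in the abstract, the sketch below counts how often each keyphrase occurs in a parliamentary group's speeches and turns the counts into shares. It is not the authors' scoring method: the group names, the keyphrases, the function name issue_emphasis, and the use of a simple relative frequency are all assumptions made for the example, and the keyphrases are taken as already extracted.

```python
from collections import Counter

# Hypothetical input: keyphrases already extracted from each speech,
# grouped by parliamentary group. Group names and phrases are invented.
speech_keyphrases = {
    "Group A": [["public health", "state budget"], ["public health", "minimum wage"]],
    "Group B": [["state budget", "tax cuts"], ["tax cuts"]],
}


def issue_emphasis(keyphrases_by_speech, top_n=3):
    """Return the top_n keyphrases of one group with their share of all keyphrase occurrences."""
    counts = Counter(kp for speech in keyphrases_by_speech for kp in speech)
    total = sum(counts.values())
    return [(kp, n / total) for kp, n in counts.most_common(top_n)]


# Assumed weighting: relative frequency within the group's own collection.
emphasis = {group: issue_emphasis(speeches) for group, speeches in speech_keyphrases.items()}

for group, ranked in emphasis.items():
    print(group, ranked)
```

Shares of this kind could, for instance, be mapped to font sizes when drawing a keyphrase cloud for each group, although the chapter preview does not state how the clouds are weighted.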

Notes

1. https://manifesto-project.wzb.eu.
2. We use the POS pattern: <ADJ>* <NOUN>+.
3. In our experiments we used the NLTK chunker: https://www.nltk.org/api/nltk.chunk.html.
4. We use the chunker POS pattern: (<NOUN>+ <ADJ>* <PREP>*)? <NOUN>+ <ADJ>*.
5. https://www.nltk.org/_modules/nltk/stem/porter.html.
6. https://www.nltk.org/_modules/nltk/stem/rslp.html.
7. To measure document size, we use the total number of tokens.
8. https://www.nltk.org/.
9. For the English datasets, keyphrase extraction was performed with the pre-trained unigram embeddings available at https://github.com/epfml/sent2vec. For Portuguese, we generated fastText [4] embeddings from a compilation of all sentences in the Geringonça dataset and the parliamentary speeches used in the case study in Sect. 5.
10. http://debates.parlamento.pt.
11. We manually annotated a list of approximately 100 terms.
12. All keyphrases were translated from Portuguese.
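
The two POS patterns given in notes 2 and 4 can be illustrated with a minimal sketch. The authors report using the NLTK chunker; the version below is a standard-library stand-in that encodes each tag as one character and applies an ordinary regular expression, so it needs neither NLTK nor a tagger. The tagged sentences, the single-letter tag codes, and the function name extract_candidates are assumptions made for the example.

```python
import re

# One character per tag, so a tag-sequence pattern becomes an ordinary regex.
# Any tag not listed here is encoded as "x" and can never be part of a match.
TAG_CODES = {"ADJ": "A", "NOUN": "N", "PREP": "P"}

PATTERN_NOTE_2 = r"A*N+"               # <ADJ>* <NOUN>+
PATTERN_NOTE_4 = r"(?:N+A*P*)?N+A*"    # (<NOUN>+ <ADJ>* <PREP>*)? <NOUN>+ <ADJ>*


def extract_candidates(tagged, pattern):
    """Return the token spans whose tag sequence matches `pattern`."""
    encoded = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged)
    return [
        " ".join(token for token, _ in tagged[m.start():m.end()])
        for m in re.finditer(pattern, encoded)
    ]


# Hypothetical pre-tagged sentences; in practice the tags come from a POS tagger.
TAGGED_EN = [("the", "DET"), ("national", "ADJ"), ("health", "NOUN"),
             ("service", "NOUN"), ("needs", "VERB"), ("public", "ADJ"),
             ("investment", "NOUN")]
TAGGED_PT = [("o", "DET"), ("serviço", "NOUN"), ("nacional", "ADJ"),
             ("de", "PREP"), ("saúde", "NOUN"), ("precisa", "VERB"),
             ("de", "PREP"), ("investimento", "NOUN"), ("público", "ADJ")]

print(extract_candidates(TAGGED_EN, PATTERN_NOTE_2))
print(extract_candidates(TAGGED_PT, PATTERN_NOTE_4))
```

The pattern in note 4 allows adjectives after the noun and an optional prepositional link between two noun groups, which is why a span such as "serviço nacional de saúde" is kept as a single candidate in this sketch.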

References

Barker, K., Cornacchia, N.: Using noun phrase heads to extract document keyphrases. In: Hamilton, H.J. (ed.) AI 2000. LNCS (LNAI), vol. 1822, pp. 40–52. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45486-1_4
Bateman, S., Gutwin, C., Nacenta, M.: Seeing things in the clouds: the effect of visual features on tag cloud selections. In: Proceedings of the ACM Conference on Hypertext and Hypermedia (2008)
Bennani-Smires, K., Musat, C., Jaggi, M., Hossmann, A., Baeriswyl, M.: EmbedRank: unsupervised keyphrase extraction using sentence embeddings. arXiv preprint arXiv:1801.04470 (2018)
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)
Boudin, F.: pke: an open source python-based keyphrase extraction toolkit. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, pp. 69–73. The COLING 2016 Organizing Committee, Osaka, Japan, December 2016. http://aclweb.org/anthology/C16-2015
Boudin, F.: Unsupervised keyphrase extraction with multipartite graphs. arXiv preprint arXiv:1803.08721 (2018)
Boudin, F., Mougard, H., Cram, D.: How document pre-processing affects keyphrase extraction performance. arXiv preprint arXiv:1610.07809 (2016)
Bougouin, A., Boudin, F., Daille, B.: TopicRank: graph-based topic ranking for keyphrase extraction. In: International Joint Conference on Natural Language Processing (IJCNLP), pp. 543–551 (2013)
Branco, A., Silva, J.R.: Evaluating solutions for the rapid development of state-of-the-art PoS taggers for Portuguese. In: LREC (2004)
Chen, J., Zhang, X., Wu, Y., Yan, Z., Li, Z.: Keyphrase generation with correlation constraints. arXiv preprint arXiv:1808.07185 (2018)
Chuang, J., Manning, C.D., Heer, J.: “Without the clutter of unimportant words”: descriptive keyphrases for text visualization. ACM Trans. Comput.-Hum. Interact. 19 (2012)
Danesh, S., Sumner, T., Martin, J.H.: SGRank: combining statistical and graphical methods to improve the state of the art in unsupervised keyphrase extraction. In: Lexical and Computational Semantics (*SEM 2015), p. 117 (2015)
El-Beltagy, S.R., Rafea, A.: Kp-miner: a keyphrase extraction system for English and Arabic documents. Inf. Syst. 34(1), 132–144 (2009)
Florescu, C., Caragea, C.: PositionRank: an unsupervised approach to keyphrase extraction from scholarly documents. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 1105–1115 (2017)
Frank, E., Paynter, G.W., Witten, I.H., Gutwin, C., Nevill-Manning, C.G.: Domain-specific keyphrase extraction. In: 16th International Joint Conference on Artificial Intelligence (IJCAI 1999), vol. 2, pp. 668–673. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1999)
Green-Pedersen, C.: The growing importance of issue competition: the changing nature of party competition in western Europe. Polit. Stud. 55(3), 607–628 (2007)
Green-Pedersen, C., Mortensen, P.B.: Who sets the agenda and who responds to it in the Danish parliament? A new model of issue competition and agenda-setting. Eur. J. Polit. Res. 49(2), 257–281 (2010)
Grimmer, J., Stewart, B.M.: Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Polit. Anal. 267–297 (2013)
Grineva, M., Grinev, M., Lizorkin, D.: Extracting key terms from noisy and multitheme documents. In: Proceedings of the 18th International Conference on World Wide Web, pp. 661–670. ACM (2009)
Hasan, K.S., Ng, V.: Conundrums in unsupervised keyphrase extraction: making sense of the state-of-the-art. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 365–373. Association for Computational Linguistics (2010)
Hasan, K.S., Ng, V.: Automatic keyphrase extraction: a survey of the state of the art. In: ACL, vol. 1, pp. 1262–1273 (2014)
Hobolt, S.B., De Vries, C.E.: Issue entrepreneurship and multiparty competition. Comp. Pol. Stud. 48(9), 1159–1185 (2015)
Hulth, A.: Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 216–223. Association for Computational Linguistics (2003)
Kim, S.N., Medelyan, O., Kan, M.Y., Baldwin, T.: SemEval-2010 task 5: automatic keyphrase extraction from scientific articles. In: Proceedings of the 5th International Workshop on Semantic Evaluation, pp. 21–26. Association for Computational Linguistics (2010)
Kim, S.N., Medelyan, O., Kan, M.Y., Baldwin, T.: Automatic keyphrase extraction from scientific articles. Lang. Resour. Eval. 47(3), 723–742 (2013)
Klingemann, H.D.: Mapping Policy Preferences. Oxford University Press, Oxford (2001)
Klingemann, H.D.: Mapping policy preferences II: estimates for parties, electors, and governments in Eastern Europe, European Union, and OECD 1990–2003, vol. 2. Oxford University Press on Demand (2006)
Liu, F., Pennell, D., Liu, F., Liu, Y.: Unsupervised approaches for automatic keyword extraction using meeting transcripts. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 620–628. Association for Computational Linguistics (2009)
Liu, Z., Li, P., Zheng, Y., Sun, M.: Clustering to find exemplar terms for keyphrase extraction. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, vol. 1, pp. 257–266. Association for Computational Linguistics (2009)
Lopez, P., Romary, L.: Humb: automatic key term extraction from scientific articles in grobid. In: Proceedings of the 5th International Workshop on Semantic Evaluation, pp. 248–251. Association for Computational Linguistics (2010)
Medelyan, O., Frank, E., Witten, I.H.: Human-competitive tagging using automatic keyphrase extraction. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, vol. 3, pp. 1318–1327. Association for Computational Linguistics (2009)
Meng, R., Zhao, S., Han, S., He, D., Brusilovsky, P., Chi, Y.: Deep keyphrase generation. arXiv preprint arXiv:1704.06879 (2017)
Mihalcea, R., Tarau, P.: TextRank: bringing order into texts. In: Proceedings of Conference on Empirical Methods on Natural Language Processing (2004)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
Nguyen, T.D., Kan, M.-Y.: Keyphrase extraction in scientific publications. In: Goh, D.H.-L., Cao, T.H., Sølvberg, I.T., Rasmussen, E. (eds.) ICADL 2007. LNCS, vol. 4822, pp. 317–326. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-77094-7_41
Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web. Technical report, Stanford InfoLab (1999)
Pagliardini, M., Gupta, P., Jaggi, M.: Unsupervised learning of sentence embeddings using compositional n-gram features. In: NAACL 2018 - Conference of the North American Chapter of the Association for Computational Linguistics (2018)
Petrocik, J.R.: Issue ownership in presidential elections, with a 1980 case study. Am. J. Polit. Sci. 825–850 (1996)
Rivadeneira, A.W., Gruen, D.M., Muller, M.J., Millen, D.R.: Getting our head in the clouds: Toward evaluation studies of tagclouds. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2007)
Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24(5), 513–523 (1988)
Toutanova, K., Manning, C.D.: Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In: Proceedings of the 2000 Joint SIGDAT conference on Empirical Methods in Natural Language Processing and Very Large Corpora: Held in Conjunction with the 38th Annual Meeting of the Association for Computational Linguistics, vol. 13, pp. 63–70. Association for Computational Linguistics (2000)
Vliegenthart, R., Walgrave, S.: Content matters: The dynamics of parliamentary questioning in Belgium and Denmark. Comp. Pol. Stud. 44(8), 1031–1059 (2011)
Wagner, M., Meyer, T.M.: Which issues do parties emphasise? Salience strategies and party organisation in multiparty systems. West Eur. Polit. 37(5), 1019–1045 (2014)
Wan, X., Xiao, J.: Single document keyphrase extraction using neighborhood knowledge. In: Proceedings of the AAAI Conference on Artificial Intelligence (2008)
Wang, R., Liu, W., McDonald, C.: Corpus-independent generic keyphrase extraction using word embedding vectors. In: Software Engineering Research Conference, vol. 39 (2014)
Yih, W.t., Goodman, J., Carvalho, V.R.: Finding advertising keywords on web pages. In: Proceedings of the 15th International Conference on World Wide Web, pp. 213–222. ACM (2006)

Acknowledgement

This work was supported by Fundação para a Ciência e Tecnologia (FCT), SFRH/BPD/104176/2014 to M. W., SFRH/BPD/86702/2012 to F.R., and FCT funding POCI/01/0145/FEDER/031460 and UID/CEC/50021/2019. We also want to thank the http://geringonca.com/ team for having made their data available.

Author information

Authors and Affiliations

INESC-ID, University of Lisbon, Lisbon, Portugal
Miguel Won & Bruno Martins
ICS, University of Lisbon, Lisbon, Portugal
Filipa Raimundo

Corresponding author

Correspondence to Miguel Won.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

A Appendix

Table 4. Precision, Recall and F-scores for the English datasets of KCRank and comparison with state-of-the-art systems.

Table 5. Precision, Recall and F-scores for the Geringonça datasets of KCRank and comparison with state-of-the-art systems.

About this paper

Cite this paper

Won, M., Martins, B., Raimundo, F. (2023). Automatic Extraction of Relevant Keyphrases for the Study of Issue Competition. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13452. Springer, Cham. https://doi.org/10.1007/978-3-031-24340-0_48

DOI: https://doi.org/10.1007/978-3-031-24340-0_48
Published: 26 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24339-4
Online ISBN: 978-3-031-24340-0
eBook Packages: Computer Science, Computer Science (R0)
