Investigating Topic-Agnostic Features for Authorship Tasks in Spanish Political Speeches

Corbara, Silvia; Chulvi Ferriols, Berta; Rosso, Paolo; Moreo, Alejandro

doi:10.1007/978-3-031-08473-7_36

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13286)

Included in the following conference series:

International Conference on Applications of Natural Language to Information Systems

Abstract

Authorship Identification is the branch of authorship analysis concerned with uncovering the author of a written document. Methods devised for Authorship Identification typically rely on stylometry (the analysis of unconscious traits that authors exhibit while writing) and are expected not to base their inferences on the topics the authors usually write about (as reflected in their past production). In this paper, we present a series of experiments evaluating feature sets based on rhythmic and psycholinguistic patterns for Authorship Verification and Authorship Attribution in Spanish political language, using different text-distortion approaches to actively mask the underlying topic. We feed these feature sets to an SVM learner and show that they yield results comparable to those obtained by the BETO transformer when the latter is trained on the original text, i.e., when it can potentially learn from topical information.
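The specific text-distortion methods used in the paper are not detailed in this excerpt. As one illustration of the general idea, the following minimal sketch implements DV-MA ("distorted view, multiple asterisks"), a topic-masking distortion proposed by Stamatatos: every word outside a list of the k most frequent words is masked character by character with `*` (digits with `#`), while frequent words, punctuation, and spacing are kept. The function names, the `\w+` tokenization, and computing the frequent-word list from the input documents themselves (rather than from an external reference corpus) are assumptions made for the sake of a self-contained example.

```python
import re
from collections import Counter

# Unicode-aware in Python 3, so accented Spanish letters stay inside a word token
WORD = re.compile(r"\w+")

def top_k_words(corpus, k):
    """Return the k most frequent lowercased word tokens of a reference corpus."""
    counts = Counter(w for doc in corpus for w in WORD.findall(doc.lower()))
    return {w for w, _ in counts.most_common(k)}

def distort_dv_ma(text, keep):
    """DV-MA sketch: leave words in `keep` intact; replace each character of any
    other word with '*' and each digit with '#'. Punctuation and spacing survive."""
    def mask(match):
        word = match.group(0)
        if word.lower() in keep:
            return word
        return "".join("#" if ch.isdigit() else "*" for ch in word)
    return WORD.sub(mask, text)

if __name__ == "__main__":
    corpus = [
        "El Gobierno presenta hoy la ley de presupuestos.",
        "La oposición rechaza la ley en el Congreso.",
    ]
    keep = top_k_words(corpus, k=5)
    for doc in corpus:
        print(distort_dv_ma(doc, keep))
```

With `keep = {"el"}`, the sentence "El Gobierno presenta 3 leyes." becomes "El ******** ******** # *****.": topic-bearing content words disappear, while word lengths, function words, and punctuation, which may still carry stylistic signal, remain available to the learner.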

Notes

1.
https://www.clarin.si/repository/xmlui/handle/11356/1431.
2.
https://www.nltk.org/.
3.
https://github.com/linhd-postdata/rantanplan.
4.
We employ the Spanish version of the dictionary, which is based on LIWC2007.
5.
We use the following categories for each macro-category: (i) Yo, Nosotro, TuUtd, ElElla, VosUtds, Ellos, Pasado, Present, Futuro, Subjuntiv, Negacio, Cuantif, Numeros, verbYO, verbTU, verbNOS, verbVos, verbosEL, verbELLOS, formal, informal; (ii) MecCog, Insight, Causa, Discrep, Tentat, Certeza, Inhib, Incl, Excl, Percept, Ver, Oir, Sentir, NoFluen, Relleno, Ingerir, Relativ, Movim; (iii) Maldec, Afect, EmoPos, EmoNeg, Ansiedad, Enfado, Triste, Asentir, Placer. We avoid categories that would repeat information already captured by the POS tags, as well as topic-related categories such as Dinero (money) or Familia (family).
6.
We also performed preliminary experiments with other learners: SVM performed markedly better than Random Forest, while no significant differences were observed between SVM and Logistic Regression.
7.
https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html.
8.
https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased. This model obtained better results than the ‘uncased’ version in preliminary experiments.
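Note 5 lists the psycholinguistic categories taken from the Spanish LIWC dictionary. The paper's exact feature extraction is not described in this excerpt; the following is a minimal sketch, assuming a LIWC-style lexicon in which a trailing `*` marks a stem entry, of how such categories can be turned into numeric features. `TOY_DICTIONARY`, `matches`, and `category_frequencies` are hypothetical names, and the word lists are invented for illustration rather than copied from the licensed LIWC lexicon.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: category labels follow note 5, but the word lists are
# invented for illustration and are NOT the entries of the Spanish LIWC dictionary.
# As in LIWC dictionaries, a trailing '*' marks a stem matching any word with that prefix.
TOY_DICTIONARY = {
    "Negacio": ["no", "nunca", "jamás", "nada"],
    "Certeza": ["siempre", "seguro*", "claro*"],
    "Tentat": ["quizá*", "posible*", "acaso"],
}

def matches(token, entry):
    """True if the token equals the entry, or starts with it when the entry is a stem."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry

def category_frequencies(text, dictionary):
    """Relative frequency of each category (matching tokens / total tokens).
    A token may count towards several categories, as in LIWC."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, entries in dictionary.items():
            if any(matches(token, entry) for entry in entries):
                counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {category: counts[category] / total for category in dictionary}

if __name__ == "__main__":
    print(category_frequencies("No estoy seguro, nunca lo dije siempre.", TOY_DICTIONARY))
```

Each speech is thus represented by one relative-frequency value per category (LIWC itself reports percentages of total words). Vectors of this kind could then be concatenated with other feature sets and passed to an SVM learner such as scikit-learn's `SVC` (note 7).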

Acknowledgment

The work by Silvia Corbara has been carried out during her visit at the Universitat Politècnica de València and has been supported by the AI4Media project, funded by the European Commission (Grant 951911) under the H2020 Programme ICT-48-2020.

The research work by Paolo Rosso was partially funded by the Generalitat Valenciana under DeepPattern (PROMETEO/2019/121).

Author information

Authors and Affiliations

Scuola Normale Superiore, Pisa, Italy
Silvia Corbara
Universitat Politècnica de València, Valencia, Spain
Berta Chulvi Ferriols & Paolo Rosso
Universitat de València, Valencia, Spain
Berta Chulvi Ferriols
CNR, Istituto di Scienza e Tecnologie dell’Informazione, Pisa, Italy
Alejandro Moreo

Corresponding author

Correspondence to Silvia Corbara.

Editor information

Editors and Affiliations

Universitat Politècnica de València, Valencia, Spain
Paolo Rosso
University of Turin, Torino, Italy
Valerio Basile
Universidad Nacional de Educación a Distancia, Madrid, Spain
Raquel Martínez
Conservatoire National des Arts et Métiers, Paris, France
Elisabeth Métais
University of Derby, Derby, UK
Farid Meziane

About this paper

Cite this paper

Corbara, S., Chulvi Ferriols, B., Rosso, P., Moreo, A. (2022). Investigating Topic-Agnostic Features for Authorship Tasks in Spanish Political Speeches. In: Rosso, P., Basile, V., Martínez, R., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2022. Lecture Notes in Computer Science, vol 13286. Springer, Cham. https://doi.org/10.1007/978-3-031-08473-7_36

DOI: https://doi.org/10.1007/978-3-031-08473-7_36
Published: 13 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08472-0
Online ISBN: 978-3-031-08473-7
eBook Packages: Computer Science, Computer Science (R0)
