Stylistic Changes for Temporal Text Classification

  • Conference paper
Text, Speech, and Dialogue (TSD 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Abstract

This paper investigates stylistic changes in a set of Portuguese historical texts ranging from the 17th to the early 20th century and presents a supervised method to classify them per century. Four stylistic features – average sentence length (ASL), average word length (AWL), lexical density (LD), and lexical richness (LR) – were automatically extracted for each sub-corpus. The initial analysis of diachronic changes in these four features revealed that the texts written in the 17th and 18th centuries have similar AWL, LD and LR, which differ significantly from those in the texts written in the 19th and 20th centuries. This information was later used in automatic classification of texts per century, leading to an F-Measure of 0.92.
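The abstract names the four features but does not define how they are computed. As a minimal sketch, assuming a naive regex tokenizer, lexical density taken as the ratio of unique word forms to total words, and lexical richness taken as the ratio of unique lemmas to total words, the extraction step could look roughly like this in Python. The function name `stylistic_features` and the `lemmatize` parameter are hypothetical; the default lemmatizer is only a lowercase placeholder, so LD and LR coincide until a real lemmatizer is supplied.

```python
import re
from typing import Callable, Dict


def stylistic_features(
    text: str, lemmatize: Callable[[str], str] = str.lower
) -> Dict[str, float]:
    """Compute ASL, AWL, LD and LR for one text (illustrative definitions only)."""
    # Assumption: sentences end at '.', '!' or '?'; a real pipeline would use
    # a proper sentence splitter for historical Portuguese.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is any run of Unicode word characters.
    words = re.findall(r"\w+", text)
    if not sentences or not words:
        raise ValueError("text contains no words")

    lowered = [w.lower() for w in words]
    # Placeholder lemmatization: swap in a real lemmatizer to make LR meaningful.
    lemmas = [lemmatize(w) for w in lowered]

    return {
        "ASL": len(words) / len(sentences),               # words per sentence
        "AWL": sum(len(w) for w in words) / len(words),   # characters per word
        "LD": len(set(lowered)) / len(words),             # unique forms / words (assumed)
        "LR": len(set(lemmas)) / len(words),              # unique lemmas / words (assumed)
    }


if __name__ == "__main__":
    print(stylistic_features("O rei chegou. O rei partiu cedo."))
```

Computed per sub-corpus, feature vectors of this kind could then be passed to any supervised classifier with the century as the label. The definitions used in the paper itself may differ from the ones assumed here.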

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Štajner, S., Zampieri, M. (2013). Stylistic Changes for Temporal Text Classification. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_65

  • DOI: https://doi.org/10.1007/978-3-642-40585-3_65

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40584-6

  • Online ISBN: 978-3-642-40585-3

  • eBook Packages: Computer Science, Computer Science (R0)
