Improving Measures of Text Reuse in English Poetry: A TF–IDF Based Method

  • Conference paper
Diversity, Divergence, Dialogue (iConference 2021)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 12645))

Abstract

Measuring text reuse is important for both library and information science (LIS) and literary studies, where it is mainly used to study influence between authors. Although projects such as Tesserae have already adopted computational methods for investigating text reuse in Latin poetry, the potential of these methods for the rich collections of English poetry has not been realized. This research proposes a modified version of the Tesserae Project’s measure, based on the insight embodied in TF–IDF, to study English poetry. Using the Irish poet Yeats’ relationship to five English Romantic poets as a test case, three parallel experiments were conducted to evaluate the suitability of this method for English poetry. The results show that the new method is effective in measuring text reuse in English poetry, and that the TF–IDF based modification is more sensitive to known cases of text reuse than the original method. The method can also be applied to noncanonical literary works in the future, providing an example of the significance of LIS for digital humanities.

Notes

  1. Like the text reuse rate between two poets, the text reuse rate between two poems is defined as the average of the text reuse rates of all phrase-pairs drawn from the two poems. Here, average text reuse rates of poem-pairs rather than phrase-pairs were compared, because the commentary discusses influence at the level of the poem rather than the phrase [19].

  2. There are 41 (Yeats’s poems that are recognized as being influenced by Blake) × 216 (all of Blake’s poems) = 8856 poem-pairs in this experiment.

  3. Nine of Yeats’s poems are recognized as being influenced by a particular poem by Blake, so there are 9 poem-pairs in this experiment.

  4. Because this paper focuses on methodology, we do not discuss what the results imply about the relationship between Yeats and each of the English Romantic poets. For a preliminary discussion of this relationship, please refer to our previous work [20].

  5. For example, suppose there are two poem-pairs. Poem-pair A contains only one phrase-pair, with a text reuse rate of 10; poem-pair B contains four phrase-pairs, each with a text reuse rate of 1. Then the average value of the two poem-pairs is (10 + 1) / 2 = 5.5, whereas the average value of the five phrase-pairs is (10 + 1 + 1 + 1 + 1) / 5 = 2.8. The average over poem-pairs is higher because the shorter poem-pair, which has the higher text reuse rate, carries more weight when averaging over poem-pairs than when averaging over phrase-pairs.
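The difference between the two averaging schemes described in the notes can be made concrete with a small sketch. It is only an illustration under stated assumptions: the rates are the hypothetical values from the example in the last note, and the names `poem_pairs`, `average_over_poem_pairs`, and `average_over_phrase_pairs` are invented here and do not come from the paper or from the Tesserae code.

```python
from statistics import mean

# Hypothetical phrase-pair text reuse rates, copied from the example in the notes.
# Each key is a poem-pair; each value lists the rates of its phrase-pairs.
poem_pairs = {
    "A": [10.0],                # one phrase-pair with a rate of 10
    "B": [1.0, 1.0, 1.0, 1.0],  # four phrase-pairs, each with a rate of 1
}

def average_over_poem_pairs(pairs):
    """Average each poem-pair first, then average those per-poem-pair rates."""
    return mean(mean(rates) for rates in pairs.values())

def average_over_phrase_pairs(pairs):
    """Pool every phrase-pair rate from every poem-pair and average once."""
    return mean(rate for rates in pairs.values() for rate in rates)

print("poem-pair average:  ", average_over_poem_pairs(poem_pairs))
print("phrase-pair average:", average_over_phrase_pairs(poem_pairs))
```

Averaging per poem-pair gives every poem-pair equal weight regardless of how many phrase-pairs it contains, which is why a short poem-pair with a high rate pulls the poem-level average up more than the pooled phrase-level average.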

References

  1. Citron, D.T., Ginsparg, P.: Patterns of text reuse in a scientific corpus. Proc. Natl. Acad. Sci. 112(1), 25–30 (2015)

  2. Hickey, T.B., O’Neill, E.T., Toves, J.: Experiments with the IFLA functional requirements for bibliographic records (FRBR). D-Lib Magazine 8(9), 1–13 (2002)

  3. Farrell, J.: Intention and intertext. Phoenix 59(1/2), 98–111 (2005)

  4. Fowler, D.: On the shoulders of giants: intertextuality and classical studies. Materiali E Discussioni Per l’analisi Dei Testi Classici 39, 13–34 (1997)

  5. Büchler, M., Burns, P.R., Müller, M., Franzini, E., Franzini, G.: Towards a historical text re-use detection. In: Biemann, C., Mehler, A. (eds.) Text Mining, Theory and Applications of Natural Language Processing, pp. 221–238. Springer, Cham (2014)

  6. Duhaime, D.E.: Textual reuse in the eighteenth century: mining Eliza Haywood’s quotations. Digital Humanities Quarterly 10(1) (2016). https://digitalhumanities.org/dhq/vol/10/1/000229/000229.html

  7. Coffee, N., Koenig, J.-P., Poornima, S., Forstall, C.W., Ossewaarde, R., Jacobson, S.L.: The Tesserae Project: intertextual analysis of Latin poetry. Literary and Linguistic Comput. 28(2), 221–228 (2012)

  8. Bernstein, N., Gervais, K., Lin, W.: Comparative rates of text reuse in classical Latin hexameter poetry. Digital Humanities Quarterly 9(3) (2015). https://digitalhumanities.org/dhq/vol/9/3/000237/000237.html

  9. Forstall, C.W., Coffee, N., Buck, T., Roache, K., Jacobson, S.: Modeling the scholars: detecting intertextuality through enhanced word-level n-gram matching. Literary and Linguistic Comput. 30(4), 503–515 (2014)

  10. Gawley, J.O., Diddams, A.C.: Comparing the intertextuality of multiple authors using Tesserae: a new technique for normalization. Digital Scholarship in the Humanities 32(suppl_2), ii53–ii59 (2017)

  11. Jones, K.S.: A statistical interpretation of term specificity and its application in retrieval. J. Document. 28(1), 11–21 (1972)

  12. Beel, J., Gipp, G., Langer, S., Breitinger, C.: Research-paper recommender systems: a literature survey. Int. J. Digit. Libr. 17(4), 305–338 (2016)

  13. Yeats, W.B.: The collected poems of W. B. Yeats. 2nd edn. Scribner, New York (1996)

  14. Blake, W.: The poetical works of William Blake. Oxford University Press, London and New York (1908)

  15. Byron, G.G.: Poetry of Byron. Macmillan and Co, London (1881)

  16. Shelley, P.B.: The Complete Poetical Works of Percy Bysshe Shelley. Oxford University Press, Oxford (1925)

  17. Keats, J.: The Poetical Works of John Keats. Macmillan, London (1884)

  18. Wordsworth, W.: The Complete Poetical Works. Macmillan and Co, London (1888)

  19. Jeffares, A.N.: A Commentary on the Collected Poems of W. B. Yeats. Stanford University Press, Redwood City (1968)

  20. Shang, W., Zhang, J., Huang, W.: Modelling poetic similarity: a comparative study of W. B. Yeats and the English Romantic poets. DH 2019: Digital Humanities Conference 2019 (2019). https://dev.clariah.nl/files/dh2019/boa/0207.html

  21. Moretti, F.: The slaughterhouse of literature. MLQ: Modern Language Quarterly 61(1), 207–227 (2000)

  22. Bode, K.: The equivalence of “close” and “distant” reading; or, toward a new object for data-rich literary history. Modern Language Quarterly 78(1), 77–106 (2017)

Author information

Corresponding author

Correspondence to Wenyi Shang.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Shang, W., Underwood, T. (2021). Improving Measures of Text Reuse in English Poetry: A TF–IDF Based Method. In: Toeppe, K., Yan, H., Chu, S.K.W. (eds) Diversity, Divergence, Dialogue. iConference 2021. Lecture Notes in Computer Science, vol 12645. Springer, Cham. https://doi.org/10.1007/978-3-030-71292-1_36

  • DOI: https://doi.org/10.1007/978-3-030-71292-1_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-71291-4

  • Online ISBN: 978-3-030-71292-1

  • eBook Packages: Computer Science, Computer Science (R0)
