Abstract
Measuring text reuse matters to both library and information science (LIS) and literary studies, where it is mainly used to study influence between authors. Although projects such as Tesserae have already adopted computational methods for investigating text reuse in Latin poetry, the potential of these methods for the rich collections of English poetry has not been realized. This research proposes a modified version of the Tesserae Project’s measure, based on the insight embodied in TF–IDF, for studying English poetry. Using the Irish poet Yeats’s relationship to five English Romantic poets as a test case, three parallel experiments were conducted to evaluate the suitability of this method for English poetry. The results show that the new method is effective in measuring text reuse in English poetry, and that the TF–IDF based modification is more sensitive to known cases of text reuse than the original method. The method can also be adapted to noncanonical literary works in the future, providing an example of the significance of LIS for digital humanities.
Notes
1. As with the text reuse rate between two poets, the text reuse rate between two poems is defined as the average of the values of all phrase-pairs from the two poems. Average text reuse rates were compared at the level of poem-pairs rather than phrase-pairs because the commentary discusses influence at the level of the poem rather than the phrase [19].
2. There are 41 (Yeats’s poems that are recognized as being influenced by Blake) × 216 (all of Blake’s poems) = 8856 poem-pairs in this experiment.
3. Nine of Yeats’s poems are recognized as being influenced by a particular poem by Blake, so there are 9 poem-pairs in this experiment.
4. Because of the paper’s methodological focus, we did not discuss what the results imply about the relationship between Yeats and the individual English Romantic poets. For a preliminary discussion of this relationship, please refer to our previous work [20].
5. For example, suppose there are two poem-pairs. Poem-pair A contains only one phrase-pair, with a text reuse rate of 10. Poem-pair B contains four phrase-pairs, each with a text reuse rate of 1. The average value of the two poem-pairs is then (10 + 1) / 2 = 5.5, and the average value of the five phrase-pairs is (10 + 1 + 1 + 1 + 1) / 5 = 2.8. The average value of the poem-pairs is higher because the shorter poem-pair with the higher text reuse rate contributes more when averaging over poem-pairs than when averaging over phrase-pairs.
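The two ways of averaging in the example above can be checked with a short sketch. It is only an illustration of the arithmetic, not the authors' implementation: the dictionary layout and the function names `poem_level_average` and `phrase_level_average` are assumptions introduced here, and the scores are the hypothetical values from the example.

```python
from statistics import mean

# Hypothetical phrase-pair scores from the worked example:
# poem-pair A has one phrase-pair, poem-pair B has four.
poem_pairs = {
    "A": [10.0],
    "B": [1.0, 1.0, 1.0, 1.0],
}


def poem_level_average(pairs):
    """Average each poem-pair first, then average those poem-pair rates."""
    return mean(mean(scores) for scores in pairs.values())


def phrase_level_average(pairs):
    """Pool every phrase-pair score and take a single average."""
    return mean(score for scores in pairs.values() for score in scores)


print(poem_level_average(poem_pairs))    # each poem-pair counts once
print(phrase_level_average(poem_pairs))  # each phrase-pair counts once
```

Averaging by poem-pair gives every poem-pair equal weight regardless of how many phrase-pairs it contains, which is why a short poem-pair with one high-scoring phrase-pair pulls that average up more than it pulls the pooled phrase-level average.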
References
Citron, D.T., Ginsparg, P.: Patterns of text reuse in a scientific corpus. Proc. Natl. Acad. Sci. 112(1), 25–30 (2015)
Hickey, T.B., O’Neill, E.T., Toves, J.: Experiments with the IFLA functional requirements for bibliographic records (FRBR). D-Lib Magazine 8(9), 1–13 (2002)
Farrell, J.: Intention and intertext. Phoenix 59(1/2), 98–111 (2005)
Fowler, D.: On the shoulders of giants: intertextuality and classical studies. Materiali E Discussioni Per l’analisi Dei Testi Classici 39, 13–34 (1997)
Büchler, M., Burns, P.R., Müller, M., Franzini, E., Franzini, G.: Towards a historical text re-use detection. In: Biemann, C., Mehler, A. (eds.) Text Mining, Theory and Applications of Natural Language Processing, pp. 221–238. Springer, Cham (2014)
Duhaime, D.E.: Textual reuse in the eighteenth century: mining Eliza Haywood’s quotations. Digital Humanities Quarterly 10(1) (2016). https://digitalhumanities.org/dhq/vol/10/1/000229/000229.html
Coffee, N., Koenig, J.-P., Poornima, S., Forstall, C.W., Ossewaarde, R., Jacobson, S.L.: The Tesserae Project: intertextual analysis of Latin poetry. Literary and Linguistic Comput. 28(2), 221–228 (2012)
Bernstein, N., Gervais, K., Lin, W.: Comparative rates of text reuse in classical Latin hexameter poetry. Digital Humanities Quarterly 9(3) (2015). https://digitalhumanities.org/dhq/vol/9/3/000237/000237.html
Forstall, C.W., Coffee, N., Buck, T., Roache, K., Jacobson, S.: Modeling the scholars: detecting intertextuality through enhanced word-level n-gram matching. Literary and Linguistic Comput. 30(4), 503–515 (2014)
Gawley, J.O., Diddams, A.C.: Comparing the intertextuality of multiple authors using Tesserae: a new technique for normalization. Digital Scholarship in the Humanities 32(suppl_2), ii53–ii59 (2017)
Jones, K.S.: A statistical interpretation of term specificity and its application in retrieval. J. Document. 28(1), 11–21 (1972)
Beel, J., Gipp, G., Langer, S., Breitinger, C.: Research-paper recommender systems: a literature survey. Int. J. Digit. Libr. 17(4), 305–338 (2016)
Yeats, W.B.: The collected poems of W. B. Yeats. 2nd edn. Scribner, New York (1996)
Blake, W.: The poetical works of William Blake. Oxford University Press, London and New York (1908)
Byron, G.G.: Poetry of Byron. Macmillan and Co, London (1881)
Shelley, P.B.: The Complete Poetical Works of Percy Bysshe Shelley. Oxford University Press, Oxford (1925)
Keats, J.: The Poetical Works of John Keats. Macmillan, London (1884)
Wordsworth, W.: The Complete Poetical Works. Macmillan and Co, London (1888)
Jeffares, A.N.: A Commentary on the Collected Poems of W. B. Yeats. Stanford University Press, Redwood City (1968)
Shang, W., Zhang, J., Huang, W.: Modelling poetic similarity: a comparative study of W. B. Yeats and the English Romantic poets. DH 2019: Digital Humanities Conference 2019 (2019). https://dev.clariah.nl/files/dh2019/boa/0207.html
Moretti, F.: The slaughterhouse of literature. MLQ: Modern Language Quarterly 61(1), 207–227 (2000)
Bode, K.: The equivalence of “close” and “distant” reading; or, toward a new object for data-rich literary history. Modern Language Quarterly 78(1), 77–106 (2017)
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Shang, W., Underwood, T. (2021). Improving Measures of Text Reuse in English Poetry: A TF–IDF Based Method. In: Toeppe, K., Yan, H., Chu, S.K.W. (eds) Diversity, Divergence, Dialogue. iConference 2021. Lecture Notes in Computer Science, vol 12645. Springer, Cham. https://doi.org/10.1007/978-3-030-71292-1_36
DOI: https://doi.org/10.1007/978-3-030-71292-1_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71291-4
Online ISBN: 978-3-030-71292-1
eBook Packages: Computer Science, Computer Science (R0)