Assessing the Effect of Text Type on the Choice of Linguistic Mechanisms in Scientific Publications

Ivanova, Iverina

doi:10.1007/978-3-031-50628-4_9

Iverina Ivanova ORCID: orcid.org/0000-0003-2026-9448

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14354)

Included in the following conference series:

European Summer School in Logic, Language and Information

Abstract

In this paper, we report a qualitative and quantitative evaluation of a hand-crafted set of discourse features and their interaction with different text types. Specifically, we compared two distinct text types, scientific abstracts and their accompanying full texts, in terms of linguistic properties that include, among others, sentence length, coreference information, noun density, self-mentions, noun phrase count, and noun phrase complexity. Our findings suggest that abstracts and full texts differ in three mechanisms that are bound to text size and purpose. In abstracts, nouns tend to be more densely distributed, that is, the distance between noun occurrences is smaller, which we attribute to the compact size of abstracts. Abstracts also show a higher frequency of the personal and possessive pronouns that authors use to refer to themselves. In contrast, full texts show a higher frequency of noun phrases. These findings are a first attempt to identify linguistic features motivated by text type that can help draw clearer boundaries between text types. Such features could serve as parameters in writing-evaluation systems that assist both tutors and students in text analysis, or as guides in linguistically controllable neural text generation systems.
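Several of the surface features named above can be approximated from part-of-speech-tagged text. The sketch below is illustrative only and is not the author's implementation. It assumes each text (an abstract or a full text) is already split into non-empty sentences of (token, Penn Treebank tag) pairs, for example as produced by a tagger such as Stanford CoreNLP. It operationalizes noun density as the mean token distance between consecutive nouns (a smaller gap means denser nouns) and counts self-mentions with a hypothetical first-person pronoun list, `SELF_MENTIONS`. Coreference and noun phrase features would additionally require a coreference resolver and a parser, so they are omitted here.

```python
from dataclasses import dataclass
from statistics import mean

# Assumption: tokens carry Penn Treebank POS tags (e.g. from a tagger such as Stanford CoreNLP).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
PRONOUN_TAGS = {"PRP", "PRP$"}  # personal and possessive pronouns

# Hypothetical self-mention lexicon: first-person personal and possessive pronouns.
SELF_MENTIONS = {"i", "me", "my", "mine", "we", "us", "our", "ours"}

Sentence = list[tuple[str, str]]  # one sentence as (token, POS tag) pairs


@dataclass
class TextFeatures:
    mean_sentence_length: float  # tokens per sentence
    mean_noun_gap: float         # mean token distance between consecutive nouns; smaller = denser
    self_mention_rate: float     # self-mention pronouns per token


def extract_features(sentences: list[Sentence]) -> TextFeatures:
    """Compute illustrative surface features for one non-empty text."""
    tokens = [pair for sent in sentences for pair in sent]
    # Indices of noun tokens across the whole text, ignoring sentence boundaries.
    noun_positions = [i for i, (_, tag) in enumerate(tokens) if tag in NOUN_TAGS]
    gaps = [b - a for a, b in zip(noun_positions, noun_positions[1:])]
    # Count only pronoun-tagged tokens, so e.g. the proper noun "US" (NNP) is not a self-mention.
    self_mentions = sum(
        1 for word, tag in tokens if tag in PRONOUN_TAGS and word.lower() in SELF_MENTIONS
    )
    return TextFeatures(
        mean_sentence_length=mean(len(sent) for sent in sentences),
        # Fewer than two nouns: no gap is defined, so report an infinite distance.
        mean_noun_gap=mean(gaps) if gaps else float("inf"),
        self_mention_rate=self_mentions / len(tokens),
    )


# Usage on a toy, hand-tagged two-sentence "abstract".
abstract = [
    [("We", "PRP"), ("compare", "VBP"), ("abstracts", "NNS"), ("and", "CC"),
     ("full", "JJ"), ("texts", "NNS"), (".", ".")],
    [("Our", "PRP$"), ("results", "NNS"), ("show", "VBP"), ("differences", "NNS"), (".", ".")],
]
f = extract_features(abstract)
print(f)
```

Measuring density as a gap rather than as a nouns-per-token ratio mirrors the abstract's description in terms of the distance between noun occurrences; a per-token ratio would be an equally plausible alternative. Running the function once on an abstract and once on its full text yields a pair of measurements per paper that can then be compared across the collection.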

Notes

1. https://openai.com/blog/better-language-models/
2. https://transformer.huggingface.co/ (accessed 4 July 2020)
3. https://aclanthology.org/
4. https://stanfordnlp.github.io/CoreNLP/index.html

Acknowledgements

I would like to thank Dr. Niko Schenk for his constant support and inspiration, and for supervising the extraction and analysis of the data. I am also sincerely grateful to Prof. Gert Webelhuth, Dr. Janina Radó, and Prof. Manfred Sailer for their guidance and assistance in interpreting the research results.

Author information

Authors and Affiliations

Goethe University Frankfurt, Norbert-Wollheim-Platz 1, 60323, Frankfurt am Main, Germany
Iverina Ivanova

Corresponding author

Correspondence to Iverina Ivanova.

Editor information

Editors and Affiliations

Institute of Logic and Computation, TU Wien, Vienna, Austria
Alexandra Pavlova
Department of Information Science and Media Studies, University of Bergen, Bergen, Norway
Mina Young Pedersen
University of Trento, Rovereto, Italy
Raffaella Bernardi

Electronic supplementary material

The following electronic supplementary material accompanies this chapter:

Supplementary material 1 (zip 231 KB)

About this paper

Cite this paper

Ivanova, I. (2024). Assessing the Effect of Text Type on the Choice of Linguistic Mechanisms in Scientific Publications. In: Pavlova, A., Pedersen, M.Y., Bernardi, R. (eds) Selected Reflections in Language, Logic, and Information. ESSLLI 2019. Lecture Notes in Computer Science, vol 14354. Springer, Cham. https://doi.org/10.1007/978-3-031-50628-4_9

DOI: https://doi.org/10.1007/978-3-031-50628-4_9
Published: 28 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50627-7
Online ISBN: 978-3-031-50628-4
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

Association for Logic, Language and Information