Reusable Phrase Extraction Based on Syntactic Parsing

Duan, Xuemin; Zan, Hongying; Bai, Xiaojing; Zahner, Christoph

doi:10.1007/978-3-030-63031-7_33


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12522)

Included in the following conference series:

China National Conference on Chinese Computational Linguistics


Abstract

Academic Phrasebank is an important resource of neutral and generic phrases for academic writers. In this paper, we call these neutral and generic phrases reusable phrases; student writers use them to organize their research articles. Because of its limited size, Academic Phrasebank cannot meet all academic writing needs, while a large number of reusable phrases remain in authentic research articles. To compensate for this limitation, we propose a reusable phrase extraction model based on constituency parsing and dependency parsing that automatically extracts reusable phrases from unlabelled research articles. The model has three main components: a reusable words corpus module, a sentence simplification module, and a syntactic parsing module. We created a reusable words corpus of 2129 words to help judge whether a word is neutral and generic, and created two datasets under two scenarios to verify the feasibility of the proposed model.
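As a rough illustration of how a reusable words corpus can help judge whether a word is neutral and generic, the sketch below keeps only contiguous runs of words found in a word list. It is not the model from the paper: it performs no sentence simplification and no constituency or dependency parsing. The word list `REUSABLE_WORDS`, the helper names, and the `min_len` parameter are all hypothetical stand-ins chosen for this example.

```python
import re

# Hypothetical stand-in for the 2129-word reusable words corpus; the
# actual corpus contents are not given in the abstract.
REUSABLE_WORDS = {
    "in", "this", "paper", "we", "propose", "a", "based", "on",
}

def tokenize(sentence):
    """Split a sentence into alphabetic tokens (punctuation is dropped)."""
    return re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", sentence)

def candidate_phrases(sentence, reusable=REUSABLE_WORDS, min_len=2):
    """Return maximal runs of consecutive reusable words, at least min_len long."""
    phrases, run = [], []
    for token in tokenize(sentence):
        if token.lower() in reusable:
            run.append(token)
        else:
            # A content-specific word ends the current run.
            if len(run) >= min_len:
                phrases.append(" ".join(run))
            run = []
    if len(run) >= min_len:
        phrases.append(" ".join(run))
    return phrases

print(candidate_phrases("In this paper, we propose a parsing model based on syntax."))
```

Because the tokenizer drops punctuation, a run can span a comma. A fuller pipeline would rely on syntactic structure, as the abstract describes, rather than on adjacency alone.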



Author information

Authors and Affiliations

School of Information Engineering, Zhengzhou University, Zhengzhou, China
Xuemin Duan & Hongying Zan
Language Centre, Tsinghua University, Beijing, China
Xiaojing Bai
University of Cambridge Language Centre, Cambridge, UK
Christoph Zahner


Corresponding author

Correspondence to Hongying Zan.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Maosong Sun
Peking University, Beijing, China
Sujian Li
Westlake University, Hangzhou, China
Yue Zhang
Tsinghua University, Beijing, China
Yang Liu
Chinese Academy of Sciences, Beijing, China
Shizhu He
Beijing Language and Culture University, Beijing, China
Gaoqi Rao


About this paper

Cite this paper

Duan, X., Zan, H., Bai, X., Zahner, C. (2020). Reusable Phrase Extraction Based on Syntactic Parsing. In: Sun, M., Li, S., Zhang, Y., Liu, Y., He, S., Rao, G. (eds) Chinese Computational Linguistics. CCL 2020. Lecture Notes in Computer Science, vol 12522. Springer, Cham. https://doi.org/10.1007/978-3-030-63031-7_33


DOI: https://doi.org/10.1007/978-3-030-63031-7_33
Published: 12 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63030-0
Online ISBN: 978-3-030-63031-7
eBook Packages: Computer Science, Computer Science (R0)
