Reusable Phrase Extraction Based on Syntactic Parsing

  • Conference paper
Chinese Computational Linguistics (CCL 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12522)

Abstract

Academic Phrasebank is an important resource for academic writers, composed of neutral and generic phrases. In this paper, we call these neutral and generic phrases reusable phrases; student writers use them to organize their research articles. Because of its limited size, Academic Phrasebank cannot meet all academic writing needs, yet authentic research articles still contain a large number of reusable phrases. To make up for this deficiency, we propose a reusable phrase extraction model based on constituency parsing and dependency parsing that automatically extracts reusable phrases from unlabelled research articles. The model has three main components: a reusable words corpus module, a sentence simplification module, and a syntactic parsing module. We created a reusable words corpus of 2129 words to help judge whether a word is neutral and generic, and we created two datasets under two scenarios to verify the feasibility of the proposed model.

Author information

Corresponding author

Correspondence to Hongying Zan.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Duan, X., Zan, H., Bai, X., Zahner, C. (2020). Reusable Phrase Extraction Based on Syntactic Parsing. In: Sun, M., Li, S., Zhang, Y., Liu, Y., He, S., Rao, G. (eds) Chinese Computational Linguistics. CCL 2020. Lecture Notes in Computer Science, vol 12522. Springer, Cham. https://doi.org/10.1007/978-3-030-63031-7_33

  • DOI: https://doi.org/10.1007/978-3-030-63031-7_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63030-0

  • Online ISBN: 978-3-030-63031-7

  • eBook Packages: Computer Science, Computer Science (R0)
