Automatic Terminology Extraction Using a Dependency-Graph in NLP

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1372)

Abstract

Automatic Terminology Extraction (ATE) is a technique for extracting phrases that represent a dataset. It is needed when translating specialized books and documents. An existing method exploits the fact that terminologies tend to be composed of two or more single nouns. However, it handles only co-occurrence relations among single nouns, not modification relations. Moreover, a novel approach has to take into account that phrases defined as terminology tend to be explained in another sentence. In this study, we propose a method for extracting terminologies from a dataset that considers the modification relations obtained by dependency analysis. In particular, we propose how to extract, from the dependency structure of a sentence, features that enable us to distinguish whether or not a phrase is terminology.
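To make the idea of using modification relations concrete, the following is a minimal sketch and not the authors' implementation. It assumes a sentence that has already been dependency-parsed, with English tokens and Universal Dependencies style labels (`compound`, `amod`) standing in for whatever relations the actual analyzer produces. The `Token` class, the `MODIFIER_RELS` set, and the three features are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int     # 1-based position in the sentence
    text: str
    pos: str     # coarse part-of-speech tag, e.g. "NOUN"
    head: int    # idx of the governing token, 0 for the root
    deprel: str  # dependency relation to the head


# Relations treated as "modification" in this sketch (an assumption).
MODIFIER_RELS = {"compound", "amod"}


def modifiers(tokens, head_idx):
    """Return indices of tokens that modify head_idx, directly or transitively."""
    found = []
    for t in tokens:
        if t.head == head_idx and t.deprel in MODIFIER_RELS:
            found.append(t.idx)
            found.extend(modifiers(tokens, t.idx))
    return found


def candidate_terms(tokens):
    """Yield (phrase, features) for each noun that is not itself a modifier."""
    by_idx = {t.idx: t for t in tokens}
    for t in tokens:
        if t.pos != "NOUN" or t.deprel in MODIFIER_RELS:
            continue
        span = sorted([t.idx] + modifiers(tokens, t.idx))
        # Keep only contiguous spans so the phrase is a surface string.
        if span != list(range(span[0], span[-1] + 1)):
            continue
        phrase = " ".join(by_idx[i].text for i in span)
        features = {
            "length": len(span),            # number of tokens in the phrase
            "n_modifiers": len(span) - 1,   # tokens attached to the head noun
            "head_relation": t.deprel,      # role of the head noun in the sentence
        }
        yield phrase, features


# Hand-annotated parse of a toy sentence (illustrative data, not from the paper).
SENTENCE = [
    Token(1, "Automatic", "ADJ", 3, "amod"),
    Token(2, "terminology", "NOUN", 3, "compound"),
    Token(3, "extraction", "NOUN", 4, "nsubj"),
    Token(4, "uses", "VERB", 0, "root"),
    Token(5, "dependency", "NOUN", 6, "compound"),
    Token(6, "analysis", "NOUN", 4, "obj"),
]

if __name__ == "__main__":
    for phrase, features in candidate_terms(SENTENCE):
        print(phrase, features)
```

Feature dictionaries of this kind could then be passed to any classifier that decides whether a candidate phrase is terminology; which features and which classifier work well is exactly what a study like this one has to establish.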

Notes

  1. Java Automatic Term Extraction: https://github.com/ziqizhang/jate.

  2. Research Purpose Use of NTCIR Test Collections or Data Archive/User Agreement: http://research.nii.ac.jp/ntcir/permission/perm-en.html#ntcir-1 (accessed on October 20th, 2020).

  3. Gensen Web: http://gensen.dl.itc.u-tokyo.ac.jp/gensenweb_eng.html (accessed on November 27th, 2020).

  4. Japanese Dependency and Case Structure Analyzer KNP: http://nlp.ist.i.kyoto-u.ac.jp/EN/?KNP (accessed on November 27th, 2020).

  5. Kurohashi, Kawahara, and Murawaki Laboratory (2018), “Japanese Morphological Analysis System JUMAN++”, http://nlp.ist.i.kyoto-u.ac.jp/index.php?JUMAN++ (November 28th, 2020).

  6. Kurohashi, Kawahara, and Murawaki Lab (2018), “Japanese Syntactic, Case, and Linguistic Analysis System KNP”, http://nlp.ist.i.kyoto-u.ac.jp/?KNP (November 28th, 2020).

  7. Hiroshi Nakagawa, Akira Maeda and Hiroyuki Kojima (2003), “Gensen Web”, http://gensen.dl.itc.u-tokyo.ac.jp/ (November 28th, 2020).

References

  1. Frantzi, K., Ananiadou, S., Mima, H.: Automatic recognition of multi-word terms: the C-value/NC-value method. Int. J. Digit. Libr. 3(2), 115–130 (2000). https://doi.org/10.1007/s007999900023

  2. Gábor, K., Buscaldi, D., Schumann, A.K., QasemiZadeh, B., Zargayouna, H., Charnois, T.: SemEval-2018 task 7: semantic relation extraction and classification in scientific papers. In: Proceedings of The 12th International Workshop on Semantic Evaluation, pp. 679–688 (2018)

  3. Justeson, J., Katz, S.: Technical terminology: some linguistic properties and an algorithm for identification in text. Nat. Lang. Eng. 1(01), 9–27 (1995). https://doi.org/10.1017/S1351324900000048

  4. Mao, Z., Cromieres, F., Dabre, R., Song, H., Kurohashi, S.: JASS: Japanese-specific sequence to sequence pre-training for neural machine translation (2020)

  5. Mihalcea, R., Tarau, P.: TextRank: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404–411 (2004)

  6. Nakagawa, H., Mori, T.: Automatic term recognition based on statistics of compound nouns and their components. Terminology 9(2), 201–219 (2003)

  7. Šajatović, A., Buljan, M., Šnajder, J., Dalbelo Bašić, B.: Evaluating automatic term extraction methods on individual documents. In: Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), Association for Computational Linguistics, pp. 149–154 (2019). https://doi.org/10.18653/v1/W19-5118

  8. Sato, S., Sasaki, Y.: Automatic collection of related terms from the web. In: Information Processing Society of Japan Natural Language Processing, pp. 57–64 (2003)

  9. Tanaka, T., Miyao, Y., Asahara, M., Uematsu, S., Kanayama, H., Mori, S., Matsumoto, Y.: Universal dependencies for Japanese. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). European Language Resources Association (ELRA), pp. 1651–1658 (2016). https://www.aclweb.org/anthology/L16-1261

  10. Terryn, A.R., Hoste, V., Lefever, E.: In no uncertain terms: a dataset for monolingual and multilingual automatic term extraction from comparable corpora. Lang. Resour. Eval. (2019)

  11. Tolmachev, A., Kawahara, D., Kurohashi, S.: Juman++: a morphological analysis toolkit for scriptio continua. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, pp. 54–59 (2018). https://doi.org/10.18653/v1/D18-2010, https://www.aclweb.org/anthology/D18-2010

  12. Wang, M., Zhao, B., Huang, Y.: PTR: phrase-based topical ranking for automatic keyphrase extraction in scientific publications. In: 23rd International Conference on Neural Information Processing. LNCS, vol. 9950, pp. 120–128. Springer International Publishing (2016)

  13. Zhang, Y., Zincir-Heywood, N., Milios, E.: World wide web site summarization. Web Intell. Agent Syst. 2(1), 39–53 (2004)

  14. Yuan, Y., Gao, J., Zhang, Y.: Supervised learning for robust term extraction. In: 2017 International Conference on Asian Language Processing (IALP), pp. 302–305 (2017)

  15. Zhang, Z., Gao, J., Ciravegna, F.: JATE 2.0: Java automatic term extraction with Apache Solr. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association (ELRA), pp. 2262–2269 (2016)

Acknowledgement

This study was partly supported by a JSPS Research Grant (JP19H01138) and a grant for research promotion from the Graduate School of Culture and Information Studies, Doshisha University. The NTCIR1 test collection for terminology extraction research was provided by the National Institute of Informatics. We hereby express our gratitude.

Author information

Correspondence to Yusuke Kimura.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kimura, Y., Kusu, K., Hatano, K., Baba, T. (2021). Automatic Terminology Extraction Using a Dependency-Graph in NLP. In: Abraham, A., Sasaki, H., Rios, R., Gandhi, N., Singh, U., Ma, K. (eds) Innovations in Bio-Inspired Computing and Applications. IBICA 2020. Advances in Intelligent Systems and Computing, vol 1372. Springer, Cham. https://doi.org/10.1007/978-3-030-73603-3_38
