Automatic Terminology Extraction Using a Dependency-Graph in NLP

Kimura, Yusuke; Kusu, Kazuma; Hatano, Kenji; Baba, Tokiya

doi:10.1007/978-3-030-73603-3_38

Conference paper
First Online: 10 April 2021

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1372)

Abstract

Automatic Terminology Extraction (ATE) is a technique for extracting the phrases that characterize a dataset. It is needed when translating specialized books and documents. An existing method exploits the fact that terminology tends to be composed of two or more single nouns. However, it handles only co-occurrence relations among single nouns, not modification relations. Moreover, a novel approach has to take into account that phrases defined as terminology tend to be explained in another sentence. In this study, we propose a method for extracting terminology from a dataset that considers the modification relations obtained by dependency analysis. In particular, we propose a way to extract, from the dependency structure of a sentence, features that make it possible to distinguish whether or not a phrase is terminology.
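The abstract does not spell out which features are read from the dependency structure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hand-annotated toy parse with Universal Dependencies-style labels (the notes indicate the paper itself relies on KNP for Japanese), and the names `Token`, `MODIFIER_RELS`, and `candidate_features` are hypothetical. For each noun that is not itself a modifier, it collects the nouns or adjectives that modify it and records a few simple features a classifier could use.

```python
from collections import defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    text: str
    pos: str     # coarse part-of-speech tag
    head: int    # idx of the governing token, 0 for the root
    deprel: str  # dependency label (UD-style, an assumption)


# Hand-annotated toy parse, not the output of any real parser.
SENTENCE = [
    Token(1, "Dependency", "NOUN", 2, "compound"),
    Token(2, "analysis", "NOUN", 3, "nsubj"),
    Token(3, "yields", "VERB", 0, "root"),
    Token(4, "modification", "NOUN", 5, "compound"),
    Token(5, "relations", "NOUN", 3, "obj"),
]

# Relations treated as "modification" in this sketch (an assumption).
MODIFIER_RELS = {"compound", "amod", "nmod"}


def candidate_features(tokens):
    """Build one feature dict per noun that is not itself a modifier."""
    by_idx = {tok.idx: tok for tok in tokens}
    modifiers = defaultdict(list)
    for tok in tokens:
        if tok.deprel in MODIFIER_RELS:
            modifiers[tok.head].append(tok)

    features = []
    for tok in tokens:
        if tok.pos != "NOUN" or tok.deprel in MODIFIER_RELS:
            continue
        # Only direct modifiers are collected; nested ones are ignored.
        mods = modifiers.get(tok.idx, [])
        words = sorted(mods + [tok])  # tuples sort by idx first
        governor = by_idx.get(tok.head)
        features.append({
            "phrase": " ".join(w.text for w in words),
            "n_modifiers": len(mods),
            "head_relation": tok.deprel,
            "governor": governor.text if governor else None,
        })
    return features


for feat in candidate_features(SENTENCE):
    print(feat)
```

Unlike a pure co-occurrence count over adjacent nouns, the features here come from explicit modifier edges, so a modifier that is not adjacent to its head would still be attached to the right phrase.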

Notes

1. Java Automatic Term Extraction: https://github.com/ziqizhang/jate.
2. Research Purpose Use of NTCIR Test Collections or Data Archive/User Agreement: http://research.nii.ac.jp/ntcir/permission/perm-en.html#ntcir-1 (accessed on October 20th, 2020).
3. Gensen Web: http://gensen.dl.itc.u-tokyo.ac.jp/gensenweb_eng.html (accessed on November 27th, 2020).
4. Japanese Dependency and Case Structure Analyzer KNP: http://nlp.ist.i.kyoto-u.ac.jp/EN/?KNP (accessed on November 27th, 2020).
5. Kurohashi, Kawahara, and Murawaki Laboratory (2018), “Japanese Morphological Analysis System JUMAN++”, http://nlp.ist.i.kyoto-u.ac.jp/index.php?JUMAN++ (November 28th, 2020).
6. Kurohashi, Kawahara, and Murawaki Lab (2018), “Japanese Syntactic, Case, and Linguistic Analysis System KNP”, http://nlp.ist.i.kyoto-u.ac.jp/?KNP (November 28th, 2020).
7. Hiroshi Nakagawa, Akira Maeda and Hiroyuki Kojima (2003), “Gensen Web”, http://gensen.dl.itc.u-tokyo.ac.jp/ (November 28th, 2020).

Acknowledgement

This study was partly supported by JSPS Research Grant (JP19H01138) and a grant for research promotion from the Graduate School of Culture and Information Studies, Doshisha University. The NTCIR1 test collection for terminology extraction research was provided by the National Institute of Informatics. We hereby express our gratitude.

Author information

Authors and Affiliations

Graduate School of Culture and Information Science, Doshisha University, 1–3 Tatara-Miyakodani, Kyotanabe, Kyoto, 610-0394, Japan
Yusuke Kimura, Kazuma Kusu & Tokiya Baba
Doshisha University, 1–3 Tatara-Miyakodani, Kyotanabe, Kyoto, 610-0394, Japan
Kenji Hatano

Corresponding author

Correspondence to Yusuke Kimura.

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Ajith Abraham
National Institute of Information and Communications Technology (NICT), Koganei, Tokyo, Japan
Hideyasu Sasaki
Universidade Federal da Bahia, Salvador, Brazil
Ricardo Rios
Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Niketa Gandhi
Institute of Technology and Science, Ghaziabad, Uttar Pradesh, India
Umang Singh
School of Information Science and Engineering, University of Jinan, Jinan, Shandong, China
Kun Ma

About this paper

Cite this paper

Kimura, Y., Kusu, K., Hatano, K., Baba, T. (2021). Automatic Terminology Extraction Using a Dependency-Graph in NLP. In: Abraham, A., Sasaki, H., Rios, R., Gandhi, N., Singh, U., Ma, K. (eds) Innovations in Bio-Inspired Computing and Applications. IBICA 2020. Advances in Intelligent Systems and Computing, vol 1372. Springer, Cham. https://doi.org/10.1007/978-3-030-73603-3_38

DOI: https://doi.org/10.1007/978-3-030-73603-3_38
Published: 10 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73602-6
Online ISBN: 978-3-030-73603-3
eBook Packages: Intelligent Technologies and Robotics (R0)
