short-paper

TermoPL: a tool for extracting and clustering domain related terms

Published: 20 June 2022

Abstract

We present a new version of the terminology extraction tool TermoPL. In addition to ranking term candidates, this version can group them semantically. To ensure precise results, we use the WordNet lexical database to identify semantic relations between words. The tool was designed primarily for Polish texts, but the current version is tagset-independent and can be adapted to texts in other languages. The new semantic grouping feature is fully implemented for Polish, and we plan to make it available for English as well.
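The abstract names two operations: ranking term candidates and grouping them by meaning. The Python sketch below illustrates both under explicit assumptions and is not TermoPL's own implementation. Ranking uses the original C-value measure of Frantzi, Ananiadou and Mima (2000): a candidate's frequency is reduced by the average frequency of the longer candidates that contain it, then multiplied by log2 of its length in words. Grouping relies on a hypothetical `SYNSET` dictionary that stands in for a wordnet lookup. Two terms share a group when they become identical after each word is replaced by its synset identifier. Real wordnets assign several synsets to polysemous words and offer further relations (e.g. hypernymy), which this sketch ignores.

```python
import math
from collections import defaultdict


def contains(longer, shorter):
    """True if token tuple `shorter` occurs contiguously inside a longer tuple."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freqs):
    """Rank candidates with the original C-value measure (Frantzi et al., 2000).

    `freqs` maps a candidate (tuple of base-form tokens) to its corpus frequency.
    Returns (candidate, score) pairs sorted from highest to lowest score.
    """
    scores = {}
    for a, f_a in freqs.items():
        # Longer candidates in which `a` appears as a nested sub-phrase.
        containers = [b for b in freqs if contains(b, a)]
        if containers:
            # Discount by the average frequency of the containing candidates.
            f_a = f_a - sum(freqs[b] for b in containers) / len(containers)
        # Note: log2(1) = 0, so one-word candidates always score 0 here.
        scores[a] = math.log2(len(a)) * f_a
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical stand-in for a wordnet lookup: word -> one synset identifier.
SYNSET = {"car": "S1", "automobile": "S1", "engine": "S2", "motor": "S2"}


def group_by_synonymy(terms, synset=SYNSET):
    """Group terms that become identical when every word is mapped to its synset id."""
    groups = defaultdict(list)
    for term in terms:
        key = tuple(synset.get(word, word) for word in term)
        groups[key].append(term)
    # Keep only groups that actually merge two or more terms.
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Toy counts of lemmatised candidate phrases (illustrative, not real data).
    freqs = {
        ("car", "engine"): 10,
        ("automobile", "engine"): 4,
        ("car", "engine", "oil"): 3,
        ("engine", "oil"): 5,
        ("car",): 20,
    }
    for term, score in c_value(freqs):
        print(" ".join(term), round(score, 2))
    print(group_by_synonymy(list(freqs)))
```

With these toy counts, *car engine* ranks first (10 minus the frequency 3 of *car engine oil*), and the single-word candidate *car* scores 0 because the original formula multiplies by log2(1). *car engine* and *automobile engine* fall into one synonymy group.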

      Published In

      JCDL '22: Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries
      June 2022
      392 pages
      ISBN:9781450393454
      DOI:10.1145/3529372
      • General Chairs:
      • Akiko Aizawa,
      • Thomas Mandl,
      • Zeljko Carevic,
      • Program Chairs:
      • Annika Hinze,
      • Philipp Mayr,
      • Philipp Schaer
      © 2022 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      In-Cooperation

      • IEEE Technical Committee on Digital Libraries (TC DL)

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. clustering
      2. term extraction
      3. term hierarchy

      Conference

      JCDL '22

      Acceptance Rates

      JCDL '22 Paper Acceptance Rate 35 of 132 submissions, 27%;
      Overall Acceptance Rate 415 of 1,482 submissions, 28%
