DOI: 10.1145/3078081.3078088
research-article

Parsing Romanian Specialized Dictionaries Structured in Nests

Published: 01 June 2017

Abstract

This paper presents a tool that processes dictionaries in Word format and converts them to XML, which can then be used in various applications. DEPAR (Dictionary Entry Parser) lets the user specify a set of rules describing the entry structure of any dictionary. An encoding standard based on TEI (Text Encoding Initiative) was established to define the tags of the XML format. Three dictionaries have been processed with DEPAR. All three are structured in nests, i.e. the words are grouped around a root word. DEPAR is a useful tool for lexicographers: once the grammar (the structure of the dictionary entries) has been built and loaded into the parser, the lexicographer can update the Word file and reprocess the dictionary to obtain a new XML file without the assistance of computer scientists. A method for classifying the MWEs (Multi-word Expressions) of one of the dictionaries is also presented. Finally, the tool was used to extract, from well-formalized sources, a list of set phrases and a list of archaic and regional variants, which were used to improve the performance of a Romanian POS (Part-of-Speech) tagger.
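The core idea, a per-dictionary rule describing the entry structure plus nests of derivatives grouped under a root word, can be illustrated with a small sketch. This is not DEPAR's grammar or implementation. It assumes that the Word text has already been extracted to plain lines of the form `HEADWORD POS definition`, that indentation marks a derivative nested under the preceding root word, and that a single regular expression stands in for the dictionary grammar. Derivatives are encoded as TEI `<re>` (related entry) elements inside the root's `<entry>`; the TEI namespace is omitted, so the output is only TEI-like. The names `LINE_RULE`, `parse_nests`, `_fill` and `SAMPLE` are hypothetical, and the sample definitions are glossed in English for readability.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical entry rule (a stand-in for a real dictionary grammar):
# optional indentation, one-word HEADWORD, POS abbreviation, definition.
# Indented lines are derivatives nested under the most recent root word.
LINE_RULE = re.compile(
    r"^(?P<indent>\s*)(?P<head>\S+)\s+(?P<pos>\S+)\s+(?P<definition>.+)$"
)

# Illustrative nest-structured input (hypothetical; definitions glossed in English).
SAMPLE = """\
CASĂ s.f. building used as a dwelling.
    CĂSUȚĂ s.f. small house.
    CĂSNICIE s.f. married life.
FLOARE s.f. flowering plant.
    FLORAR s.m. florist.
"""


def _fill(elem, m):
    """Add TEI-style <form>/<gramGrp>/<sense> children for one headword."""
    form = ET.SubElement(elem, "form", type="lemma")
    ET.SubElement(form, "orth").text = m.group("head")
    gram = ET.SubElement(elem, "gramGrp")
    ET.SubElement(gram, "pos").text = m.group("pos")
    sense = ET.SubElement(elem, "sense")
    ET.SubElement(sense, "def").text = m.group("definition").strip()


def parse_nests(text):
    """Convert nest-structured lines into a TEI-like <body> of <entry> elements."""
    body = ET.Element("body")
    current = None  # <entry> of the most recent root word
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        m = LINE_RULE.match(line)
        if m is None:
            raise ValueError(f"line {lineno} does not match the entry rule: {line!r}")
        if m.group("indent"):
            if current is None:
                raise ValueError(f"line {lineno}: derivative appears before any root word")
            # Derivative: encode as a related entry nested inside the root's <entry>.
            _fill(ET.SubElement(current, "re", type="derivative"), m)
        else:
            # Root word: start a new nest.
            current = ET.SubElement(body, "entry")
            _fill(current, m)
    return body


if __name__ == "__main__":
    print(ET.tostring(parse_nests(SAMPLE), encoding="unicode"))
```

A real dictionary grammar would also have to handle multi-word headwords (such as the MWEs mentioned above), several senses per entry, and the typographic cues (bold, italics) of the Word source, none of which a single regular expression over plain text can capture; that is the gap a dedicated, rule-configurable parser is meant to fill.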


Information

Published In

ACM Other conferences
DATeCH2017: Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage
June 2017
179 pages
ISBN:9781450352659
DOI:10.1145/3078081
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MWE classification
  2. POS-tagger for Romanian
  3. dictionary entry parsing
  4. list of set phrases
  5. multi-word expressions
  6. nested dictionary

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DATeCH2017

Acceptance Rates

DATeCH2017 Paper Acceptance Rate 29 of 37 submissions, 78%;
Overall Acceptance Rate 60 of 86 submissions, 70%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 25
  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 13 Jan 2025