ABSTRACT
A reverse dictionary takes a description as input and returns a ranked list of words whose definitions match that description. Although reverse dictionaries have wide practical value, little research has been done on them, particularly on multilingual reverse dictionaries. To address this gap and improve reverse dictionary accuracy across languages, this paper proposes a multilingual reverse dictionary model based on multilingual BERT (mBERT). It enhances the original model with additional features such as the part of speech of words. The improved model has been validated on both English and Chinese datasets. Experimental results show that our model outperforms the baseline models on most metrics.
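To make the task concrete, the following is a minimal sketch of what a reverse dictionary does: it maps a description to a ranked list of candidate words. It is not the mBERT-based model proposed in the paper. It swaps in plain token overlap (Jaccard similarity) as the scoring function, and the lexicon, the function names `tokens` and `reverse_lookup`, and the `top_k` parameter are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy lexicon (word -> definition); not taken from the paper's datasets.
LEXICON = {
    "river": "a large natural stream of water flowing to the sea",
    "bank": "an institution that keeps and lends money",
    "bridge": "a structure carrying a road across a river or valley",
}


def tokens(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def reverse_lookup(description, lexicon=LEXICON, top_k=3):
    """Rank words by Jaccard overlap between the description and each definition.

    Assumption: token overlap stands in for the learned scoring that a
    neural reverse dictionary would use.
    """
    query = tokens(description)
    scored = []
    for word, definition in lexicon.items():
        definition_tokens = tokens(definition)
        union = query | definition_tokens
        score = len(query & definition_tokens) / len(union) if union else 0.0
        scored.append((score, word))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:top_k]]


ranked = reverse_lookup("a place that lends money")
print(ranked)
```

A learned model would replace the overlap score with a similarity computed from contextual representations, which is what lets it match descriptions that share no surface words with the stored definition.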