Integrating Shallow Syntactic Labels in the Phrase-Boundary Translation Model

Authors:
Shahram Salami, Shahid Beheshti University, Tehran, Iran
Mehrnoush Shamsfard, Shahid Beheshti University, Tehran, Iran

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 3, Article No.: 17, pp 1–12. https://doi.org/10.1145/3178460

Published: 14 February 2018

Abstract

This article proposes a hierarchical model for statistical machine translation based on a novel rule labeling method. The proposed model labels translation rules by matching the boundaries of target-side phrases against shallow syntactic labels, namely POS tags and chunk labels, on the target side of the training corpus. If no single label covers the whole target span, the labels at the two boundaries are concatenated. Labeling with the classes of the boundary words of target-side phrases was previously proposed as the phrase-boundary model, which can be regarded as the base form of our model. In the extended model, the labeler falls back to a POS tag when a boundary has no chunk label. By using chunks as phrase labels, the proposed model generalizes the rules and so reduces model sparseness. Sparseness matters more for language pairs with large word-order differences, because fewer aligned phrase pairs are available for rule extraction. The extended phrase-boundary model is also applicable to low-resource languages that have no syntactic parser. Experiments compare the proposed model, the base phrase-boundary model, and variants of Syntax Augmented Machine Translation (SAMT) on translation from Persian and from German into English, two language pairs in which source and target word orders differ. According to the results, the proposed model improves both translation quality and decoding time. Measured with BLEU, the proposed model achieves a statistically significant improvement of about 0.5 points over the base phrase-boundary model.
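
The labeling rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `label_span`, the chunk representation as `(start, end, label)` tuples with inclusive token indices, the `+` separator, and the use of the POS tag for a single-word span without a chunk are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a chunk is (start, end, label), inclusive indices.
Chunk = Tuple[int, int, str]


def label_span(start: int, end: int, pos_tags: List[str], chunks: List[Chunk]) -> str:
    """Label the target span [start, end] with shallow syntactic labels."""
    # 1. Prefer a single label for the whole span: a chunk that covers it exactly.
    for c_start, c_end, c_label in chunks:
        if c_start == start and c_end == end:
            return c_label
    # Assumption: a one-word span without a chunk takes its POS tag.
    if start == end:
        return pos_tags[start]

    # 2. Otherwise label each boundary: a chunk that starts at the left boundary
    #    (or ends at the right boundary) and lies inside the span, else the POS
    #    tag of the boundary word.
    left: Optional[str] = next(
        (lab for s, e, lab in chunks if s == start and e <= end), None
    )
    right: Optional[str] = next(
        (lab for s, e, lab in chunks if e == end and s >= start), None
    )
    left = left or pos_tags[start]
    right = right or pos_tags[end]

    # 3. Concatenate the two boundary labels ("+" is an assumed separator).
    return f"{left}+{right}"


# Toy target sentence: "the old man saw a dog"
pos_tags = ["DT", "JJ", "NN", "VBD", "DT", "NN"]
chunks = [(0, 2, "NP"), (3, 3, "VP"), (4, 5, "NP")]

whole = label_span(0, 2, pos_tags, chunks)    # chunk covers the span exactly
both = label_span(0, 5, pos_tags, chunks)     # chunk labels at both boundaries
mixed = label_span(1, 3, pos_tags, chunks)    # POS fallback on the left boundary
print(whole, both, mixed)
```

In this toy example the span "old man saw" has no chunk starting at "old", so the left boundary falls back to the POS tag while the right boundary keeps its chunk label, which is the fallback behavior the extended model adds over the base phrase-boundary model.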

Index Terms

Computing methodologies → Artificial intelligence → Natural language processing → Machine translation

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 3
September 2018
196 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3184403
Editor: Nianwen Xue, Brandeis University, Waltham, USA
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery, New York, NY, United States
Publication History
- Published: 14 February 2018
- Accepted: 1 December 2017
- Revised: 1 August 2017
- Received: 1 January 2017

Author Tags
Chunk label
Hierarchical models
POS tag
Statistical machine translation
Qualifiers
- note
- Research
- Refereed