
A Classifier to Determine Whether a Document is Professionally or Machine Translated

Conference paper

Part of the book series: Lecture Notes in Business Information Processing ((LNBIP,volume 261))

Abstract

In an increasingly networked world, the availability of high-quality translations is critical for success, especially in the context of international competition. International companies must provide well-translated, high-quality technical documentation not only to succeed in the market but also to meet legal regulations. We seek to evaluate translation quality, specifically of technical documentation, and formulate a method to evaluate the translation quality of technical documents both when the original documents are available and when they are not. We rely on state-of-the-art machine learning algorithms and translation evaluation metrics in the context of a knowledge discovery process. Our evaluation is performed at the sentence level, where each sentence is classified as either professionally translated or machine translated. The results for the individual sentences are then combined to evaluate the full document. The research is based on a database that contains 22,327 sentences and 32 translation evaluation attributes, which are used to optimize the decision trees that evaluate translation quality. Our method achieves an accuracy of 70.48% at the sentence level for texts in the database and can accurately classify documents with at least 100 sentences.
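The jump from 70.48% sentence-level accuracy to accurate classification of documents with at least 100 sentences can be illustrated with a simple majority-vote model. The sketch below is our own illustration, not the paper's actual aggregation procedure, and it assumes sentence-level decisions are independent and a document is labeled by the majority of its sentence labels:

```python
from math import comb

def doc_accuracy(p: float, n: int) -> float:
    """Probability that a majority vote over n independent
    sentence-level classifications, each correct with
    probability p, yields the correct document label."""
    # Sum the binomial probabilities of more than half of
    # the n sentences being classified correctly.
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

# With the paper's reported per-sentence accuracy and an
# odd document length just above 100 sentences, the
# majority vote is correct with very high probability.
print(doc_accuracy(0.7048, 101) > 0.99)
```

Under these (simplifying) independence assumptions, a per-sentence accuracy of roughly 70% is more than enough for near-certain document-level decisions once a document contains on the order of 100 sentences, which is consistent with the abstract's claim.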


Notes

  1. Documentation for VMware’s vSphere, available at https://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.vsphere.doc%2FGUID-1B959D6B-41CA-4E23-A7DB-E9165D5A0E80.html (last accessed: January 19, 2016).

References

  1. Albrecht, J., Hwa, R.: Regression for sentence-level MT evaluation with pseudo references. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp. 296–303 (2007)

  2. Albrecht, J.S., Hwa, R.: The role of pseudo references in MT evaluation. In: Proceedings of the Third Workshop on Statistical Machine Translation, pp. 187–190. Association for Computational Linguistics (2008)

  3. Doddington, G.: Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In: Proceedings of the Second International Conference on Human Language Technology Research, pp. 138–145. Morgan Kaufmann Publishers Inc. (2002)

  4. Gamon, M., Aue, A., Smets, M.: Sentence-level MT evaluation without reference translations: beyond language modeling. In: Proceedings of the 10th Annual Conference of the European Association for Machine Translation (EAMT), pp. 103–111 (2005)

  5. Kothes, L.: Grundlagen der Technischen Dokumentation: Anleitungen verständlich und normgerecht erstellen. Springer, Heidelberg (2010)

  6. Kulesza, A., Shieber, S.M.: A learning approach to improving sentence-level MT evaluation. In: Proceedings of the 10th International Conference on Theoretical and Methodological Issues in Machine Translation, pp. 75–84 (2004)

  7. Lavie, A., Agarwal, A.: METEOR: an automatic metric for MT evaluation with high levels of correlation with human judgments. In: Proceedings of the Second Workshop on Statistical Machine Translation, pp. 228–231. Association for Computational Linguistics (2007)

  8. Luckert, M., Schaefer-Kehnert, M.: Using machine learning methods for evaluating the quality of technical documents. Master’s thesis, Linnaeus University, Sweden (2016). http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-52087

  9. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311–318. Association for Computational Linguistics (2002)

  10. Popović, M., Vilar, D., Avramidis, E., Burchardt, A.: Evaluation without references: IBM1 scores as evaluation metrics. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 99–103. Association for Computational Linguistics (2011)

  11. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)

  12. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco (1993)

  13. Rokach, L., Maimon, O.: Data Mining with Decision Trees: Theory and Applications. World Scientific, River Edge (2014)

  14. Shapira, D., Storer, J.A.: Edit distance with move operations. In: Apostolico, A., Takeda, M. (eds.) CPM 2002. LNCS, vol. 2373, pp. 85–98. Springer, Heidelberg (2002)

  15. Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate with targeted human annotation. In: Proceedings of Association for Machine Translation in the Americas, pp. 223–231 (2006)

  16. Somers, H.: Round-trip translation: what is it good for? In: Proceedings of the Australasian Language Technology Workshop, pp. 127–133 (2005)


Acknowledgements

We are grateful to Andreas Kerren and Ola Peterson for their valuable feedback on the Master’s thesis project [8] on which this research is based.

Author information

Correspondence to Anna Wingkvist.



Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Luckert, M., Schaefer-Kehnert, M., Löwe, W., Ericsson, M., Wingkvist, A. (2016). A Classifier to Determine Whether a Document is Professionally or Machine Translated. In: Řepa, V., Bruckner, T. (eds) Perspectives in Business Informatics Research. BIR 2016. Lecture Notes in Business Information Processing, vol 261. Springer, Cham. https://doi.org/10.1007/978-3-319-45321-7_24
