
Machine Translation Quality: Automatic and Human Evaluation of four MT systems' output

Master's Thesis | Master's Programme in Language Technology | UOA, NTUA, ILSP

Eirini Chatzikoumi, 2015

We evaluate the outputs of four MT systems for the English-Greek language pair, in both directions, in order to assess their output quality, draw conclusions about it and suggest ways of exploiting these conclusions. The evaluation uses the BLEU metric as well as human evaluation methods. Three of the systems are commercial and the fourth is a prototype under development, so the human evaluation methodology is adjusted to these different needs: for the commercial systems it focuses on ranking them, while for the prototype it focuses on improving it. BLEU itself is also indirectly evaluated, since its correlation with human judgment is studied.
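As an illustration of how agreement between an automatic metric and human judgment can be quantified, the following minimal sketch computes a Pearson correlation coefficient between two equally long lists of scores. The choice of Pearson's r is an assumption made here for illustration; the abstract does not state which correlation measure the thesis uses, and the function name `pearson` is hypothetical.

```python
import math


def pearson(metric_scores, human_scores):
    """Pearson correlation between two equally long score lists.

    Assumption: one automatic score and one human score per text
    (or per segment), paired by position.
    """
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")

    mean_m = sum(metric_scores) / len(metric_scores)
    mean_h = sum(human_scores) / len(human_scores)

    # Covariance and variances, left unnormalised because the
    # normalisation factors cancel in the ratio below.
    cov = sum((m - mean_m) * (h - mean_h)
              for m, h in zip(metric_scores, human_scores))
    var_m = sum((m - mean_m) ** 2 for m in metric_scores)
    var_h = sum((h - mean_h) ** 2 for h in human_scores)

    if var_m == 0 or var_h == 0:
        raise ValueError("correlation is undefined for constant scores")

    return cov / math.sqrt(var_m * var_h)
```

A value close to 1 would indicate that texts scored highly by the metric also tend to be rated highly by human evaluators; a rank-based coefficient such as Spearman's would be a reasonable alternative when only the ordering of systems matters.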

The corpus built for this evaluation project comprises 30 source texts, each with 3 reference translations and 4 machine translations. Half of the source texts are English and the other half Greek, and in each direction we study 5 texts from each of the following fields: medicine, law & administration, and technology. The English texts total 1,726 segments and the Greek texts 1,103. Based on the BLEU scores, a small number of texts is selected for human evaluation. The human evaluation tasks performed are Quality Checking, 3-way Ranking, Error Classification and Post-editing. A by-product of the post-editing task is a corpus of 331 post-edited segments (229 English-Greek and 102 Greek-English) containing 2,158 edits in total (1,434 English-Greek and 724 Greek-English). Our conclusions concern the ranking of the four systems, the performance of each system in the three fields mentioned above, and the relation between sentence length and translation quality. Finally, the error classification of the prototype's outputs provides insight into the most prominent error types in each translation direction for this language pair.
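To make the role of the multiple reference translations concrete, here is a simplified, segment-level sketch of BLEU with several references. It is not the thesis's implementation: it assumes whitespace tokenisation, applies no smoothing, and scores a single segment rather than a whole corpus. The function names `ngrams` and `bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Simplified BLEU for one segment against several references.

    Assumptions: whitespace tokenisation, uniform n-gram weights,
    no smoothing (any zero n-gram precision gives a score of 0).
    """
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand or not refs:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)

        # Clip each candidate n-gram count by the highest count
        # observed for that n-gram in any single reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)

        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty, using the reference length closest to the
    # candidate length (ties resolved towards the shorter reference).
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))

    return bp * math.exp(sum(log_precisions) / max_n)
```

With several references, a candidate n-gram is credited if it appears in any of them, which is why additional reference translations generally make the metric more tolerant of legitimate variation in wording.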


#machine translation quality #human evaluation #automatic evaluation #error classification #post-editing #bleu #english #greek