
Examining the Quality of Machine Translation

The first homework assignment does not involve any programming. Instead, you will take a closer look at the quality of today’s machine translation systems.

Translate with Google Translate

  1. Pick a foreign language (preferably one that you have some understanding of, or an easy one like French or Spanish).
  2. Find text in that language on a news site, in Wikipedia articles, or in social media posts.
  3. Translate the text with Google Translate.

Try to find challenging cases, whether the challenge comes from the language (one that is not among the top 100 languages in terms of resources), a specialized domain (e.g., technical jargon), difficult linguistic constructions, or the writing style (e.g., social media with creative and ungrammatical expressions).

Analyse the Translations

Write a report about the quality of the machine translation.

Go over at least 20 sentences, manually correct each sentence, and report for each sentence:

  1. the source sentence
  2. the machine translation
  3. a correction of the machine translation
  4. an assessment of the error in the machine translation

You may do item 4 in any way you want. For instance, you could classify errors as “reordering error”, “word sense error for a noun”, or any other type of error you can think of.

For instance:

  1. Erst drei Tage ist der neue Ministerpräsident Griechenlands im Amt.
  2. Only three days is the new Prime Minister of Greece in office.
  3. The new Prime Minister of Greece has been in office for only three days.
  4. (1) The verb tense is wrong: “is” instead of “has been”. (2) The preposition “for” is missing in front of the time phrase “only three days”. (3) While the noun phrases and prepositional phrases are correct, the overall sentence structure on the clause level is scrambled.

Conclude your report with a summary of your impression of the major quality problems in the machine translation system that you analysed.
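
No programming is required for this assignment, but if you keep your notes in a structured form, a few lines of Python can help you see which error types dominate before you write the summary. The following is only an optional sketch: the `SentenceAnalysis` class, the `tally_errors` function, and the error labels are hypothetical names chosen for illustration, not part of the assignment.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SentenceAnalysis:
    """One report entry: the four items listed for each sentence."""
    source: str
    machine_translation: str
    correction: str
    # Free-form error labels; the categories are your own choice.
    error_labels: list[str] = field(default_factory=list)


def tally_errors(analyses):
    """Count how often each error label occurs, most frequent first."""
    counts = Counter()
    for analysis in analyses:
        counts.update(analysis.error_labels)
    return counts.most_common()


# The example sentence from above, with hypothetical labels.
example = SentenceAnalysis(
    source="Erst drei Tage ist der neue Ministerpräsident Griechenlands im Amt.",
    machine_translation="Only three days is the new Prime Minister of Greece in office.",
    correction="The new Prime Minister of Greece has been in office for only three days.",
    error_labels=["verb tense", "missing preposition", "clause-level reordering"],
)

for label, count in tally_errors([example]):
    print(f"{label}: {count}")
```

With 20 or more entries, the resulting counts give a rough picture of the most frequent problems, which you can then discuss in your own words.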

What to Hand in

Turn in a written report on Gradescope by midnight on Sunday, September 7.