#Quality Estimation: Final Project Quality estimation is a topic of increasing interest in Machine Translation. It aims at providing a quality indicator for unseen translated sentences at various granularity levels. In this assignment, we will focus on sentence-level estimation. Different from Machine Translation evaluation, quality estimation systems do not rely on reference translations and are generally addressed using machine learning techniques to predict quality scores.
Some interesting uses of sentence-level quality estimation are the following:
In this assignment we will give you Spanish source sentences and their English translations, which can either be human translations or two versions of machine translations. Your mission, should you choose to accept it, is to accurately score a sentence. The scores are or .
To begin, download the project starter kit. You may either choose to develop locally or on Penn’s servers. For the latter, we recommend using the Biglab machines, whose memory and runtime restrictions are much less stringent than those on Eniac. The Biglab servers can be accessed directly using the command
ssh PENNKEY@biglab.seas.upenn.edu, or from Eniac using the command
In the downloaded directory, you have a program that evaluates how good a machine translation system output is. Test it out using this command:
./default > output
Every translation of an input sentence has an associated feature vector . The
default takes a parameter vector whose length is equal to that of . By default, . For each sentence in the training data, we obtain a weighted sum of the features. Then, for each score we compute the average sum. Then, for each sentence in the test set, we compute a sum of the features and compare it with the average sum. We assign a score based on the minimum difference to the average of each score.
To evaluate, use this command:
./grade < output
The scoring is the accuracy measured against human annotated labels.
##Baseline Method For the baseline implementation we are planning to try out a few different methods to see which is feasible. They include:
##The Challenge You can improve the parameter vector by using effective learning algorithms that optimize . You can also improve the values assigned to the scores by using a function other than the naive average.
Implement a version of a learning algorithm and some feature engineering to improve accuracy. Here are some ideas to improve the accuracy:
But the sky’s the limit! You can try anything you want, as long as you follow the ground rules.
You must work independently on this assignment.
You should submit each of the following:
Your translations of the entire dataset, uploaded from any Eniac or Biglab machine
using the command
You may submit new results as often as you like, up until the assignment deadline.
The output will be evaluated using
The top few positions on the leaderboard will receive bonus points on this assignment.
Your code, uploaded using the command
TBD. This is due 24 hours after the leaderboard closes. You are free to extend the code we provide or write your own in whatever language you like, but the code should be self-contained, self-documenting, and easy to use.
A report describing the models you designed and experimented with, uploaded
using the command
TBD. This is due 24 hours after the leaderboard closes. Your report does not need to be
long, but it should at minimum address the following points:
Motivation: Why did you choose the models you experimented with?
Description of models or algorithms: Describe mathematically or algorithmically what you did. Your descriptions should be clear enough that someone else in the class could implement them.
Results: You most likely experimented with various settings of any models you implemented. We want to know how you decided on the final model that you submitted for us to grade. What parameters did you try, and what were the results? Most importantly: what did you learn?
Since we have already given you a concrete problem and dataset, you do not need describe these as if you were writing a full scientific paper. Instead, you should focus on an accurate technical description of the above items.
Note: These reports will be made available via hyperlinks on the leaderboard. Therefore, you are not required to include your real name if you would prefer not to do so.
Credits: This assignment was developed by Anwesha Das, Deepti Panuganti, Dhruvil Shah, and Aryeh Stiefel.