# Homework 5: NMT

Due November 8, 2018 at noon

In this assignment, you will be improving your sequence-to-sequence neural machine translation model.

## Getting Started

You will start this assignment from your code for Homework 4.

In this assignment, you will be improving your basic NMT model with attention and adding speedups.

This code is based on the tutorial by Sean Robertson found here. Students MAY NOT view that tutorial or use it as a reference in any way.

## Part 1

• Implementing batching (http://www.aclweb.org/anthology/W17-3208)
• Replacing your LSTM implementation with the PyTorch implementation (torch.nn.LSTM)

For each change, note the impact on training speed (you do not need to do a full training run - just run long enough to measure speed in sentences per second). For batching, experiment with a range of batch sizes.
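The two changes above interact: to use torch.nn.LSTM on batches of variable-length sentences, you pad each batch to a common length and pack it. The sketch below is one illustrative way to do this and to time throughput in sentences per second; all names and dimensions (PAD_IDX, vocabulary size, hidden size) are assumptions, not part of the assignment code.

```python
# Hypothetical sketch: batching variable-length sentences for an encoder
# built on torch.nn.LSTM, using padding plus pack_padded_sequence, then
# timing throughput in sentences/sec. All sizes here are illustrative.
import time
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

PAD_IDX = 0  # assumed padding token id

def collate(batch):
    """Pad a list of 1-D index tensors to a common length."""
    lengths = torch.tensor([len(s) for s in batch])
    padded = torch.full((len(batch), int(lengths.max())), PAD_IDX, dtype=torch.long)
    for i, s in enumerate(batch):
        padded[i, : len(s)] = s
    return padded, lengths

class Encoder(nn.Module):
    def __init__(self, vocab_size=100, emb_dim=32, hid_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=PAD_IDX)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)

    def forward(self, padded, lengths):
        # Packing lets the LSTM skip computation on the padded positions.
        packed = pack_padded_sequence(
            self.embed(padded), lengths, batch_first=True, enforce_sorted=False
        )
        out, (h, c) = self.lstm(packed)
        out, _ = pad_packed_sequence(out, batch_first=True)
        return out, (h, c)

# Toy "corpus" of random sentences, just to demonstrate the timing loop.
sentences = [torch.randint(1, 100, (n,)) for n in [5, 8, 3, 7, 6, 9, 4, 10]]
enc = Encoder()
for batch_size in [1, 4, 8]:
    start = time.time()
    for i in range(0, len(sentences), batch_size):
        padded, lengths = collate(sentences[i : i + batch_size])
        enc(padded, lengths)
    elapsed = time.time() - start
    print(f"batch={batch_size}: {len(sentences) / elapsed:.1f} sentences/sec")
```

On a toy corpus this small the timings are noisy; for your report, run over enough of the training data that the sentences/sec figure stabilizes.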

## Part 2

Next, try at least one method for improving your machine translation system. If you are working in a team, we expect the overall effort to scale up with team size.

Some ideas you could try include (but are not limited to):

• Implementing beam search (described here: https://arxiv.org/pdf/1211.3711.pdf, http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf)

• Character-aware encoder (there are several ways of doing this; here is one: http://anthology.aclweb.org/P16-2058, but feel free to try something else creative!)

• Implementing different types of attention (http://aclweb.org/anthology/D15-1166)

• Other improvements:

• A good way to think about potential improvements is to look at your output and see what problems there are.
• You can also take a look at recent papers for inspiration about what problems to tackle (https://aclanthology.info/).
• If you are not sure whether something is a valid extension, please post on Piazza.
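As one example of the extensions above, beam search keeps the `beam_size` highest-scoring partial translations at each decoding step instead of greedily taking the single best token. The sketch below is a minimal, framework-free version; `score_next` is a toy stand-in (an assumption, not your code) for a decoder step that returns log-probabilities over the next token, and the BOS/EOS ids are likewise illustrative.

```python
# Hypothetical beam-search sketch. `score_next` stands in for one decoder
# step (a log-softmax over next tokens given the partial hypothesis);
# replace it with a call to your decoder. BOS/EOS ids are assumptions.
import math

BOS, EOS = 0, 1

def score_next(prefix):
    # Toy distribution: the chance of EOS grows with hypothesis length.
    n = len(prefix)
    p_eos = min(0.9, 0.2 * n)
    probs = {EOS: p_eos, 2: (1 - p_eos) * 0.6, 3: (1 - p_eos) * 0.4}
    return {tok: math.log(p) for tok, p in probs.items()}

def beam_search(score_next, beam_size=3, max_len=10):
    beams = [([BOS], 0.0)]          # (token sequence, total log-prob)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for seq, logp in beams:
            for tok, tok_logp in score_next(seq).items():
                candidates.append((seq + [tok], logp + tok_logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep the top beam_size live hypotheses; EOS moves one to finished.
        beams = []
        for seq, logp in candidates:
            if seq[-1] == EOS:
                finished.append((seq, logp))
            else:
                beams.append((seq, logp))
            if len(beams) == beam_size:
                break
        if not beams:
            break
    finished.extend(beams)          # fall back to unfinished hypotheses
    return max(finished, key=lambda c: c[1])

best_seq, best_logp = beam_search(score_next, beam_size=2)
print(best_seq, best_logp)
```

Note that summing log-probabilities biases beam search toward short outputs; a common fix (worth mentioning in your analysis) is to normalize each finished hypothesis's score by its length.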

NOTE: This is a very small data set designed to train quickly on CPU, so some extensions may not improve performance on it. That's OK; please still analyze your results. You are welcome to try training your system on different data if you would like. If you are interested in trying your system on a language pair with some particular property (complex morphology, significant reordering, etc.), post on Piazza and we will help you try to find one. For larger data sets, you will likely need a GPU, which is not provided.
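Another of the extensions listed above, Luong-style global (dot-product) attention, can be sketched in a few lines. This is an illustrative version only: it assumes equal encoder and decoder hidden sizes and batch-first tensors, and the function name and mask handling are choices of this sketch, not a prescribed interface.

```python
# Hypothetical sketch of Luong-style global dot-product attention.
# Assumes encoder and decoder hidden sizes match and batch-first layout.
import torch
import torch.nn.functional as F

def dot_attention(dec_hidden, enc_outputs, mask=None):
    """dec_hidden: (batch, hid); enc_outputs: (batch, src_len, hid).
    Returns (context, weights) with shapes (batch, hid), (batch, src_len)."""
    # Alignment score of the decoder state against each encoder state.
    scores = torch.bmm(enc_outputs, dec_hidden.unsqueeze(2)).squeeze(2)
    if mask is not None:
        # Mask out padding positions so they get zero attention weight.
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = F.softmax(scores, dim=1)
    # Context vector: attention-weighted sum of encoder states.
    context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)
    return context, weights
```

In the Luong et al. setup, the context vector is then concatenated with the decoder hidden state and passed through a linear layer with tanh before the output projection; the other scoring variants in the paper (general, concat) differ only in how `scores` is computed.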

## Ground Rules

• You can work independently or in groups of up to three, under these conditions:
1. You must announce the group publicly on Piazza.
2. You agree that everyone in the group will receive the same grade on the assignment.
3. You can add people or merge groups at any time before the assignment is due. You cannot drop people from your group once you've added them. We encourage collaboration, but we will not adjudicate Rashomon-style stories about who did or did not contribute.
4. You must submit one assignment per group on Gradescope, and indicate your collaborators once you upload the files.
• You must turn in three things to Gradescope:
1. Your translations of the entire test set. You can turn in your best system output (or your most interesting output, in case your implementation didn't help performance). You can upload new output as often as you like, up until the assignment deadline. Your translated file must be named `translations`.