
Homework 4: NMT

Due October 25th, 2018 at noon

In this assignment, you will be building a sequence to sequence neural machine translation model.

Getting Started

You can get the starter code for this assignment here:

git clone https://github.com/thompsonb/601.468_HW4.git

In this assignment, you will be building a basic NMT model with attention. In the next assignment you will be creating extensions and adding speedups. Your next assignment will build upon this one.

This code is based on the tutorial by Sean Robertson found here. Students MAY NOT view that tutorial or use it as a reference in any way.

The Task

Your task is to implement this paper, which describes neural machine translation with attention. As in the paper, you should also write the visualization for the attention mechanism and discuss selected plots in your writeup.
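As a rough illustration of what an attention step computes and how its weights can be inspected, here is a minimal pure-Python sketch. It is not the starter code and not necessarily the formulation in the paper: the dot-product scoring function, the toy vectors, the placeholder tokens, and the helper names (`attend`, `render_attention`) are all assumptions made for brevity. A real implementation would use PyTorch tensors and the scoring function the paper defines.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for a single decoder step.

    Assumption: scores are plain dot products, chosen only to keep
    the sketch short.
    """
    scores = [sum(a * b for a, b in zip(decoder_state, h))
              for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the attention-weighted sum of encoder states.
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


def render_attention(source_tokens, target_tokens, weight_rows):
    """Format a grid: one row per target token, one column per source token."""
    width = max(6, max(len(t) for t in source_tokens + target_tokens) + 2)
    lines = ["".ljust(width) + "".join(t.rjust(width) for t in source_tokens)]
    for token, row in zip(target_tokens, weight_rows):
        cells = "".join(f"{w:.2f}".rjust(width) for w in row)
        lines.append(token.ljust(width) + cells)
    return "\n".join(lines)


# Toy data (hypothetical): three encoder states, two decoder steps.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_states = [[2.0, 0.0], [0.0, 2.0]]
source_tokens = ["src1", "src2", "src3"]
target_tokens = ["tgt1", "tgt2"]

rows = [attend(state, encoder_states)[0] for state in decoder_states]
print(render_attention(source_tokens, target_tokens, rows))
```

Each row of the printed grid sums to 1. A heatmap of the same matrix is the usual way to present these weights in a writeup; the text grid is only a quick sanity check.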

The starter code for this assignment is written in PyTorch, a framework for neural networks.

INSTALL_NOTES.txt includes the instructions to install PyTorch inside a conda environment. We have provided instructions that are tested on the cs ugradx machine (which currently runs Fedora release 27). We have also tested this assignment on Ubuntu 14.04.

The primary file for this assignment is seq2seq.py. Once you have installed PyTorch, you can view its arguments by running:

python seq2seq.py -h

The arguments have reasonable default values for training the initial system (e.g. the file paths to the data should not need to be changed). You can inspect the defaults in the code.

One argument you should note is the load_checkpoint argument. This allows you to load in a model that was generated in a previous training run (which may be useful if you kill your training script part way through).
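To make the resume-from-checkpoint idea concrete, here is a small self-contained sketch of how such an argument can drive a training script's starting state. It is not the starter code: the `--load_checkpoint` flag spelling, the `pickle` file format, and the helper names are assumptions for illustration only. The provided seq2seq.py defines its own argument handling and checkpoint format.

```python
import argparse
import pickle
import tempfile
from pathlib import Path


def build_parser():
    """Toy stand-in for a training script's argument parser."""
    parser = argparse.ArgumentParser(description="toy training script")
    # Assumption: the argument is exposed as a --load_checkpoint flag.
    parser.add_argument("--load_checkpoint", default=None,
                        help="path to a checkpoint from an earlier run")
    return parser


def save_checkpoint(path, iteration, params):
    """Write the training state to disk (pickle is a stand-in format)."""
    with open(path, "wb") as f:
        pickle.dump({"iteration": iteration, "params": params}, f)


def initial_state(args):
    """Start fresh, or resume from args.load_checkpoint if it was given."""
    if args.load_checkpoint is None:
        return {"iteration": 0, "params": {}}
    with open(args.load_checkpoint, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "state.pkl"
    save_checkpoint(ckpt, iteration=500, params={"w": [0.1, 0.2]})

    fresh = initial_state(build_parser().parse_args([]))
    resumed = initial_state(
        build_parser().parse_args(["--load_checkpoint", str(ckpt)]))

print(fresh["iteration"], resumed["iteration"])  # prints: 0 500
```

The point of the design is that an interrupted run loses only the work done since the last saved checkpoint, rather than starting again from iteration 0.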

The portions of the code you will need to fill in are denoted by “** YOUR CODE HERE **”. Further instructions and references are also in the provided code.

Ground Rules

  • This code is based on the tutorial by Sean Robertson found here. Students MAY NOT view that tutorial or use it as a reference in any way.
  • Don’t wait until the last minute; this assignment is longer than the previous one.
  • You can work independently or in groups of up to three, under these conditions:
    1. You must announce the group publicly on piazza.
    2. You agree that everyone in the group will receive the same grade on the assignment.
    3. You can add people or merge groups at any time before the assignment is due. You cannot drop people from your group once you’ve added them. We encourage collaboration, but we will not adjudicate Rashomon-style stories about who did or did not contribute.
    4. You must submit one assignment per group on Gradescope, and indicate your collaborators once you upload the files.
  • You must turn in three things to Gradescope:
    1. Your translations of the entire test set. You can upload new output as often as you like, up until the assignment deadline. Your translated file must be named translations.
    2. Your code, uploaded to Gradescope.
    3. A clear, mathematical description of your algorithm and its motivation written in scientific style, uploaded to Gradescope. This needn’t be long, but it should be clear enough that one of your fellow students could re-implement it exactly. This should also include some analysis of your attention visualization.
  • You may not use (and should not need) any data other than what we provide. Neural machine translation software, including (but not limited to) OpenNMT, AWS Sockeye, and Marian, is off-limits. You may of course inspect these systems if it helps you understand how they work. But be warned: they are generally quite complicated because they provide a great deal of other functionality that is not the focus of this assignment. If you aren’t sure whether something is permitted, ask us.