Each team's revised project write-up is available at the bottom of the page.
Vote on your favorites using the indicated Piazza poll by 11:59 pm on April 16.
Term Project The Ultimate Challenge
Your term project is to design a homework assignment similar to the ones you have completed
in this class. Your project will consist of the following components:
- A description of the problem, in the style and format of the homework write-ups.
- Training and evaluation data.
- A commented implementation of the simplest possible solution to the problem.
This will be the default program provided to students. Name this file
default
.
- A commented implementation of a baseline published in the literature, along with
skeleton code obtained by removing the parts that students should implement.
Name these files
baseline-solution
and baseline
, respectively.
- One extension per team member that attempts to improve on the baseline, along
with a brief (one- to three-paragraph) accompanying write-up for each extension
describing the general approach and whether it worked. Name these
extension-1
,
extension-2
, etc.
- A scoring script implementing an objective function that can be used to score
submissions on the class leaderboard. Naturally, the output of all model
implementations should be gradeable with this program. Name this file
grade
.
Implementation Specification
All implementations should be runnable using the following command:
More specifically, each implementation should write its results to stdout
, and
the grading program should accept its input via stdin
. Any logging information
should be printed to stderr
. If any of your programs require command-line arguments,
such as paths to the data files, parameters for algorithms, etc., sensible defaults
should be provided in case they are omitted.
If you are writing your code in a scripting language like Python, be sure to
include the appropriate hash-bang
line at the top of each file, such as #!/usr/bin/env python
, so that your programs
can be executed directly.
If you are writing your code in a compiled language like Java, include a Makefile
so that your code can be compiled into the above form by calling make
.
Topics
You should identify what topic you would like to work on, and then email the instructor or schedule a meeting to go over the details. Here are some ideas of topics that you could use for your final project:
- Sentence alignment
- Language identification
- Transliteration
- Feature engineering for discriminative word alignment
- Language modeling
- Predict the right translation given a context
- Phrase pair extraction
- SCFG rule extraction
- Implement the CKY algorithm for decoding SCFGs
- System combination
- Domain adaptation
- Lexicalized re-ordering
- Clause restructuring for better re-ordering
- Mining parallel documents from Common Crawl
- Extracting parallel sentences from Wikipedia
- Monolingual text-to-text generation
- Predicting the goodness of translation without references
- Crowdsourcing translation
- Minimum Bayes risk re-ranking
- Predict the features in WALS for a language, given a subset of its features
You are welcome to choose your own topic, provided that it is related to machine translation.
Submission Details
Your final submission should consist of the following in a compressed archive:
report.md
: The final version of your write-up, incorporating any additional changes to your revised draft (if any).
readme.md
: A brief description of your task and the included code.
data-train/
: A directory containing the training data.
data-dev/
: A directory containing the development data for local evaluation.
data-test/
: A directory containing the test data for leaderboard evaluation.
default
: A full implementation of the default system.
baseline
: A skeleton of the baseline system to be provided to students.
baseline-solution
: A full implementation of the baseline system.
extension-1
, extension-2
, …: Full implementations of the extensions, one per group member.
extensions.md
: A brief write-up describing your extensions and their performance.
grade-dev
: A grading script for local evaluation. This may be a wrapper around a generic grading script grade
.
grade-test
: A grading script for leaderboard evaluation. This may be a wrapper around a generic grading script grade
.
Your archive should be submitted using turnin
before the deadline.
Milestones and Due Dates
- February 27 - term project assigned
- During Spring break (March 7-15) - Select your term project topic and email your instructor to say what you’ve selected
- March 17 - draft write-up is due. Your write-up should include
- A problem definition
- A pointer to or more more papers or sections textbook that describes the problem
- What objective functions you are considering
- What type of data you will need to evaluate
- March 24 - your data must be collected and submitted
- March 26 HW4 due
- March 31 - objective function must be implemented (it is OK to use an existing implementation if you need to), and your default system is due
- April 2 - revised write-up is due, taking into account any feedback from instructor and TA. This will be published on the course web site.
- April 7 - your baseline system is due
- April 9 - Language Research “Core Requirements” are due
- April 14 - no late days allowed for this deadline - your completed project is due (final writeup, data, objective fn, default system, baseline system, and extensions). It will be published on the class web site, and used as the final homework for other students.
- April 15-16 - Read over the other students’ term projects, and vote on the ones that you think should be included.
- April 28 - Final HW due: turn in your implementation of another student’s term project
- April 28 - The language research project is also due.
- final exam date (May 12th) - extra credit implement one or more extensions beyond your own baseline, or do one or more of the new student-written HWs.
Revised Project Write-ups
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16