Final Project

You will design a final research project, either by yourself or with one or two other students, with guidance from us. You have two options: a survey option and an empirical option. Survey projects must clearly analyze and synthesize a selection of recent literature on a topic related to machine translation. Empirical studies must clearly define a research problem and an experimental procedure with input, output, and evaluation measures. You should make an appointment with the instructors to discuss your project prior to the first due date. To keep you on track, there are three graded checkpoints, preceded by that meeting:

  • BEFORE October 26: Meet with your instructors, by appointment.
  • October 26: Project proposals (10 points)
  • November 14: Interim report (5 points)
  • December 7: Final project report (15 points)

All projects will be graded on a final written report.

Project Presentations

Day Presenter(s)
November 27 Zhuoran Han
  Georgie Botev
  Felicia Koerner
  Edward Hu
  Siyue Zhou
  Maria Coleman
  Juhi Sanjay Malani
November 29 Riley C Scott, Joshan Bajaj, William Bernardoni
  Kai Wang, Shanshan Yang, Chen Wang
  Brian Michael Cueto, Jeesoo Kim, Tae Jin Kim
  Yi Zhang, Yingda Xia
December 04 William Watson, Vivian Tsai, Bailey Parker
  Angelo J Olcese, Vibhu Jawa, Abhinav Singh
  Xinyao Liu, Yan Wei, Yibing Zhang
  Mrudul Harwani, Lohita Sivaprakasam, Avais Pagarkar
  Zhiqi Wang, Xiaochen Sun
  Yash Kumar Lal, Aaron Mueller
December 06 Yi Zhang, Yingda Xia
  Lionel Zachary Eisenberg, Sanat Deshpande
  Xiang Li, Haley Coleman Canon, Prakhar Kaushik, Fei Wu
  Emily Brahma, Nirmal S Krishnan, Sharmila Tamby
  Cheng-I Lai, Kelly Marchisio, Jialiang Guo
  Feixuan Wang, Ryan Culkin

Project Proposal

Your project proposal must identify a concrete research plan. A survey proposal must clearly identify:

  • A coherent research area related to machine translation. The research area could be organized around a specific application, a technical problem, or a coherent set of methods for solving problems. The area should be one that has not been surveyed before, or at least not recently.

  • A set of initial papers to be surveyed.

  • An outline of the themes that you expect to find, and questions that you hope to learn answers to.

An empirical proposal must clearly identify:

  • A single problem related to machine translation. Your problem should be illustrated with an example and stated formally, ideally in the first paragraph.

  • An outline of your project. How will you solve the problem? What models and algorithms will you implement? What software will you use?

  • An experimental design. How will you know if you solved the problem? You should clearly identify input, output, and evaluation measures.
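To make "input, output, and evaluation measures" concrete, here is a minimal sketch of an evaluation harness. Everything in it is an illustrative assumption rather than a course requirement: the function names, the toy lexicon, the two-sentence test set, and the choice of clipped unigram precision as a simple stand-in for an established metric. A real proposal would name its actual data and an accepted evaluation measure.

```python
from collections import Counter


def unigram_precision(hypothesis: str, reference: str) -> float:
    """Clipped unigram precision of one hypothesis against one reference.

    A deliberately simple stand-in metric (hypothetical choice): each
    hypothesis token counts as correct at most as many times as it
    occurs in the reference.
    """
    hyp_tokens = hypothesis.split()
    if not hyp_tokens:
        return 0.0
    hyp_counts = Counter(hyp_tokens)
    ref_counts = Counter(reference.split())
    matched = sum(min(count, ref_counts[token]) for token, count in hyp_counts.items())
    return matched / len(hyp_tokens)


def evaluate(system, test_set) -> float:
    """Average the metric over (source, reference) pairs.

    `system` is any callable mapping a source sentence (input) to a
    translated sentence (output).
    """
    scores = [unigram_precision(system(source), reference) for source, reference in test_set]
    return sum(scores) / len(scores)


# Toy data, invented purely for illustration.
toy_lexicon = {"das": "the", "haus": "house", "ein": "a"}
test_set = [("das haus", "the house"), ("ein buch", "a book")]


def toy_system(source: str) -> str:
    """Word-by-word lookup; unknown words are copied through unchanged."""
    return " ".join(toy_lexicon.get(word, word) for word in source.split())


if __name__ == "__main__":
    print(evaluate(toy_system, test_set))
```

Separating the system from the metric in this way means that a baseline and a proposed method can be scored on the same test set with the same measure, which is what lets you say whether you solved the problem.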

The proposal is a contract. If we give you full credit for it, we expect you to implement it and analyze the results, and we will give you full credit for the entire project if you do. If you turn in a weak proposal, you can revise it and resubmit it before moving forward. But the longer you take to define your project, the less time you will have to implement it, so do the best you can for this early checkpoint.

Interim Report

For your interim report, you should have made substantial progress on your project. For a survey, you should have read many of the papers, identified main themes of the literature, and synthesized these into an outline. For an empirical project, you should have collected data, developed baseline algorithms and metrics, and run preliminary experiments. Your interim report is an extension of your proposal, clarifying existing material where requested, adding technical details of completed work, and outlining planned work. A reader should be able to answer these questions:

  • What problem are you trying to solve or survey? Illustrate the problem with examples and give a precise technical description. Clearly identify inputs and outputs. Be concise: if you don’t hook your reader in the first paragraph or so, they won’t keep reading.

  • Why is the problem important? If you could solve it, would you answer a scientific or mathematical question about language? Would you be able to build better, faster, or more usable systems than we can build now?

  • Why is the problem hard? How do the obvious solutions fail? For an empirical project, answer this question by implementing and/or running a baseline algorithm and analyzing the failure cases. For a survey paper, you should find evidence in the literature.

  • For an empirical paper: what is your proposed solution? Give a technical description of your planned work, with enough detail that someone could implement it. Your description should include an evaluation plan. For a survey paper: what are solutions that have been tried? Your description should be convincing enough that your reader believes they’ll learn something interesting if they read all the way to the end of your (as yet unwritten!) final report.

The interim report should look much like a conference or workshop paper, with some of the technical details and experimental results missing. To get a feel for this style, read some papers from the syllabus. None of them are perfect, so read actively: what do you find good or bad about the writing in each paper? Emulate the styles that you find effective, and avoid those that are unclear or pedantic. Here is some more advice:

Final Project Report

Your final project report should extend your interim report, filling in missing technical details, experiments, and analysis. We will make our expectations explicit in our comments on your interim report.

Project Ideas

Many topics related to machine translation are welcome. Some suggestions:

  • Extensions to existing neural machine translation systems
  • Analysis and visualization of machine translation systems
  • Advanced decoding algorithms
  • Principled methods for incorporating non-parallel data (e.g., dictionaries, grammars, or thesauri) into translation models.
  • Aspects of corpus crawling
    • Document alignment
    • Sentence alignment
    • Corpus cleaning
  • Interactive machine translation that assists human translators.
  • Experiment with different data conditions (social media text, speech, out-of-domain)
  • Post-processing MT output (e.g., applying correct capitalization, removing unnecessary spaces, combining compounds)
  • Pushing further on one of the homeworks, say by incorporating more advanced techniques and more data.
  • Writing your own MT class assignment

Group Work

Groups of any size are permitted. We will require an amount of work that is linear in group size, so you should take into account the overhead of group coordination when forming groups. Each group should turn in a single proposal identifying all members. All group members will receive the same grade, and you are stuck with your group members once your proposal is finalized: we refuse to adjudicate stories about who did or did not contribute.