You will design a final research project by yourself or with one or two other students, with guidance from us. Unlike in the Machine Translation course, all projects must be empirical. As such, every project must clearly define a research problem and an experimental procedure with input, output, and evaluation measures. You should make an appointment with the instructors to discuss your project before the interim report. To keep you on track, there are three presentations (proposal/interim/final) and two reports (the interim report being a subset of the final one):
Here are some slides discussing broad themes of the class and potential projects.
Your project proposal should be ready at the start of class on February 7th. Groups will be selected at random to present, so your group may not present until February 8th. Unlike in the Machine Translation course, you are not required to meet with the instructor before the project proposal, and there is no written proposal. However, a much more formal research proposal will be due at the mid-point checkpoint, and your group will need to meet with the instructor at least once before then.
The goal of the project proposal is to foster research presentation skills that are key to future success in the field of Multilingual NLP. Groups are expected to present their ideas for 5-10 minutes. The implicit assumption is that this will be approximately 10 content slides. We understand that this is very early in the semester and that students have not had much exposure to the methods, datasets, and algorithms that are common in Multilingual NLP research. That said, students should have narrowed the scope of the project to a specific sub-field (e.g., Cross-Lingual Information Extraction or Multilingual Spoken Language Understanding), though knowledge of state-of-the-art methods for that task is not required. However, students are expected to have identified at least one dataset that they will use. If desired, this CAN be an English-only dataset, but the final project must be cross-lingual or multilingual, and a discussion of how to synthetically generate resources in other languages using methods learned in the class will be required.
After each group presents, the next 15 minutes will be devoted to class discussion and questions about the presentation. Success as a researcher requires communicating your scientific work in a variety of settings. Much of a researcher's job happens over coffee and meals, in Q&A sessions after talks, in the hallways of conferences, and on visits to other institutions. This 15-minute format is designed to highlight this skill, which is often neglected in formal educational training. Accordingly, the emphasis is on open discussion, brainstorming, and iterating on problem ideas rather than on delivering a formal presentation.
Note that this proposal is only 5% of your final grade and that its format is much more open-ended than most assignments. As such, grading of this portion will be two-fold: coming prepared with 10 minutes of slides, and making a good-faith effort to discuss with all of your classmates. We expect that most groups will earn full marks if they engage in discussion of their own proposed work as well as that of the other groups. The one requirement is that the prepared remarks cover at least one METHOD/SUB-FIELD and at least one DATASET.
More information about further expectations and deliverables will follow, but as a start:
The interim report should look much like a conference or workshop paper, with some of the technical details and experimental results missing. To get a feel for this style, read some papers from the syllabus. None of them are perfect, so read actively: what do you find good or bad about the writing in each paper? Emulate the styles that you find effective, and avoid those that are unclear or pedantic. Here is some more advice:
Your final project report should extend your interim report, filling in missing technical details, experiments, and analysis. We will make our expectations explicit in our comments on your interim report.
Many topics related to [Multilingual/Cross-Lingual/Cross-Language]+ [NLP/Speech/HLT]+ are welcome. Some suggestions:
This is the Multilingual NLP class, NOT the Machine Translation class, so projects should not be centered on MT. Much of the work will rely on MT as part of the pipeline or as a baseline, but overly translation-focused projects will be discouraged. We will flag projects that rely too heavily on MT during the proposal and interim reports so that groups have a chance to correct course.
We assume that most groups will have two members; however, groups of one or three are permitted. The expected scope of a project scales linearly with group size, so take the overhead of group coordination into account when forming groups. Each group should turn in a single proposal identifying all members. All group members will receive the same grade, and you are committed to your group members once your proposal is finalized: we will not adjudicate disputes about who did or did not contribute.