This is a project-based course focusing on the design and implementation of systems that scale Natural Language Processing methods beyond English. The course will cover both multilingual and cross-lingual methods, with an emphasis on zero-shot and few-shot approaches as well as ‘silver’ dataset creation. Modules will include Cross-Lingual Information Extraction & Semantics, Cross-Language Information Retrieval, Multilingual Question Answering, Multilingual Structured Prediction, Multilingual Automatic Speech Recognition, and other non-English-centric NLP methods. Students will be expected to work in small groups and pick one of the modules to create a model based on state-of-the-art methods covered in the class. The course will be roughly two-thirds lectures and one-third student presentations of project updates, held periodically throughout the semester.
All assignments are due at the start of class on their due date. Requests for exceptions regarding tardiness will be considered on a case-by-case basis and must be submitted via e-mail at least 24 hours before the original due date. Late submissions will be accepted, with credit reduced according to an exponential decay formula with a half-life of 1 week (604,800 seconds).
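As a rough illustration of the late policy, the sketch below shows one way such a half-life formula could be computed. It assumes that the decay is applied as a multiplier on the earned score and that lateness is measured continuously in seconds; neither detail is stated in the policy. The function names `late_multiplier` and `adjusted_score` are hypothetical.

```python
# Half-life of the late penalty: one week, as stated in the policy.
HALF_LIFE_SECONDS = 604_800


def late_multiplier(seconds_late: float) -> float:
    """Fraction of credit retained after `seconds_late` seconds past the deadline.

    Assumption: credit halves every HALF_LIFE_SECONDS, decaying continuously.
    On-time or early submissions keep full credit.
    """
    if seconds_late <= 0:
        return 1.0
    return 0.5 ** (seconds_late / HALF_LIFE_SECONDS)


def adjusted_score(raw_score: float, seconds_late: float) -> float:
    """Apply the late multiplier to an earned score (assumed multiplicative)."""
    return raw_score * late_multiplier(seconds_late)


if __name__ == "__main__":
    one_day = 24 * 60 * 60
    for days in (0, 1, 7, 14):
        print(days, "day(s) late:", round(late_multiplier(days * one_day), 4))
```

Under these assumptions, a submission exactly one week late keeps half of its earned credit, and one that is two weeks late keeps a quarter.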