
Module Specifications

Current Academic Year 2021 - 2022

Please note that this information is subject to change.

Module Title Machine Translation
Module Code CA4012
School School of Computing
Module Co-ordinator Semester 1: Maja Popovic
Semester 2: Maja Popovic
Autumn: Maja Popovic
Module Teachers Andrew Way, Maja Popovic
NFQ level 8
Credit Rating 7.5
Pre-requisite None
Co-requisite None
Compatibles None
Incompatibles None
Description
This course introduces the fundamentals of machine translation, including the currently widely used neural approach.

Learning Outcomes
1. Discuss the challenges associated with machine translation including its evaluation.
2. Explain the concept of machine translation including approaches and the importance of language data.
3. Demonstrate how a statistical translation model can be inferred from a parallel corpus of texts using unsupervised machine learning techniques.
4. Explain how neural networks work in general and how they can be used for language-related tasks.
5. Explain the concepts of statistical language modelling and neural language modelling and their differences.
6. Explain the decoding process in NMT and understand the differences between decoding in SMT and decoding in NMT.
7. Demonstrate knowledge of state-of-the-art transformer-based neural machine translation.
8. Explain the differences between recurrent machine translation and transformer machine translation.
9. Train, test and evaluate an MT system using the open-source Joey NMT toolkit.



Workload Full-time hours per semester
Type                   Hours  Description
Lecture                24     Two lectures a week
Laboratory             24     One two-hour lab session a week
Group work             40     Group project
Assignment Completion  50     Individual assignment
Independent Study      50     Studying material presented in lectures, reading research papers
Total Workload: 188

All module information is indicative and subject to change. For further information, students are advised to refer to the University's Marks and Standards and Programme Specific Regulations at: http://www.dcu.ie/registry/examinations/index.shtml

Indicative Content and Learning Activities
Introduction to Machine Translation
What is machine translation? Overview of the three approaches: rule-based, statistical, neural. Importance of data for statistical and neural MT. Sentence alignment and preprocessing.

Evaluating MT systems
The relative advantages and disadvantages of human evaluation and automatic evaluation. Two main concepts used for automatic evaluation metrics: n-gram matching and edit distance.
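As an illustration of the two concepts named above, the following minimal sketch computes a clipped n-gram precision and a word-level edit distance for a single hypothesis and reference. The function names and the example sentences are assumptions made for illustration; this is not the implementation of any particular metric covered in the module.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_ngram_precision(hypothesis, reference, n):
    """Fraction of hypothesis n-grams that also occur in the reference.

    Each n-gram is credited at most as often as it appears in the reference.
    """
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


def word_edit_distance(hypothesis, reference):
    """Minimum number of word insertions, deletions and substitutions."""
    previous = list(range(len(reference) + 1))
    for i, hyp_word in enumerate(hypothesis, start=1):
        current = [i]
        for j, ref_word in enumerate(reference, start=1):
            substitution = previous[j - 1] + (hyp_word != ref_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


# Hypothetical example sentences.
hyp = "the cat sat on mat".split()
ref = "the cat sat on the mat".split()

print(clipped_ngram_precision(hyp, ref, 1))  # 1.0  (every hypothesis word is in the reference)
print(clipped_ngram_precision(hyp, ref, 2))  # 0.75 (3 of 4 hypothesis bigrams match)
print(word_edit_distance(hyp, ref))          # 1    (one missing word)
```

The example shows why the two concepts complement each other: the unigram precision is perfect even though a word is missing, while the edit distance picks up the omission.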

Statistical Machine Translation
Probability model for translation. Translation model and language model. Word alignments and IBM models. Phrase-based SMT. Decoding.
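To make the idea of learning word translation probabilities from a parallel corpus concrete, here is a minimal sketch of expectation-maximisation in the style of IBM Model 1. It is simplified under stated assumptions: the three-sentence toy corpus is invented, the NULL word is omitted, and the function name `train_ibm_model1` is hypothetical.

```python
from collections import defaultdict


def train_ibm_model1(sentence_pairs, iterations=10):
    """Estimate t(f | e), the probability that word e translates as word f.

    sentence_pairs is a list of (f_tokens, e_tokens) pairs.
    Simplification: no NULL word is added to the e side.
    """
    f_vocab = {f for f_tokens, _ in sentence_pairs for f in f_tokens}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform distribution

    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (f, e) links
        total = defaultdict(float)  # expected number of links from e
        for f_tokens, e_tokens in sentence_pairs:
            for f in f_tokens:
                # E-step: share one unit of count for f among the e words
                # in proportion to the current t(f | e).
                norm = sum(t[(f, e)] for e in e_tokens)
                for e in e_tokens:
                    fraction = t[(f, e)] / norm
                    count[(f, e)] += fraction
                    total[e] += fraction
        # M-step: re-normalise the expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Invented toy corpus: (German tokens, English tokens).
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)
for pair in [("das", "the"), ("Haus", "the"), ("Buch", "book"), ("Haus", "house")]:
    print(pair, round(t[pair], 3))
```

No alignments are given in the input; the co-occurrence pattern alone lets the procedure favour "das" over "Haus" as the translation of "the", which is why the technique counts as unsupervised.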

Introduction to Neural Networks
What are neural networks? Architectures: feed-forward and recurrent networks. Training neural networks: back-propagation and gradient descent.
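The training loop can be sketched for the smallest possible case: a single sigmoid unit fitted by batch gradient descent. This is only an illustration under assumptions (the logical AND task, the learning rate and the step count are chosen arbitrarily); a full network repeats the same chain-rule step layer by layer, which is what back-propagation does.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Forward pass of one sigmoid unit."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_neuron(inputs, targets, learning_rate=1.0, steps=2000):
    """Fit one sigmoid unit with batch gradient descent on cross-entropy loss.

    Returns the weights, the bias and the mean loss recorded at every step.
    """
    weights = [0.0] * len(inputs[0])
    bias = 0.0
    losses = []
    n = len(inputs)
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(inputs, targets):
            p = predict(weights, bias, x)
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
            # Chain rule: for sigmoid plus cross-entropy, dLoss/dz = p - y.
            error = p - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        losses.append(loss / n)
        # Gradient descent update: step against the mean gradient.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias, losses


# Toy task (an assumption for illustration): logical AND.
inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [0, 0, 0, 1]
weights, bias, losses = train_neuron(inputs, targets)
print(losses[0], losses[-1])
print([round(predict(weights, bias, x)) for x in inputs])
```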

Neural Language Models
Word representations: why are they needed? Different types: one-hot, static, contextual, external vs internal representations. Feed-forward neural language models. Recurrent neural language models.
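The relationship between one-hot vectors and dense word vectors can be shown in a few lines. In this sketch the vocabulary and the embedding values are made up; in a neural language model the embedding matrix would be learned during training.

```python
# Hypothetical vocabulary; index 0 is reserved for unknown words.
vocab = ["<unk>", "the", "cat", "sat"]
word_to_id = {word: index for index, word in enumerate(vocab)}

# Made-up 2-dimensional embeddings, one row per vocabulary entry.
embedding_matrix = [
    [0.0, 0.0],
    [0.1, 0.3],
    [0.7, -0.2],
    [0.5, 0.9],
]


def one_hot(word):
    """Vector of vocabulary size with a single 1 at the word's index."""
    index = word_to_id.get(word, word_to_id["<unk>"])
    return [1 if i == index else 0 for i in range(len(vocab))]


def embed(word):
    """Multiply the one-hot row vector by the embedding matrix."""
    vector = one_hot(word)
    dims = len(embedding_matrix[0])
    return [
        sum(vector[i] * embedding_matrix[i][d] for i in range(len(vocab)))
        for d in range(dims)
    ]


print(one_hot("cat"))  # [0, 0, 1, 0]
print(embed("cat"))    # [0.7, -0.2], i.e. row 2 of the matrix
```

Multiplying by a one-hot vector simply selects one row, so implementations normally use a direct table lookup instead of the matrix product.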

Neural Machine Translation
Encoder-decoder architecture and sequence-to-sequence modelling. Decoding for NMT. Recurrent neural networks for MT. Recurrent neural MT with attention. Transformer neural networks for MT.
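The attention step can be illustrated with scaled dot-product attention, the form used in transformer models, for a single query. The vectors below are invented, and the sketch leaves out the learned projections, multiple heads and masking that a real system uses.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    normaliser = sum(exps)
    return [e / normaliser for e in exps]


def dot_product_attention(query, keys, values):
    """Return (weights, context) for one query over a list of keys and values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dims = len(values[0])
    # The context vector is the attention-weighted average of the values.
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dims)]
    return weights, context


# Invented example: one decoder query attending over three encoder states.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

weights, context = dot_product_attention(query, keys, values)
print([round(w, 3) for w in weights])
print([round(c, 3) for c in context])
```

The key most similar to the query receives the largest weight, so the context vector is dominated by the corresponding value.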

Assessment Breakdown
Continuous Assessment 30%
Examination Weight 70%
Course Work Breakdown
Type        % of total  Assessment Date    Description
Assignment  10%         Once per semester  Students undertake a group project of their choosing, which involves training a machine translation system using the open-source toolkit Joey NMT, which was developed for educational purposes.
Assignment  20%         Once per semester  Students take on a significant individual project which involves calculations related to 1) automatic evaluation methods, 2) language model probabilities, 3) translation model probabilities, and 4) neural networks.
Reassessment Requirement Type
Resit arrangements are explained by the following categories:
1 = A resit is available for all components of the module
2 = No resit is available for a 100% continuous assessment module
3 = No resit is available for the continuous assessment component
This module is category 1
Indicative Reading List
  • Philipp Koehn, Statistical Machine Translation, ISBN 0521874157
  • Philipp Koehn, Neural Machine Translation, ISBN 9781108608480
Other Resources
None
Programme or List of Programmes
CASE   BSc in Computer Applications (Sft.Eng.)
ECSA   Study Abroad (Engineering & Computing)
ECSAO  Study Abroad (Engineering & Computing)
