Module Specifications

Current Academic Year 2020 - 2021

Please note that this information is subject to change.

Module Title Statistical Machine Translation
Module Code CA4012
School School of Computing
Online Module Resources

Module Co-ordinator Semester 1: Andrew Way
Semester 2: Andrew Way
Autumn: Andrew Way
Module Teachers Andrew Way
NFQ level 8 Credit Rating 7.5
Pre-requisite None
Co-requisite None
Compatibles None
Incompatibles None
This course introduces the fundamentals of statistical machine translation.

Learning Outcomes
1. Discuss the challenges associated with machine translation
2. Explain the noisy channel model underpinning statistical machine translation
3. Demonstrate how a statistical translation model can be inferred from a parallel corpus of texts using unsupervised machine learning techniques
4. Explain the concept of statistical language modelling and how it fits into the basic SMT architecture
5. Explain the concept of decoding and be in a position to implement a beam decoder
6. Evaluate a statistical machine translation system using at least one automatic metric
7. Demonstrate a knowledge of the state-of-the-art in statistical machine translation
8. Train, test and evaluate MT systems using the open-source Moses toolkit
9. Implement a language modeller (including smoothing) and a basic word aligner

Workload Full-time hours per semester
Type                   Hours  Description
Lecture                24     Two lectures a week
Laboratory             24     One two-hour lab session a week
Group work             40     Group project
Assignment Completion  50     Individual assignment
Independent Study      50     Studying material presented in lectures, reading research papers
Total Workload: 188

All module information is indicative and subject to change. For further information, students are advised to refer to the University's Marks and Standards and Programme Specific Regulations at: http://www.dcu.ie/registry/examinations/index.shtml

Indicative Content and Learning Activities
Noisy Channel Model of Statistical Machine Translation
The noisy channel model and its link to Bayes' theorem
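
As an illustration of that link, the standard textbook formulation can be written as follows. The symbols are assumptions for this sketch and are not taken from the module notes: f is the source (foreign) sentence, e is a candidate translation, P(f | e) is the translation model and P(e) is the language model.

```latex
\[
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        \;=\; \arg\max_{e} P(f \mid e)\, P(e)
\]
```

The denominator P(f) can be dropped in the last step because it does not depend on e and so does not change which candidate maximises the expression.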

Evaluating SMT systems
The relative advantages and disadvantages of human evaluation, automatic evaluation and task-based evaluation. The BLEU evaluation metric

Language Modelling
The role of language modelling in SMT. The importance of smoothing in language modelling
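
A minimal sketch of why smoothing matters, assuming a bigram model with add-one (Laplace) smoothing. This is one simple technique chosen for illustration and is not necessarily the method taught in the module; the function names and the toy corpus are hypothetical.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over tokenised sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Toy corpus (hypothetical data, for illustration only).
corpus = [["the", "house", "is", "small"], ["the", "house", "is", "big"]]
unigrams, bigrams = train_bigram_lm(corpus)

p_seen = bigram_prob("the", "house", unigrams, bigrams)   # bigram observed in training
p_unseen = bigram_prob("the", "big", unigrams, bigrams)   # bigram never observed
```

Without smoothing, the unseen bigram would receive probability zero, and so would any candidate translation containing it. With add-one smoothing it receives a small non-zero probability instead.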

Translation Models
Learning a word-based translation model from a parallel corpus using Expectation Maximization. Deriving a phrase-based model from a word-based model. The relative strengths and weaknesses of various models
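
The Expectation Maximization step can be sketched with a simplified IBM Model 1 style trainer. This is an illustrative assumption rather than the module's reference implementation: it omits the NULL word, and the function name, iteration count and toy sentence pairs are hypothetical.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate t(f | e) from (foreign_tokens, english_tokens) pairs with EM."""
    f_vocab = {f for fs, _ in pairs for f in fs}
    # Uniform initialisation over the foreign vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (f, e) links
        total = defaultdict(float)   # expected count of e being linked
        # E-step: distribute each foreign word over the English words in its sentence.
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: re-estimate t(f | e) from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)

# Toy parallel corpus (hypothetical data, for illustration only).
pairs = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(pairs)
```

No alignments are given in the training data. The model infers them because "das" co-occurs with "the" more consistently than "Haus" or "Buch" do, so t[("das", "the")] grows at the expense of the alternatives with each iteration.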

Decoding
A beam search decoding algorithm for SMT. Techniques for pruning the search space.

Encoding Linguistic Information in an SMT system
Techniques for including morphological, syntactic and semantic knowledge in an SMT system

Assessment Breakdown
Continuous Assessment 30%  Examination Weight 70%
Course Work Breakdown
Type  Description  % of total  Assessment Date
Assignment  Students undertake a group project of their choosing which involves training a machine translation system using the open-source Moses toolkit  25%  Once per semester
Assignment  Students take on a significant individual project which involves implementing one of the following SMT components in a programming language of their choosing: 1) decoder, 2) language modeller, 3) word aligner.  25%  Once per semester
Reassessment Requirement Type
Resit arrangements are explained by the following categories:
1 = A resit is available for all components of the module
2 = No resit is available for 100% continuous assessment module
3 = No resit is available for the continuous assessment component
This module is category 1
Indicative Reading List
  • Philipp Koehn, Statistical Machine Translation, ISBN 0521874157
Other Resources
Programme or List of Programmes
CASE  BSc in Computer Applications (Sft.Eng.)
CPSSD  BSc in Computational Problem Solv & SW Dev.
ECSA  Study Abroad (Engineering & Computing)
ECSAO  Study Abroad (Engineering & Computing)
