The provided data is mainly taken from version 6 of the Europarl corpus, which is freely available. Please click on the links below to download the sentence-aligned data, or go to the Europarl website for the source release.
Additional training data is taken from the new News Commentary corpus. There are about 45 million words of training data per language from the Europarl corpus and 2 million words from the News Commentary corpus.
Europarl: French-English, Spanish-English, German-English, Czech-English, French monolingual, Spanish monolingual, German monolingual, Czech monolingual, English monolingual
News Commentary: French-English, Spanish-English, German-English, Czech-English, French monolingual, Spanish monolingual, German monolingual, Czech monolingual, English monolingual
News: French monolingual, Spanish monolingual, German monolingual, English monolingual, Czech monolingual
United Nations: French-English, Spanish-English
French-English 10^9 corpus: Crawled from Canadian and European Union sources.
CzEng: The current version of the CzEng corpus (v0.9) is available from the CzEng web site (note: same as last year).
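As an illustration of working with the sentence-aligned data, the following is a minimal sketch that reads a language pair and counts words on each side. It assumes each pair is distributed as two plain-text UTF-8 files with one sentence per line, where line N of one file is the translation of line N of the other; check that layout against the files you actually download. The function names `read_aligned` and `count_words` are hypothetical, and so are the file names in the usage note.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def read_aligned(src_path: Path, tgt_path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned files.

    Assumption: one sentence per line, same number of lines in both files.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, so a length
        # mismatch is detected instead of being silently truncated.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield s.rstrip("\n"), t.rstrip("\n")


def count_words(pairs: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Count whitespace-separated tokens on each side of the corpus."""
    src_words = tgt_words = 0
    for s, t in pairs:
        src_words += len(s.split())
        tgt_words += len(t.split())
    return src_words, tgt_words
```

With placeholder file names, a call might look like `count_words(read_aligned(Path("corpus.fr"), Path("corpus.en")))`. Whitespace splitting is only a rough word count; a proper tokenizer will give different figures.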