WMT 2011 News Crawl Machine Translation Data

2019-12-04

The provided data is mainly taken from version 6 of the Europarl corpus, which is freely available. Please click on the links below to download the sentence-aligned data, or go to the Europarl website for the source release.

Additional training data is taken from the new News Commentary corpus. There are about 45 million words of training data per language from the Europarl corpus and 2 million words from the News Commentary corpus.

  • French-English

  • Spanish-English

  • German-English

  • Czech-English

  • French monolingual

  • Spanish monolingual

  • German monolingual

  • Czech monolingual

  • English monolingual

News Commentary
  • French-English

  • Spanish-English

  • German-English

  • Czech-English

  • French monolingual

  • Spanish monolingual

  • German monolingual

  • Czech monolingual

  • English monolingual

News Crawl
  • French monolingual

  • Spanish monolingual

  • German monolingual

  • English monolingual

  • Czech monolingual

United Nations
  • French-English

  • Spanish-English

French-English 10^9 corpus
  • French-English

Crawled from Canadian and European Union sources.

CzEng
  • Czech-English

The current version of the CzEng corpus (version v0.9) is available from the CzEng web site (note: same as last year).
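
Sentence-aligned corpora like the ones listed above are commonly distributed as a pair of plain-text files with one sentence per line, where line i of the source file corresponds to line i of the target file. The following is a minimal sketch of reading such a pair, assuming that layout and UTF-8 encoding. The function names `read_parallel` and `count_words` are hypothetical helpers introduced for illustration; they are not part of any WMT tooling.

```python
import itertools
from typing import Iterable, Iterator, Tuple


def read_parallel(src_path: str, tgt_path: str,
                  encoding: str = "utf-8") -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned files.

    Assumption: one sentence per line, and line i in both files
    belongs to the same sentence pair.
    """
    with open(src_path, encoding=encoding) as src, \
            open(tgt_path, encoding=encoding) as tgt:
        # zip_longest pads the shorter file with None, so a length
        # mismatch is detected instead of being silently truncated.
        for n, (s, t) in enumerate(itertools.zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {n}")
            yield s.rstrip("\n"), t.rstrip("\n")


def count_words(pairs: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Return whitespace-token counts for the source and target sides."""
    src_words = 0
    tgt_words = 0
    for s, t in pairs:
        src_words += len(s.split())
        tgt_words += len(t.split())
    return src_words, tgt_words
```

Calling `count_words(read_parallel("corpus.fr", "corpus.en"))` with two hypothetical file names would give a rough per-language word count. Whitespace splitting is only an approximation, so the result will not exactly match counts produced with a proper tokenizer.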
