WMT 2011 News Crawl Machine Translation Data

2019-12-04

The provided data is mainly taken from version 6 of the Europarl corpus, which is freely available. Please click on the links below to download the sentence-aligned data, or go to the Europarl website for the source release.

Additional training data is taken from the new News Commentary corpus. There are about 45 million words of training data per language from the Europarl corpus and 2 million words from the News Commentary corpus.
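Sentence-aligned data of this kind is typically distributed as two plain-text files, one per language, where line N of one file is the translation of line N of the other. The following is a minimal sketch of reading such a pair and counting words per language, assuming UTF-8 text and that layout. The function names and the whitespace-based word count are illustrative assumptions, not part of the official release or its tooling.

```python
from itertools import zip_longest


def read_parallel(src_path, tgt_path, encoding="utf-8"):
    """Yield (source, target) sentence pairs from two line-aligned files.

    Assumes line N of src_path is aligned with line N of tgt_path.
    """
    with open(src_path, encoding=encoding) as src, \
         open(tgt_path, encoding=encoding) as tgt:
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            # zip_longest pads the shorter file with None, which signals
            # that the two files do not have the same number of lines.
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield s.rstrip("\n"), t.rstrip("\n")


def count_words(pairs):
    """Return (source_words, target_words) using whitespace tokenization."""
    src_words = tgt_words = 0
    for s, t in pairs:
        src_words += len(s.split())
        tgt_words += len(t.split())
    return src_words, tgt_words
```

For example, with a hypothetical pair of files named corpus.fr and corpus.en, `count_words(read_parallel("corpus.fr", "corpus.en"))` gives a rough per-language word count. A whitespace split is only an approximation; a proper tokenizer would give different totals.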

Europarl
  • French-English

  • Spanish-English

  • German-English

  • Czech-English

  • French monolingual

  • Spanish monolingual

  • German monolingual

  • Czech monolingual

  • English monolingual

News Commentary
  • French-English

  • Spanish-English

  • German-English

  • Czech-English

  • French monolingual

  • Spanish monolingual

  • German monolingual

  • Czech monolingual

  • English monolingual

News
  • French monolingual

  • Spanish monolingual

  • German monolingual

  • English monolingual

  • Czech monolingual

United Nations
  • French-English

  • Spanish-English

French-English 10^9 corpus
  • French-English

Crawled from Canadian and European Union sources.

CzEng
  • Czech-English

The current version of the CzEng corpus (v0.9) is available from the CzEng web site (note: same as last year).

