TED Parallel Corpus

2019-12-26

TED Parallel Corpora is a growing collection of bilingual parallel corpora, multilingual parallel corpora, and monolingual corpora extracted from TED talks (www.ted.com) in 109 world languages. It includes a monolingual corpus, a bilingual parallel corpus covering 12 languages with over 120 million aligned sentences, and a multilingual parallel corpus covering 13 languages with more than 600k sentences. The extraction and processing were intended to produce sentence-aligned text for statistical machine translation systems. All pre-processing was done automatically; no manual corrections have been carried out.
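To illustrate what "sentence-aligned" means in practice, here is a minimal Python sketch. It assumes each side of a language pair is available as a sequence of lines, with line i on one side translating line i on the other. That layout is a common convention for parallel text, not a documented property of this corpus. The function name `read_aligned_pairs` and the sample sentences are hypothetical.

```python
from itertools import zip_longest


def read_aligned_pairs(source_lines, target_lines):
    """Pair line i of the source with line i of the target.

    Assumes a one-sentence-per-line layout (an assumption, not a
    documented format of the corpus). Raises ValueError if the two
    sides differ in length, since that means the alignment is broken.
    """
    pairs = []
    zipped = zip_longest(source_lines, target_lines)
    for line_no, (src, tgt) in enumerate(zipped, start=1):
        if src is None or tgt is None:
            raise ValueError(f"line count mismatch at line {line_no}")
        src, tgt = src.strip(), tgt.strip()
        # The data is processed automatically with no manual correction,
        # so skip pairs where either side is empty.
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


# Hypothetical sample data standing in for two aligned files.
english = ["Thank you so much.\n", "\n", "This is a talk.\n"]
french = ["Merci beaucoup.\n", "\n", "Ceci est une conférence.\n"]
pairs = read_aligned_pairs(english, french)
```

Because no manual corrections were made, a consumer of the data may want further checks (for example, length-ratio filtering) before training a translation system. The sketch only drops empty pairs and rejects mismatched line counts.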
