aidatatang

2019-09-09

Aidatatang_200zh is a free Chinese Mandarin speech corpus provided by Beijing DataTang Technology Co., Ltd under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License.

The contents of the corpus are described below:

  • The corpus contains 200 hours of acoustic data, most of it recorded on mobile devices.

  • 600 speakers from different accent areas in China took part in the recording.

  • The transcription accuracy for each sentence is higher than 98%.

  • Recordings were made in a quiet indoor environment.

  • The database is divided into a training set, a validation set, and a test set in a ratio of 7:1:2.

  • Detailed information such as speech data coding and speaker information is preserved in the metadata file.

  • Segmented transcripts are also provided.
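As an illustration of the 7:1:2 ratio mentioned above, here is a minimal Python sketch that shuffles a list of items and cuts it into training, validation, and test portions. The source does not say whether the corpus was split by speaker or by utterance, or how items were assigned, so the speaker-level split, the `split_7_1_2` helper, the seed, and the speaker IDs are all assumptions made for the example, not the corpus's actual procedure.

```python
import random


def split_7_1_2(items, seed=0):
    """Shuffle items and split them into train/dev/test in a 7:1:2 ratio.

    Hypothetical helper for illustration; not part of the corpus tooling.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    n_train = n * 7 // 10  # 70% for training
    n_dev = n // 10        # 10% for validation
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # remaining ~20% for testing
    return train, dev, test


# Made-up speaker IDs; the real IDs live in the corpus metadata.
speakers = [f"spk{i:04d}" for i in range(600)]
train, dev, test = split_7_1_2(speakers)
print(len(train), len(dev), len(test))  # 420 60 120
```

Applied to 600 speakers, this yields 420, 60, and 120 speakers. When using the released corpus, the split that ships with it should be preferred over any re-split like this one.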

The corpus aims to support researchers in speech recognition, machine translation, voiceprint recognition, and other speech-related fields, and is therefore free for academic use.

Please cite the corpus as “aidatatang_200zh, a free Chinese Mandarin speech corpus by Beijing DataTang Technology Co., Ltd (www.datatang.com)”.

The corpus is a subset of a much larger dataset (a free 1505-hour Chinese Mandarin speech corpus) that was recorded in the same environment as this open-source data. Please visit the DataTang website for more details.

DataTang is a community of creators, of world-changers and future-builders. We're invested in collaborating with a diverse set of voices in the AI world, and are excited about working on large-scale projects. Beyond speech, we also provide resources for image and text data. For more data, please visit [Product].

