nmt-anuvada
We intend to collect a parallel dataset for the Hindi-English language
pair. Its primary use is to investigate translation accuracy on this
corpus.
# Corpus Details
The IITB Hindi-English parallel corpus (approximately 1.5M sentence pairs) contains data from the following domains, each listed with the line number at which that domain begins in the corpus:
| Domain | Start line |
|---|---|
| GNOME | 1 |
| KDE | 145706 |
| Quran | 242933 |
| Chats | 430013 |
| Movie Dialogs | 434711 |
| General | 438933 |
| Hi-Eng Word-Linkage | 712818 |
| Admin Dictionary | 887993 |
| Admin Examples | 954457 |
| Admin Definitions | 1001292 |
| TED Talks | 1047815 |
| Indic Multi-Parallel | 1090398 |
| Judicial I | 1100747 |
| Judicial II | 1105754 |
| Govt Websites | 1109481 |
| Wikipedia | 1232841 |
| Book Translations | 1265704 |
| Govt Website II | 1492827 |
| (end of corpus) | 1561840 |
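Because each domain occupies a contiguous block of lines, a domain can be recovered from a line number with a binary search over the start lines. The sketch below assumes the numbers listed above are 1-based start lines, that the final value (1561840) is the last line of the corpus (inclusive), and that the English and Hindi sides are stored as two line-aligned text files. The names `DOMAIN_STARTS`, `domain_of`, `domain_sizes` and `split_by_domain` are hypothetical helpers for illustration, not part of the IITB release or of this project.

```python
from bisect import bisect_right
from typing import Iterator

# (domain, 1-based start line), copied from the corpus listing above.
DOMAIN_STARTS = [
    ("GNOME", 1), ("KDE", 145706), ("Quran", 242933), ("Chats", 430013),
    ("Movie Dialogs", 434711), ("General", 438933),
    ("Hi-Eng Word-Linkage", 712818), ("Admin Dictionary", 887993),
    ("Admin Examples", 954457), ("Admin Definitions", 1001292),
    ("TED Talks", 1047815), ("Indic Multi-Parallel", 1090398),
    ("Judicial I", 1100747), ("Judicial II", 1105754),
    ("Govt Websites", 1109481), ("Wikipedia", 1232841),
    ("Book Translations", 1265704), ("Govt Website II", 1492827),
]
# Assumption: the final, unlabeled number is the last line (inclusive).
CORPUS_END = 1561840
_STARTS = [start for _, start in DOMAIN_STARTS]


def domain_of(line_no: int) -> str:
    """Return the domain for a 1-based line number via binary search."""
    if not 1 <= line_no <= CORPUS_END:
        raise ValueError(f"line {line_no} is outside the corpus")
    # bisect_right finds the first start strictly greater than line_no;
    # the domain is the one just before it.
    return DOMAIN_STARTS[bisect_right(_STARTS, line_no) - 1][0]


def domain_sizes() -> dict[str, int]:
    """Number of lines per domain, derived from consecutive start lines."""
    sizes = {}
    for i, (name, start) in enumerate(DOMAIN_STARTS):
        if i + 1 < len(DOMAIN_STARTS):
            end = DOMAIN_STARTS[i + 1][1] - 1
        else:
            end = CORPUS_END
        sizes[name] = end - start + 1
    return sizes


def split_by_domain(en_path: str, hi_path: str) -> Iterator[tuple[str, str, str]]:
    """Yield (domain, english, hindi) for each aligned line pair."""
    with open(en_path, encoding="utf-8") as en, open(hi_path, encoding="utf-8") as hi:
        for line_no, (en_line, hi_line) in enumerate(zip(en, hi), start=1):
            yield domain_of(line_no), en_line.rstrip("\n"), hi_line.rstrip("\n")
```

For example, `domain_of(145706)` returns `"KDE"`, and iterating over `split_by_domain(...)` with the two (hypothetically named) side files lets one hold out or evaluate a single domain, such as TED Talks, when investigating translation accuracy per domain. Binary search keeps each lookup logarithmic in the number of domains; a linear scan would also work for 18 entries.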