I used the LibriSpeech corpus. I concatenated all audio files in
dev-clean to create train.wav and all audio files in test-clean to
create validate.wav, then resampled both files to 8000 Hz.
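
For readers who prefer a script, the concatenate-and-downsample step
can be sketched in Python with only the standard library. This is an
illustration under several assumptions, not the procedure I actually
ran: the function name concat_and_downsample is hypothetical, the
inputs are assumed to be already decoded to mono 16-bit PCM WAV
(LibriSpeech ships FLAC), all inputs are assumed to share one sample
rate, and the rate is reduced by an integer factor (2 for 16000 Hz to
8000 Hz) by averaging neighbouring samples, which is only a crude
stand-in for a proper anti-aliasing resampler.

```python
import array
import sys
import wave


def concat_and_downsample(in_paths, out_path, factor=2):
    """Concatenate mono 16-bit WAV files and divide the sample rate by `factor`.

    Hypothetical helper: averaging each group of `factor` samples is a
    crude low-pass filter plus decimation, not a high-quality resampler.
    Returns (output_rate, number_of_output_samples).
    """
    samples = array.array("h")  # signed 16-bit integers
    rate = None
    for path in in_paths:
        with wave.open(path, "rb") as w:
            if w.getnchannels() != 1 or w.getsampwidth() != 2:
                raise ValueError(f"{path}: expected mono 16-bit PCM")
            if rate is None:
                rate = w.getframerate()
            elif w.getframerate() != rate:
                raise ValueError(f"{path}: sample rate differs from first file")
            chunk = array.array("h")
            chunk.frombytes(w.readframes(w.getnframes()))
            if sys.byteorder == "big":
                chunk.byteswap()  # WAV data is little-endian
            samples.extend(chunk)

    if rate is None:
        raise ValueError("no input files given")
    if rate % factor != 0:
        raise ValueError("sample rate must be divisible by factor")

    # Drop a trailing partial group, then average each group of samples.
    usable = len(samples) - len(samples) % factor
    out = array.array(
        "h",
        (sum(samples[i:i + factor]) // factor for i in range(0, usable, factor)),
    )
    if sys.byteorder == "big":
        out.byteswap()  # back to little-endian for writing

    with wave.open(out_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate // factor)
        w.writeframes(out.tobytes())
    return rate // factor, len(out)
```

A call such as concat_and_downsample(sorted_wav_paths, "train.wav")
would write the combined file, where sorted_wav_paths is whatever list
of decoded WAV files you have prepared. The sketch holds every sample
in memory and loops in pure Python, so it is slow on a full LibriSpeech
subset; a dedicated audio tool is the more practical choice there.
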
Here is how you can create train.wav and validate.wav using VLC on
Linux: