mxnet-audio
Implementation of music genre classification, audio-to-vec, song recommender, and music search in mxnet
The classifier ResNetV2AudioClassifier converts audio into a mel-spectrogram and uses a simplified ResNet DCNN architecture to classify audio clips by their associated labels.
The classifier Cifar10AudioClassifier converts audio into a mel-spectrogram and uses the CIFAR-10 DCNN architecture to classify audio clips by their associated labels.
The classifiers differ from those used in image classification in that:

* they use softrelu instead of relu;
* they use an elongated max-pooling shape (because the mel-spectrogram is an elongated "image");
* dropout is added.
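To make the first two differences concrete, here is a small pure-Python sketch (no MXNet required). `softrelu` is the name MXNet gives to the softplus activation log(1 + e^x), which, unlike relu, is smooth and never exactly zero. The pooling helper shows how a window that is wider than it is tall shrinks the long time axis of a spectrogram faster than the mel axis. The (2, 4) window and the 4 x 8 toy grid are illustrative assumptions, not the pool sizes the library actually uses.

```python
import math


def relu(x):
    """Standard ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def softrelu(x):
    """Softplus, log(1 + exp(x)), written in a numerically stable form.

    Uses log(1 + e^x) = max(x, 0) + log(1 + e^(-|x|)) so large |x| cannot overflow.
    """
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def max_pool_2d(grid, pool_h, pool_w):
    """Non-overlapping max pooling (stride == window) over a list of rows.

    Rows are mel bands, columns are time frames. Trailing rows or columns
    that do not fill a whole window are dropped.
    """
    out = []
    for r in range(0, len(grid) - pool_h + 1, pool_h):
        row = []
        for c in range(0, len(grid[0]) - pool_w + 1, pool_w):
            row.append(max(grid[i][j]
                           for i in range(r, r + pool_h)
                           for j in range(c, c + pool_w)))
        out.append(row)
    return out


if __name__ == '__main__':
    # Toy "spectrogram": 4 mel bands x 8 time frames.
    grid = [[r * 8 + c for c in range(8)] for r in range(4)]
    print(max_pool_2d(grid, 2, 4))  # elongated window -> 2 x 2
    print(max_pool_2d(grid, 2, 2))  # square window    -> 2 x 4
    print(relu(-1.0), softrelu(-1.0))
```

With the elongated window the 4 x 8 grid collapses to a square 2 x 2 map, whereas a square window leaves it elongated at 2 x 4.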
Make sure you have the right dependencies in your python environment by running:
    pip install -r requirements.txt
Training uses the Gtzan data set to teach the music classifier to recognize the genre of a song.
Training works by converting an audio or song file into a mel-spectrogram, which can be thought of as a 3-dimensional tensor, much like an image. With the trained model, it is possible to build other applications such as music recommendation, music search, and audio2vec.
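The sketch below shows where the image-like shape comes from. It assumes librosa-style defaults (22050 Hz sample rate, hop length 512, 128 mel bands, centered frames) and a 30-second clip; these numbers are assumptions, not settings read from mxnet_audio. The mel conversion uses the HTK formula m = 2595 log10(1 + f / 700); librosa's default is the Slaney variant, so treat the formula as illustrative.

```python
import math


def hz_to_mel_htk(f_hz):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz_htk(m):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def num_frames(n_samples, hop_length, center=True, n_fft=2048):
    """Number of STFT frames for a signal of n_samples samples.

    With center=True the signal is padded by n_fft // 2 on both sides,
    which gives 1 + n_samples // hop_length frames.
    """
    if center:
        return 1 + n_samples // hop_length
    return 1 + (n_samples - n_fft) // hop_length


def mel_tensor_shape(duration_s, sample_rate, n_mels, hop_length, channels=1):
    """(channels, mel bands, frames): the image-like layout a CNN consumes."""
    n_samples = int(duration_s * sample_rate)
    return (channels, n_mels, num_frames(n_samples, hop_length))


if __name__ == '__main__':
    # Assumed settings for a 30-second clip.
    print(mel_tensor_shape(30, 22050, n_mels=128, hop_length=512))  # (1, 128, 1292)
    print(round(hz_to_mel_htk(700.0), 2))
```

The result is one channel of 128 mel bands by 1292 frames, roughly ten times longer than it is tall, which is why the pooling windows above are elongated.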
To train on the Gtzan data set, run the following command:
    cd demo
    python cifar10_train.py
The sample code below shows how to train Cifar10AudioClassifier to classify songs by their genre labels:
    from mxnet_audio.library.cifar10 import Cifar10AudioClassifier
    from mxnet_audio.library.utility.gtzan_loader import download_gtzan_genres_if_not_found
    import mxnet


    def load_audio_path_label_pairs(max_allowed_pairs=None):
        download_gtzan_genres_if_not_found('./very_large_data/gtzan')
        # one audio path per line
        audio_paths = []
        with open('./data/lists/test_songs_gtzan_list.txt', 'rt') as file:
            for line in file:
                audio_path = './very_large_data/' + line.strip()
                audio_paths.append(audio_path)
        # one integer genre label per line, aligned with the audio list
        pairs = []
        with open('./data/lists/test_gt_gtzan_list.txt', 'rt') as file:
            for line in file:
                label = int(line)
                if max_allowed_pairs is None or len(pairs) < max_allowed_pairs:
                    pairs.append((audio_paths[len(pairs)], label))
                else:
                    break
        return pairs


    def main():
        audio_path_label_pairs = load_audio_path_label_pairs()
        print('loaded: ', len(audio_path_label_pairs))

        # train on the first GPU for both model parameters and input data
        classifier = Cifar10AudioClassifier(model_ctx=mxnet.gpu(0), data_ctx=mxnet.gpu(0))
        batch_size = 8
        epochs = 100
        history = classifier.fit(audio_path_label_pairs, model_dir_path='./models',
                                 batch_size=batch_size, epochs=epochs,
                                 checkpoint_interval=2)


    if __name__ == '__main__':
        main()
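The two list files are line-aligned: line i of the song list is labeled by line i of the ground-truth list. The hypothetical helper below expresses that pairing with zip and itertools.islice, which also stops cleanly if the label file is longer than the song list (the index-based loop above would raise IndexError in that case). It accepts any iterable of lines, including open file objects.

```python
from itertools import islice


def pair_paths_with_labels(song_lines, label_lines, prefix='./very_large_data/',
                           max_allowed_pairs=None):
    """Pair line i of the song list with line i of the label list.

    Hypothetical helper: same pairing as load_audio_path_label_pairs,
    but zip stops at the shorter of the two inputs.
    """
    pairs = ((prefix + song.strip(), int(label))  # int() ignores the trailing newline
             for song, label in zip(song_lines, label_lines))
    # islice(..., None) keeps everything; otherwise keep the first N pairs.
    return list(islice(pairs, max_allowed_pairs))


if __name__ == '__main__':
    songs = ['a.au\n', 'b.au\n']
    labels = ['0\n', '3\n']
    print(pair_paths_with_labels(songs, labels, max_allowed_pairs=1))
```

With real files, `with open(song_list) as s, open(label_list) as l: pair_paths_with_labels(s, l)` reads both lists lazily.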
After training, the trained models are saved to demo/models.
To test the trained Cifar10AudioClassifier model, run the following command:
    cd demo
    python cifar10_predict.py
Below is a comparison of the training quality of ResNetV2AudioClassifier and Cifar10AudioClassifier:
The sample code below shows how to use the trained Cifar10AudioClassifier model to predict music genres:
    from random import shuffle

    from mxnet_audio.library.cifar10 import Cifar10AudioClassifier
    from mxnet_audio.library.utility.gtzan_loader import download_gtzan_genres_if_not_found, gtzan_labels


    def load_audio_path_label_pairs(max_allowed_pairs=None):
        download_gtzan_genres_if_not_found('./very_large_data/gtzan')
        audio_paths = []
        with open('./data/lists/test_songs_gtzan_list.txt', 'rt') as file:
            for line in file:
                audio_path = './very_large_data/' + line.strip()
                audio_paths.append(audio_path)
        pairs = []
        with open('./data/lists/test_gt_gtzan_list.txt', 'rt') as file:
            for line in file:
                label = int(line)
                if max_allowed_pairs is None or len(pairs) < max_allowed_pairs:
                    pairs.append((audio_paths[len(pairs)], label))
                else:
                    break
        return pairs


    def main():
        audio_path_label_pairs = load_audio_path_label_pairs()
        shuffle(audio_path_label_pairs)
        print('loaded: ', len(audio_path_label_pairs))

        # load the model saved by the training script
        classifier = Cifar10AudioClassifier()
        classifier.load_model(model_dir_path='./models')

        # predict the genre of 20 randomly chosen songs and compare with the ground truth
        for i in range(0, 20):
            audio_path, actual_label_id = audio_path_label_pairs[i]
            predicted_label_id = classifier.predict_class(audio_path)
            print(audio_path)
            predicted_label = gtzan_labels[predicted_label_id]
            actual_label = gtzan_labels[actual_label_id]
            print('predicted: ', predicted_label, 'actual: ', actual_label)


    if __name__ == '__main__':
        main()
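predict_class returns an integer label id that is then looked up in gtzan_labels. The sketch below assumes that id is the argmax of per-genre scores, and adds an accuracy tally you could compute over the 20 sampled songs. GENRES is a hypothetical stand-in (the ten GTZAN genres in alphabetical order) and may not match the order of gtzan_labels.

```python
# Hypothetical label list: the ten GTZAN genres in alphabetical order.
# The order used by mxnet_audio's gtzan_labels may differ.
GENRES = ['blues', 'classical', 'country', 'disco', 'hiphop',
          'jazz', 'metal', 'pop', 'reggae', 'rock']


def argmax(scores):
    """Index of the largest score; the first one wins on ties."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(predicted_ids, actual_ids):
    """Fraction of predictions that match the ground-truth label ids."""
    if not actual_ids:
        return 0.0
    hits = sum(p == a for p, a in zip(predicted_ids, actual_ids))
    return hits / len(actual_ids)


if __name__ == '__main__':
    scores = [0.05, 0.70, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(GENRES[argmax(scores)])           # classical
    print(accuracy([1, 2, 3], [1, 2, 0]))   # 2 of 3 correct
```

In the prediction loop above, collecting `predicted_label_id` and `actual_label_id` into two lists and passing them to `accuracy` gives a single summary number instead of 20 printed lines.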
The sample code below shows how to use the trained Cifar10AudioClassifier model to encode an audio file into a fixed-length numerical vector:
    from random import shuffle

    from mxnet_audio.library.cifar10 import Cifar10AudioClassifier
    from mxnet_audio.library.utility.gtzan_loader import download_gtzan_genres_if_not_found


    def load_audio_path_label_pairs(max_allowed_pairs=None):
        download_gtzan_genres_if_not_found('./very_large_data/gtzan')
        audio_paths = []
        with open('./data/lists/test_songs_gtzan_list.txt', 'rt') as file:
            for line in file:
                audio_path = './very_large_data/' + line.strip()
                audio_paths.append(audio_path)
        pairs = []
        with open('./data/lists/test_gt_gtzan_list.txt', 'rt') as file:
            for line in file:
                label = int(line)
                if max_allowed_pairs is None or len(pairs) < max_allowed_pairs:
                    pairs.append((audio_paths[len(pairs)], label))
                else:
                    break
        return pairs


    def main():
        audio_path_label_pairs = load_audio_path_label_pairs()
        shuffle(audio_path_label_pairs)
        print('loaded: ', len(audio_path_label_pairs))

        classifier = Cifar10AudioClassifier()
        classifier.load_model(model_dir_path='./models')

        # encode 20 randomly chosen songs into fixed-length vectors
        for i in range(0, 20):
            audio_path, actual_label_id = audio_path_label_pairs[i]
            audio2vec = classifier.encode_audio(audio_path)
            print(audio_path)
            print('audio-to-vec: ', audio2vec)


    if __name__ == '__main__':
        main()
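Because every clip maps to a vector of the same length, clips can be compared with a vector similarity measure. Cosine similarity is a common choice; the sketch assumes the encodings can be treated as plain sequences of floats (for example, a NumPy array converted with `.tolist()`) and is not necessarily the measure mxnet_audio itself uses.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors, in [-1, 1].

    Returns 0.0 if either vector is all zeros (the angle is undefined).
    """
    if len(u) != len(v):
        raise ValueError('vectors must have the same length')
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == '__main__':
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # same direction -> about 1.0
```

Cosine similarity ignores vector length, so two songs whose encodings point the same way count as identical even if one has larger activations overall.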
The sample code below shows how to use Cifar10AudioSearch with the trained model to search for similar music given a music file:
    from mxnet_audio.library.cifar10 import Cifar10AudioSearch
    from mxnet_audio.library.utility.gtzan_loader import download_gtzan_genres_if_not_found


    def load_audio_path_label_pairs(max_allowed_pairs=None):
        download_gtzan_genres_if_not_found('./very_large_data/gtzan')
        audio_paths = []
        with open('./data/lists/test_songs_gtzan_list.txt', 'rt') as file:
            for line in file:
                audio_path = './very_large_data/' + line.strip()
                audio_paths.append(audio_path)
        pairs = []
        with open('./data/lists/test_gt_gtzan_list.txt', 'rt') as file:
            for line in file:
                label = int(line)
                if max_allowed_pairs is None or len(pairs) < max_allowed_pairs:
                    pairs.append((audio_paths[len(pairs)], label))
                else:
                    break
        return pairs


    def main():
        search_engine = Cifar10AudioSearch()
        search_engine.load_model(model_dir_path='./models')

        # index every song so that it can be returned as a search result
        for path, _ in load_audio_path_label_pairs():
            search_engine.index_audio(path)

        # find the 10 indexed songs most similar to the query file
        query_audio = './data/audio_samples/example.mp3'
        search_result = search_engine.query(query_audio, top_k=10)

        for idx, similar_audio in enumerate(search_result):
            print('result #%s: %s' % (idx + 1, similar_audio))


    if __name__ == '__main__':
        main()
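A search engine of this kind can be sketched as a brute-force nearest-neighbour index: encode every indexed file once, then rank the stored vectors against the query's vector. BruteForceAudioIndex is hypothetical; it borrows the index_audio/query method names from the example above but is not Cifar10AudioSearch's implementation, and the `encode` callable stands in for something like encode_audio.

```python
import heapq
import math


def _cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


class BruteForceAudioIndex:
    """Hypothetical in-memory index of (path, vector) pairs.

    Queries scan every stored vector, so cost grows linearly with the index size.
    """

    def __init__(self, encode):
        self.encode = encode  # callable: audio path -> fixed-length vector
        self.entries = []

    def index_audio(self, path):
        # Encode once at indexing time so queries only encode the query file.
        self.entries.append((path, self.encode(path)))

    def query(self, query_path, top_k=10):
        q = self.encode(query_path)
        best = heapq.nlargest(top_k, self.entries,
                              key=lambda entry: _cosine(q, entry[1]))
        return [path for path, _ in best]


if __name__ == '__main__':
    # Fake encoder: a lookup table instead of a neural network.
    vectors = {'a': [1.0, 0.0], 'b': [0.7, 0.3], 'c': [0.0, 1.0], 'q': [1.0, 0.05]}
    index = BruteForceAudioIndex(vectors.__getitem__)
    for path in ['a', 'b', 'c']:
        index.index_audio(path)
    print(index.query('q', top_k=2))  # ['a', 'b']
```

An exhaustive scan is simple and exact; for very large collections an approximate nearest-neighbour structure would usually replace it.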
The sample code below shows how to use Cifar10AudioRecommender with the trained model to recommend songs based on a user's listening history:
    from random import shuffle

    from mxnet_audio.library.cifar10 import Cifar10AudioRecommender
    from mxnet_audio.library.utility.gtzan_loader import download_gtzan_genres_if_not_found


    def load_audio_path_label_pairs(max_allowed_pairs=None):
        download_gtzan_genres_if_not_found('./very_large_data/gtzan')
        audio_paths = []
        with open('./data/lists/test_songs_gtzan_list.txt', 'rt') as file:
            for line in file:
                audio_path = './very_large_data/' + line.strip()
                audio_paths.append(audio_path)
        pairs = []
        with open('./data/lists/test_gt_gtzan_list.txt', 'rt') as file:
            for line in file:
                label = int(line)
                if max_allowed_pairs is None or len(pairs) < max_allowed_pairs:
                    pairs.append((audio_paths[len(pairs)], label))
                else:
                    break
        return pairs


    def main():
        music_recommender = Cifar10AudioRecommender()
        music_recommender.load_model(model_dir_path='./models')

        # index every song in the archive
        music_archive = load_audio_path_label_pairs()
        for path, _ in music_archive:
            music_recommender.index_audio(path)

        # simulate a user's listening history with 30 random songs
        shuffle(music_archive)
        for i in range(30):
            song_i_am_listening, _ = music_archive[i]  # track the audio path, not the (path, label) pair
            music_recommender.track(song_i_am_listening)

        for idx, similar_audio in enumerate(music_recommender.recommend(limits=10)):
            print('result #%s: %s' % (idx + 1, similar_audio))


    if __name__ == '__main__':
        main()
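One simple way to turn a listening history into recommendations is to average the vectors of the tracked songs into a profile and rank the remaining songs by similarity to it. The function below is a hypothetical sketch of that idea, with the catalog given as a dict from audio path to vector; Cifar10AudioRecommender may weigh or filter the history differently.

```python
import math


def _cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def recommend(history_paths, catalog, limits=10):
    """Rank catalog songs by similarity to the mean vector of the history.

    history_paths: audio paths the user listened to (hypothetical input).
    catalog: dict mapping audio path -> fixed-length vector.
    Songs already in the history are excluded from the results.
    """
    if not history_paths:
        return []
    dim = len(catalog[history_paths[0]])
    profile = [0.0] * dim
    for path in history_paths:
        for i, x in enumerate(catalog[path]):
            profile[i] += x / len(history_paths)  # running mean of the history
    heard = set(history_paths)
    candidates = [p for p in catalog if p not in heard]
    candidates.sort(key=lambda p: _cosine(profile, catalog[p]), reverse=True)
    return candidates[:limits]


if __name__ == '__main__':
    catalog = {'rock1': [1.0, 0.0], 'rock2': [0.9, 0.1],
               'jazz1': [0.0, 1.0], 'jazz2': [0.1, 0.9]}
    print(recommend(['rock1'], catalog, limits=2))  # ['rock2', 'jazz2']
```

Averaging treats every tracked song equally; weighting recent plays more heavily is a common refinement.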
To speed up training, you can pre-generate the mel-spectrograms from the audio files before training starts by running the following script:
    cd demo/utility
    python gtzan_loader.py
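Pre-generating features pays off because the expensive audio decoding and spectrogram computation then happen once per file instead of once per epoch. The helper below is a generic disk-cache sketch; `cached_feature`, the pickle format, and the SHA-1-of-path file naming are assumptions and not necessarily what gtzan_loader.py writes.

```python
import hashlib
import os
import pickle
import tempfile


def cached_feature(audio_path, compute, cache_dir):
    """Return compute(audio_path), caching the result on disk.

    The cache file name is derived from the audio path, so a second call
    for the same path loads the pickle instead of recomputing.
    """
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.sha1(audio_path.encode('utf-8')).hexdigest()
    cache_path = os.path.join(cache_dir, key + '.pkl')
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    value = compute(audio_path)
    with open(cache_path, 'wb') as f:
        pickle.dump(value, f)
    return value


if __name__ == '__main__':
    calls = []

    def slow_spectrogram(path):
        calls.append(path)  # stands in for decoding + mel-spectrogram
        return [[0.0] * 4]

    with tempfile.TemporaryDirectory() as cache_dir:
        cached_feature('song.au', slow_spectrogram, cache_dir)
        cached_feature('song.au', slow_spectrogram, cache_dir)
    print(len(calls))  # computed only once
```

The cache is keyed by path only, so it will not notice if an audio file is replaced; delete the cache directory after changing the data set.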
The audio processing depends on librosa version 0.6, which in turn depends on audioread.
If you are on Windows and see the error "audioread.NoBackend", download the shared linking build of ffmpeg, unzip it to a local directory, and add its bin folder to the Windows $PATH environment variable. Restart cmd or PowerShell; Python should then be able to locate the audioread backend used by librosa.
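After editing PATH and opening a new shell, you can check from Python whether ffmpeg is visible. `shutil.which` searches PATH (and on Windows also honors PATHEXT, so `ffmpeg.exe` is found). This is a quick diagnostic sketch; it confirms ffmpeg is reachable but does not prove audioread will pick it as its backend.

```python
import shutil
import subprocess


def ffmpeg_available():
    """True if an ffmpeg executable can be found on PATH."""
    return shutil.which('ffmpeg') is not None


def ffmpeg_version_line():
    """First line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    exe = shutil.which('ffmpeg')
    if exe is None:
        return None
    out = subprocess.run([exe, '-version'], capture_output=True, text=True)
    return out.stdout.splitlines()[0] if out.stdout else None


if __name__ == '__main__':
    if ffmpeg_available():
        print('found:', ffmpeg_version_line())
    else:
        print('ffmpeg not on PATH; add its bin folder and open a new shell')
```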
Note that the default training scripts in the demo folder use the GPU for training, so you must configure your graphics card for this (or remove the model_ctx=mxnet.gpu(0) and data_ctx=mxnet.gpu(0) arguments from the training scripts).
Step 1: Download and install CUDA Toolkit 9.0 (make sure it is version 9.0).
Step 2: Download and unzip cuDNN 7.0.4 for CUDA Toolkit 9.0, and add the bin folder of the unzipped directory to the $PATH of your Windows environment.