Abstract
In the last couple of years, weakly labeled learning
has turned out to be an exciting approach for audio
event detection. In this work, we introduce webly
labeled learning for sound events, which aims to
remove human supervision altogether from the
learning process. We first develop a method of
obtaining labeled, albeit noisy, audio data from the
web with no manual labeling involved. We then
describe methods to efficiently learn from these webly
labeled audio recordings. In our proposed system,
WeblyNet, two deep neural networks co-teach each
other to robustly learn from webly labeled data,
leading to a relative improvement of around 17% over
the baseline method. The method also involves
transfer learning to obtain efficient representations