Abstract
We address zero-shot learning using a new manifold
alignment framework based on a localized multi-scale
transform on graphs. Our inference approach includes a
smoothness criterion for a function mapping nodes on a
graph (visual representation) onto a linear space (semantic
representation), which we optimize using multi-scale graph
wavelets. The robustness of the resulting scheme allows us
to operate with automatically generated semantic
annotations, yielding an algorithm that is entirely free of
manual supervision and yet improves on the state of the art
as measured on benchmark datasets.
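
To make the notion of a smoothness criterion concrete, the sketch below scores a mapping from graph nodes to vectors with the standard graph-Laplacian quadratic form, i.e. the sum of w_ij * ||f_i - f_j||^2 over edges. This is only an illustrative assumption: the abstract does not specify the criterion, and the multi-scale wavelet formulation it refers to is not reproduced here. The function name, the toy graph, and the embeddings are all hypothetical.

```python
def dirichlet_energy(weights, embedding):
    """Sum of w_ij * ||f_i - f_j||^2 over undirected edges (i, j).

    weights:   dict mapping an edge (i, j) to its non-negative weight
    embedding: dict mapping a node id to its vector (list of floats)
    Lower values mean neighbouring nodes map to nearby vectors.
    """
    total = 0.0
    for (i, j), w in weights.items():
        diff = [a - b for a, b in zip(embedding[i], embedding[j])]
        total += w * sum(d * d for d in diff)
    return total


# Hypothetical toy example: a path graph 0 - 1 - 2 with unit weights.
edges = {(0, 1): 1.0, (1, 2): 1.0}

# A mapping that varies slowly along the graph ...
smooth = {0: [0.0], 1: [0.1], 2: [0.2]}
# ... and one that jumps between neighbouring nodes.
rough = {0: [0.0], 1: [1.0], 2: [0.0]}

smooth_score = dirichlet_energy(edges, smooth)
rough_score = dirichlet_energy(edges, rough)
```

Under this assumed criterion, the slowly varying mapping receives a much lower score than the one that jumps between neighbours, which is the behaviour a smoothness term is meant to reward.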