Abstract
Many machine learning tasks require sampling a subset of items from a collection based on a parameterized distribution. The Gumbel-softmax trick can be used to sample a single item and allows for low-variance reparameterized gradients with respect to the parameters of the underlying distribution. However, stochastic optimization involving subset sampling is typically not reparameterizable. To overcome this limitation, we define a continuous relaxation of subset sampling that provides reparameterization gradients by generalizing the Gumbel-max trick. We use this approach to sample subsets of features in an instance-wise feature selection task for model interpretability, subsets of neighbors to implement a deep stochastic k-nearest neighbors model, and sub-sequences of neighbors to implement parametric t-SNE by directly comparing the identities of local neighbors. We improve performance in all these tasks by incorporating subset sampling in end-to-end training.
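For concreteness, the sketch below shows one way such a relaxation can be built in PyTorch: perturb the log-weights with Gumbel(0, 1) noise (the Gumbel-max construction) and replace the hard top-k selection with k successive tempered softmaxes, each suppressing mass already assigned. The names gumbel_keys and relaxed_subset and the temperature tau are illustrative choices, not notation from the paper, and this is a minimal sketch rather than necessarily the paper's exact procedure.

import torch

def gumbel_keys(log_w):
    # Gumbel-max construction: perturb log-weights with i.i.d. Gumbel(0, 1) noise.
    u = torch.rand_like(log_w).clamp(1e-9, 1.0 - 1e-9)
    return log_w - torch.log(-torch.log(u))

def relaxed_subset(log_w, k, tau=0.5, eps=1e-9):
    # Differentiable stand-in for hard top-k selection over the perturbed keys:
    # k successive tempered softmaxes, each down-weighting coordinates that
    # already received mass. Returns a relaxed "k-hot" vector that sums to k;
    # as tau -> 0 it approaches a hard indicator of the top-k perturbed keys.
    keys = gumbel_keys(log_w)
    khot = torch.zeros_like(keys)
    onehot_approx = torch.zeros_like(keys)
    for _ in range(k):
        keys = keys + torch.log(torch.clamp(1.0 - onehot_approx, min=eps))
        onehot_approx = torch.softmax(keys / tau, dim=-1)
        khot = khot + onehot_approx
    return khot

# Usage: gradients flow from a loss on the relaxed subset back to log_w,
# because the only randomness (the Gumbel noise) is independent of the parameters.
log_w = torch.randn(10, requires_grad=True)
subset = relaxed_subset(log_w, k=3)
values = torch.arange(10.0)
loss = (subset * values).sum()
loss.backward()  # d loss / d log_w is well defined: sampling was reparameterized

Because the noise enters additively and all downstream operations are differentiable, this construction yields the low-variance reparameterization gradients the abstract refers to, in contrast to score-function estimators.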