shampoo.pytorch
An implementation of shampoo, proposed in Shampoo : Preconditioned Stochastic Tensor Optimization by Vineet Gupta, Tomer Koren and Yoram Singer.
# Suppose the size of the tensor grad (i, j, k),# dim_id = 1 and dim = jgrad = grad.transpose_(0, dim_id).contiguous() # (j, i, k)transposed_size = grad.size()grad = grad.view(dim, -1) # (j, i x k)grad_t = grad.t() # (i x k, j)precond.add_(grad @ grad_t) # (j, j)inv_precond.copy_(_matrix_power(state[precond, -1 / order)) # (j, j)grad = grad = inv_precond @ grad # (j, i x k)grad = grad.view(transposed_size) # (j, i, k)