Abstract
In the analysis of machine learning models, it is often convenient to assume that the parameters are independent and identically distributed (IID). This assumption is not satisfied once the parameters are updated by training procedures such as stochastic gradient descent. A relaxation of the IID condition is a probabilistic symmetry known as exchangeability: a sequence of random variables is exchangeable if its joint distribution is invariant under permutations of the sequence. We show the sense in which the weights of multilayer perceptrons (MLPs) are exchangeable. This yields the result that, in certain instances, the layer-wise kernel of fully-connected layers remains approximately constant during training. Our results shed light on the behavior of these kernels throughout training while limiting the reliance on unrealistic assumptions.
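
As a concrete illustration of the symmetry involved (a minimal sketch, not taken from the paper): in a one-hidden-layer MLP, permuting the hidden units, i.e., the rows of the first weight matrix together with the corresponding columns of the second, leaves the network function unchanged. The NumPy snippet below, with all names hypothetical, verifies this invariance; this permutation symmetry of hidden units is one intuition for why exchangeability, rather than IID-ness, is a natural assumption on the weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy one-hidden-layer MLP: f(x) = W2 @ relu(W1 @ x)
d_in, d_hidden, d_out = 3, 5, 2
W1 = rng.normal(size=(d_hidden, d_in))
W2 = rng.normal(size=(d_out, d_hidden))
x = rng.normal(size=d_in)

def mlp(W1, W2, x):
    return W2 @ np.maximum(W1 @ x, 0.0)

# Permute the hidden units: reorder the rows of W1 and,
# consistently, the columns of W2.
perm = rng.permutation(d_hidden)
W1_perm = W1[perm]
W2_perm = W2[:, perm]

# The network function is unchanged under this relabeling of hidden
# units -- the symmetry underlying the exchangeability perspective.
assert np.allclose(mlp(W1, W2, x), mlp(W1_perm, W2_perm, x))
```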