Abstract
In this paper, we investigate the structural similarities within a finite Markov decision process (MDP).
We view a finite MDP as a heterogeneous directed bipartite graph and propose novel measures of state similarity and action similarity, defined in a mutually reinforcing manner. We prove that the state similarity
is a metric and the action similarity is a pseudometric. We also establish the connection between the
proposed similarity measures and the optimal values of the MDP. Extensive experiments show that
the proposed measures are effective.