Abstract There has been increasing interest in 3D indoor navigation, where a robot in an environment moves to a target according to an instruction. Deploying a robot for navigation in the physical world requires substantial training data to learn an effective policy. Obtaining sufficient real-environment data for robot training is quite labour intensive, whereas synthetic data is much easier to construct by rendering. Although it is promising to utilize synthetic environments to facilitate navigation training in the real world, real environments are heterogeneous from synthetic ones in two aspects. First, the visual representations of the two environments differ significantly. Second, the house plans of the two environments are rather different. Therefore, two types of information, i.e., visual representation and policy behavior, need to be adapted in the reinforcement learning model. The learning procedure of visual representation and that of policy behavior are presumably reciprocal. We propose to jointly adapt visual representation and policy behavior to leverage the mutual impacts of environment and policy. Specifically, our method employs an adversarial feature adaptation model for visual representation transfer and a policy mimic strategy for policy behavior imitation. Experimental results show that our method outperforms the baseline by 21.73% without any additional human annotations.
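To make the two adaptation components concrete, the following is a minimal NumPy sketch, not the paper's implementation: the adversarial term is a standard domain-discriminator cross-entropy over real versus synthetic features, and the mimic term is a KL divergence pulling the real-environment (student) policy toward the synthetic-environment (teacher) policy. All function names, the weighting factor `lam`, and the specific loss forms are illustrative assumptions.

```python
import numpy as np

def domain_adversarial_loss(real_scores, synth_scores):
    # Binary cross-entropy a domain discriminator would minimise:
    # real-environment features labelled 1, synthetic features labelled 0.
    # (Illustrative form; the paper's adversarial model may differ.)
    eps = 1e-8
    real_term = -np.mean(np.log(real_scores + eps))
    synth_term = -np.mean(np.log(1.0 - synth_scores + eps))
    return real_term + synth_term

def policy_mimic_loss(teacher_probs, student_probs):
    # KL(teacher || student) over action distributions: the policy trained
    # in real environments imitates the policy trained in synthetic ones.
    eps = 1e-8
    teacher = np.asarray(teacher_probs, dtype=float)
    student = np.asarray(student_probs, dtype=float)
    return float(np.sum(teacher * (np.log(teacher + eps) - np.log(student + eps))))

def joint_loss(real_scores, synth_scores, teacher_probs, student_probs, lam=1.0):
    # Joint objective: adapt visual representation (adversarial term) and
    # policy behavior (mimic term) together; lam is a hypothetical weight.
    return domain_adversarial_loss(real_scores, synth_scores) + \
        lam * policy_mimic_loss(teacher_probs, student_probs)
```

The mimic term vanishes when the student's action distribution matches the teacher's, so during joint training the gradient pressure shifts toward aligning the visual features across domains.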