Abstract
Humans are remarkably proficient at controlling their limbs and tools from a wide range of viewpoints. In robotics, this ability is referred to as visual servoing: moving a tool or end-point to a desired location using primarily visual feedback. In this paper, we propose learning viewpoint-invariant visual servoing skills in a robot manipulation task. We train a deep recurrent controller that can automatically determine which actions move the end-effector of a robotic arm to a desired object. This problem is fundamentally ambiguous: under severe variation in viewpoint, it may be impossible to determine the appropriate actions in a single feedforward operation. Instead, our visual servoing approach uses its memory of past movements to infer how its actions affect robot motion under the current viewpoint, correcting mistakes and gradually moving closer to the target. This ability stands in stark contrast to previous visual servoing methods, which assume known dynamics or require a calibration phase. We train our recurrent controller on simulated data, using synthetic demonstrations and reinforcement learning. We then describe how the resulting model can be transferred to a real-world robot by disentangling perception from control and adapting only the visual layers. The adapted model can servo to previously unseen objects from novel viewpoints on a real-world Kuka IIWA robotic arm. For supplementary videos, see:
https://www.youtube.com/watch?v=oLgM2Bnb7fo
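To make the controller structure described above concrete, the following is a minimal PyTorch sketch of a recurrent servoing policy: convolutional visual layers feed a recurrent core that also consumes the previous action, so the controller can accumulate evidence about how its actions move the arm under the current, unknown viewpoint. All names, layer sizes, and architectural details here (RecurrentServoController, feat_dim, hidden_dim, and so on) are illustrative assumptions based only on the abstract, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class RecurrentServoController(nn.Module):
    """Sketch of a recurrent visual servoing controller.

    The vision stack maps the current camera image to a feature vector;
    the LSTM integrates that feature with the previous action, giving the
    policy a memory of past movements and their visual consequences.
    """

    def __init__(self, feat_dim=256, act_dim=3, hidden_dim=128):
        super().__init__()
        # Visual layers: under the transfer recipe in the abstract, only
        # these would be adapted on real images; control layers stay fixed.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 16, feat_dim), nn.ReLU(),
        )
        # Recurrent control layers: memory over observation/action history
        # lets the controller infer the action-to-motion mapping for the
        # current viewpoint and correct its mistakes over time.
        self.rnn = nn.LSTMCell(feat_dim + act_dim, hidden_dim)
        self.policy = nn.Linear(hidden_dim, act_dim)

    def forward(self, image, prev_action, state=None):
        feat = self.vision(image)
        h, c = self.rnn(torch.cat([feat, prev_action], dim=-1), state)
        return self.policy(h), (h, c)


# Usage: roll the controller forward, threading the recurrent state.
ctrl = RecurrentServoController()
img = torch.zeros(1, 3, 64, 64)      # current camera frame (dummy)
action = torch.zeros(1, 3)           # previous end-effector displacement
state = None
for _ in range(5):
    action, state = ctrl(img, action, state)
```

For the sim-to-real transfer step the abstract describes, one would freeze the control layers (here `self.rnn` and `self.policy`) and fine-tune only `self.vision` on real images; that split is an interpretation of "disentangling perception from control", not the paper's stated procedure.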