Abstract
This paper demonstrates accurate human pose estimation through walls and occlusions. We leverage the fact that
wireless signals at WiFi frequencies traverse walls and
reflect off the human body. We introduce a deep neural network approach that parses such radio signals to estimate
2D poses. Since humans cannot annotate radio signals, we
use a state-of-the-art vision model to provide cross-modal supervision. Specifically, during training the system uses synchronized wireless and visual inputs, extracts pose information from the visual stream, and uses it to guide the training
process. Once trained, the network uses only the wireless
signal for pose estimation. We show that, when tested on
visible scenes, the radio-based system is almost as accurate as the vision-based system used to train it. Yet, unlike
vision-based pose estimation, the radio-based system can
estimate 2D poses through walls despite never being trained on
such scenarios. Demo videos are available at our website
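
The cross-modal supervision idea described above can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the paper's network: the "teacher" targets are synthetic stand-ins for a vision model's keypoint confidence maps, the "student" is a single linear layer with a sigmoid rather than a deep network, and the per-cell binary cross-entropy loss is an assumed choice. All names (`rf`, `teacher_maps`, `student_predict`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted and target maps."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred)
                          + (1 - target) * np.log(1 - pred)))

# Toy synchronized dataset: 64 time steps, 16-dim radio features,
# 8 confidence-map cells per step (all sizes are arbitrary assumptions).
n, d_rf, d_map = 64, 16, 8
rf = rng.normal(size=(n, d_rf))

# Stand-in for the vision teacher's output on the synchronized frames.
# A real teacher would look at images; here the targets are synthesized
# from the radio features only so that the toy problem is learnable.
teacher_maps = sigmoid(rf @ rng.normal(size=(d_rf, d_map)))

# Student: sees ONLY the radio features, trained to match the teacher.
W = np.zeros((d_rf, d_map))
lr = 0.5
losses = []
for step in range(200):
    pred = sigmoid(rf @ W)
    losses.append(bce(pred, teacher_maps))
    # Gradient of the BCE w.r.t. W (summed over cells, averaged over steps).
    grad = rf.T @ (pred - teacher_maps) / n
    W -= lr * grad

def student_predict(rf_batch):
    """Inference path: radio input only, no visual stream required."""
    return sigmoid(rf_batch @ W)

print(f"loss before: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```

The point of the sketch is the data flow: the visual stream appears only in the construction of the training targets, while `student_predict` takes radio features alone, which mirrors how the trained system runs without any camera input.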