DRACO: Byzantine-resilient Distributed Training via Redundant Gradients

2020-03-11

Abstract

Distributed model training is vulnerable to Byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS). To guarantee some form of robustness, recent work suggests using variants of the geometric median as an aggregation rule, in place of gradient averaging. Unfortunately, median-based rules can incur a prohibitive computational overhead in large-scale settings, and their convergence guarantees often require strong assumptions. In this work, we present DRACO, a scalable framework for robust distributed training that uses ideas from coding theory. In DRACO, each compute node evaluates redundant gradients that are used by the parameter server to eliminate the effects of adversarial updates. DRACO comes with problem-independent robustness guarantees, and the model that it trains is identical to the one trained in the adversary-free setup. We provide extensive experiments on real datasets and distributed setups across a variety of large-scale models, where we show that DRACO is several times to orders of magnitude faster than median-based approaches.
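To make the redundant-gradient idea concrete, below is a minimal sketch (not the authors' implementation) of the simplest redundancy scheme the abstract alludes to: each gradient task is replicated on 2s+1 compute nodes, so if at most s of them are adversarial, the honest gradient appears at least s+1 times and the parameter server can recover it by majority vote instead of averaging. The function name, the exact-match voting on raw bytes, and the toy example are illustrative assumptions.

```python
import numpy as np

def repetition_decode(grad_copies, num_adversaries):
    """Majority-vote decoder for one gradient task replicated across 2s+1 nodes.

    grad_copies: list of gradient vectors (np.ndarray), one per replica node.
    num_adversaries: s, the maximum number of adversarial replicas tolerated.
    Assumes len(grad_copies) >= 2*s + 1, so the honest value appears at least
    s+1 times, while any adversarial value can appear at most s times.
    """
    counts = {}
    for g in grad_copies:
        key = g.tobytes()                  # exact-match vote on the raw bytes
        counts[key] = counts.get(key, 0) + 1
        if counts[key] >= num_adversaries + 1:
            return g                        # only an honest value can reach s+1 votes
    raise ValueError("no gradient reached a majority; insufficient redundancy")

# Toy usage: one adversary among three replicas of the same gradient task.
honest = np.ones(4)
copies = [honest, np.full(4, 1e6), honest]  # middle replica sends a malicious update
print(repetition_decode(copies, num_adversaries=1))  # -> [1. 1. 1. 1.]
```

The point of the sketch is the robustness argument, not efficiency: because the correct value is determined exactly rather than approximated, the recovered gradient, and hence the trained model, matches the adversary-free run.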
