DRACO: Byzantine-resilient Distributed Training via Redundant Gradients

2020-03-11

Abstract

Distributed model training is vulnerable to Byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS). To guarantee some form of robustness, recent work suggests using variants of the geometric median as an aggregation rule, in place of gradient averaging. Unfortunately, median-based rules can incur a prohibitive computational overhead in large-scale settings, and their convergence guarantees often require strong assumptions. In this work, we present DRACO, a scalable framework for robust distributed training that uses ideas from coding theory. In DRACO, each compute node evaluates redundant gradients that are used by the parameter server to eliminate the effects of adversarial updates. DRACO comes with problem-independent robustness guarantees, and the model that it trains is identical to the one trained in the adversary-free setup. We provide extensive experiments on real datasets and distributed setups across a variety of large-scale models, where we show that DRACO is several times to orders of magnitude faster than median-based approaches.
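
The idea of redundant gradients decoded at the parameter server can be illustrated with a small sketch. The abstract does not say which code is used, so the example below assumes the simplest possible scheme: a repetition code in which every node in a group computes the same gradient and the server takes a majority vote within each group before averaging. The function names `majority_vote` and `aggregate` are hypothetical, and the sketch assumes honest replicas return bitwise-identical gradients.

```python
from collections import Counter


def majority_vote(replicas):
    """Return the gradient reported by a strict majority of the replicas.

    Assumption: honest nodes compute bitwise-identical gradients, so
    exact equality is enough to group their reports together.
    """
    counts = Counter(tuple(g) for g in replicas)
    winner, votes = counts.most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no strict majority: too many corrupted replicas")
    return list(winner)


def aggregate(groups):
    """Decode each group by majority vote, then average across groups."""
    decoded = [majority_vote(group) for group in groups]
    dim = len(decoded[0])
    return [sum(g[i] for g in decoded) / len(decoded) for i in range(dim)]


# Two groups of three nodes; one node in the first group is adversarial.
honest_a = [1.0, 2.0]
honest_b = [3.0, 4.0]
groups = [
    [honest_a, [100.0, -100.0], honest_a],  # corrupted report is outvoted
    [honest_b, honest_b, honest_b],
]
result = aggregate(groups)
```

With groups of size 3, this toy scheme tolerates one adversarial node per group, and the aggregated gradient equals the one obtained with no adversary present, which mirrors the "identical to the adversary-free setup" property claimed in the abstract.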


