资源算法edl

edl

2019-11-23 | |  107 |   0 |   0

PaddlePaddle EDL: Elastic Deep Learning

edl.png

While many hardware and software manufacturers are working on improving the running time of deep learning jobs, EDL optimizes

  1. the global utilization of the cluster, and

  2. the waiting time of job submitters.

For more about the project EDL, please refer to this invited blog post on the Kubernetes official blog.

EDL includes two parts:

  1. a Kubernetes controller for the elastic scheduling of distributed deep learning jobs, and

  2. making PaddlePaddle a fault-tolerable deep learning framework. This directory contains the Kubernetes controller. For more information about fault-tolerance, please refer to the design.

We deployed EDL on a real Kubernetes cluster, dlnel.com, opened for graduate students of Tsinghua University. The performance test report of EDL on this cluster is here.

Tutorials

Design Docs

Future

  • Resource Adjustments by EDL

  • Support Full-Tolerant Distributed Training in PadldePaddle Fluid.

FAQ

TBD

License

PaddlePaddle EDL is provided under the Apache-2.0 license.


上一篇:CoreNLP

下一篇:FASPell

用户评价
全部评价

热门资源

  • Keras-ResNeXt

    Keras ResNeXt Implementation of ResNeXt models...

  • seetafaceJNI

    项目介绍 基于中科院seetaface2进行封装的JAVA...

  • spark-corenlp

    This package wraps Stanford CoreNLP annotators ...

  • capsnet-with-caps...

    CapsNet with capsule-wise convolution Project ...

  • inferno-boilerplate

    This is a very basic boilerplate example for pe...