[Uber Seattle] Horovod: Distributed Deep Learning on Spark

During this April 2019 meetup, Uber engineer Travis Addair introduces the concepts that make Horovod work and walks through how to use Horovod on Spark to add distributed training to machine learning pipelines. Horovod is a distributed training framework for TensorFlow, PyTorch, Keras, and MXNet. Because it scales to hundreds of GPUs, Horovod can cut training time from hours to minutes, and adopting it requires adding only a handful of lines to an existing single-GPU training script.