Scaling Deep Learning on Hadoop at LinkedIn

Deep learning has become widespread as frameworks such as TensorFlow and PyTorch have made it easy to onboard machine learning applications. However, while it is easy to start developing with these frameworks on a local developer machine, scaling a model up to run on a cluster and train on huge datasets is still challenging. Code and dependencies have to be copied to every machine, and defining the cluster configuration is tedious and error-prone. In addition, troubleshooting errors and aggregating logs