NVIDIA Developer How To Series: Introduction to Recurrent Neural Networks in TensorRT

NVIDIA TensorRT is a high-performance deep learning inference optimizer and runtime that delivers low latency and high throughput. TensorRT can import trained models from all major deep learning frameworks and turn them into highly efficient inference engines that can be incorporated into larger applications and services. This video demonstrates how to configure a simple Recurrent Neural Network (RNN), based on a character-level language model, using NVIDIA TensorRT.

Five key things from this video:

1. TensorRT supports the RNNv2, MatrixMultiply, ElementWise, and TopK layers.
2. For the RNNv2 layer, the weights for each gate and each layer must be set separately, and the input format for RNNv2 is BSE (Batch, Sequence, Embedding).
3. A fully connected layer can also be implemented with a MatrixMultiply layer followed by an ElementWise layer (to add the bias). Alternatively, you can use TensorRT's Fully Connected layer directly, but it requires a reshape of the weights before they are fed to
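To make points 2 and 3 concrete, here is a plain-Python sketch (no TensorRT calls) of the computation a character-level RNN engine performs: token embeddings are packed into a BSE-shaped input, an LSTM cell is driven by weights stored separately for each gate, the output projection is done as a matrix multiply plus an elementwise bias add, and a top-1 selection picks the next character. All sizes, gate names, and helper functions here are hypothetical. For brevity it uses a single bias per gate, and the (in, out) versus (out, in) weight layouts are an assumption chosen to illustrate why a Fully Connected layer can need its weights rearranged before use.

```python
import math
import random

def matmul(a, b):
    """Product of a (m x k) and b (k x n), both nested lists."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def add_bias(rows, bias):
    """Elementwise sum with the bias vector broadcast over every row."""
    return [[v + bv for v, bv in zip(row, bias)] for row in rows]

def fc_via_matmul(x, w_in_out, bias):
    """Fully connected layer as a matrix multiply (x @ W) followed by an elementwise add."""
    return add_bias(matmul(x, w_in_out), bias)

def fc_out_in(x, w_out_in, bias):
    """Dense layer that (by assumption) stores weights as (out, in): y_j = x . W[j] + b_j."""
    return [[sum(xi * wi for xi, wi in zip(row, w_out_in[j])) + bias[j]
             for j in range(len(w_out_in))] for row in x]

def top_k(values, k):
    """Indices of the k largest values, largest first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h, c, gates):
    """One LSTM step; each gate has its own input weights W, recurrent weights R, and bias."""
    def pre_activation(gate_name):
        p = gates[gate_name]
        return [sum(xi * w for xi, w in zip(x, p["W"][j]))
                + sum(hi * r for hi, r in zip(h, p["R"][j])) + p["b"][j]
                for j in range(len(h))]
    in_gate = [sigmoid(v) for v in pre_activation("input")]
    forget_gate = [sigmoid(v) for v in pre_activation("forget")]
    candidate = [math.tanh(v) for v in pre_activation("cell")]
    out_gate = [sigmoid(v) for v in pre_activation("output")]
    c_new = [f * cj + i * g for f, cj, i, g in zip(forget_gate, c, in_gate, candidate)]
    h_new = [o * math.tanh(cj) for o, cj in zip(out_gate, c_new)]
    return h_new, c_new

random.seed(0)
VOCAB, EMBED, HIDDEN = 5, 3, 4  # hypothetical toy sizes

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

embedding = rand_matrix(VOCAB, EMBED)

# BSE input: batch of 2 sequences, 3 tokens each, every token an EMBED-dim vector.
token_ids = [[0, 2, 4], [1, 3, 0]]
bse_input = [[embedding[t] for t in seq] for seq in token_ids]

# Weights kept separately per gate (W: HIDDEN x EMBED, R: HIDDEN x HIDDEN).
gates = {name: {"W": rand_matrix(HIDDEN, EMBED), "R": rand_matrix(HIDDEN, HIDDEN),
                "b": [0.0] * HIDDEN}
         for name in ("input", "forget", "cell", "output")}

# Walk the sequence axis; keep the final hidden state of each batch item.
last_hidden = []
for seq in bse_input:
    h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
    for x_t in seq:
        h, c = lstm_step(x_t, h, c, gates)
    last_hidden.append(h)

# Project to vocabulary logits in two equivalent ways.
w_in_out = rand_matrix(HIDDEN, VOCAB)   # (in, out) layout for the matrix multiply
bias = [0.1] * VOCAB
logits = fc_via_matmul(last_hidden, w_in_out, bias)
logits_fc = fc_out_in(last_hidden, transpose(w_in_out), bias)  # same weights, (out, in) layout

next_chars = [top_k(row, 1)[0] for row in logits]
print(len(bse_input), len(bse_input[0]), len(bse_input[0][0]))  # prints: 2 3 3
print(next_chars)
```

The two projection functions produce the same logits; the only difference is how the weight matrix is laid out, which is the kind of rearrangement the video refers to when feeding weights to a Fully Connected layer.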