The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding

Date: 12/16/2021 3:34 pm
Track: Lightning Talks

Organization: Carnegie Mellon University
Speakers: Pratik Fegade

The shape and size of input data used for deep learning often vary. In many cases, such data can be represented using tensors with non-uniform shapes, or ragged tensors. Techniques such as padding and masking are used to make the data shapes uniform so that the computations can be offloaded to optimized kernels for dense tensors. Such techniques can, however, lead to substantial wasted computation and, therefore, performance loss. We present CoRa, a tensor compiler built on top of Apache TVM, that allows users to generate efficient code for ragged tensor operators targeting CPUs and GPUs. Evaluating CoRa on a variety of ragged operators as well as on a transformer encoder layer, we find that CoRa (i) performs as well as hand-optimized implementations of the operators and the transformer encoder and (ii) achieves, over PyTorch, a 1.6X geomean speedup for the encoder on an Nvidia GPU and a 1.86X geomean speedup for the multi-head attention module used in transformers on an ARM CPU.
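To make the padding overhead concrete, here is a minimal Python sketch. It is not CoRa's API or implementation; the helper names `pad_batch` and `padding_overhead` are hypothetical and exist only for this illustration. It pads a batch of variable-length sequences to the longest length, builds the matching mask, and reports what fraction of the padded tensor holds no real data.

```python
def pad_batch(seqs, pad_value=0):
    """Pad variable-length sequences to the longest length.

    Returns the padded rows and a 0/1 mask marking real elements.
    Hypothetical helper for illustration only.
    """
    if not seqs:
        raise ValueError("seqs must be non-empty")
    max_len = max(len(s) for s in seqs)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return padded, mask


def padding_overhead(lengths):
    """Return (ragged_elems, padded_elems, wasted_fraction).

    ragged_elems counts real elements; padded_elems counts elements
    after padding every sequence to the longest one.
    """
    if not lengths or max(lengths) == 0:
        raise ValueError("lengths must contain at least one positive value")
    ragged = sum(lengths)
    padded = len(lengths) * max(lengths)
    return ragged, padded, 1 - ragged / padded


batch = [[1, 2], [3, 4, 5, 6, 7], [8, 9, 10]]
padded, mask = pad_batch(batch)
ragged_elems, padded_elems, wasted = padding_overhead([len(s) for s in batch])
print(f"{ragged_elems} real elements, {padded_elems} after padding, "
      f"{wasted:.0%} wasted")
```

For an elementwise operator, the wasted fraction is the share of work a dense kernel spends on padding. For operators whose cost grows faster than linearly in sequence length, such as attention, the wasted share can be larger.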
