Event Details

Miniaturizing Models for microNPUs: a Cascading Scheduler for TVM

Date: 12/16/2021 12:45 pm
Track: Edge & Embedded Lounge

Organization: Arm
Speakers: Matthew Barrett

We will present a new approach to scheduling machine learning models called ‘cascading’. Cascading is a form of inter-operator scheduling that can significantly reduce the working memory requirements of a model, allowing big models to run on tiny devices. It can also improve performance for memory-bound processors such as NPUs. In this talk, we’ll cover:

  • What cascading is and how it works
  • Why TVM is a great compiler framework for exploring cascading
  • How we at Arm leveraged this technique when compiling for our Arm® Ethos™-U NPU
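
To make the idea of inter-operator scheduling concrete, here is a toy sketch of the memory saving that cascading aims for. It is not TVM's scheduler or Arm's implementation; the function names (`conv3`, `layer_by_layer`, `cascaded`), the two chained 3-tap filters, and the stripe size are all assumptions chosen for illustration. Instead of materializing the whole intermediate tensor between two operators, the cascaded version produces the output one stripe at a time, so only a small slice of the intermediate is live at any moment.

```python
def conv3(xs, k):
    """'Valid' 3-tap 1D filter over a list: output has len(xs) - 2 elements."""
    return [k[0] * xs[i] + k[1] * xs[i + 1] + k[2] * xs[i + 2]
            for i in range(len(xs) - 2)]


def layer_by_layer(xs, k1, k2):
    """Run operator 1 to completion, then operator 2.

    Returns the output and the size of the largest intermediate buffer.
    """
    mid = conv3(xs, k1)   # the whole intermediate tensor is live here
    out = conv3(mid, k2)
    return out, len(mid)


def cascaded(xs, k1, k2, stripe):
    """Run both operators stripe by stripe (hypothetical cascade).

    Output elements [start, stop) need intermediate elements
    [start, stop + 2), which in turn need input elements [start, stop + 4).
    """
    out = []
    peak = 0
    n_out = len(xs) - 4
    for start in range(0, n_out, stripe):
        stop = min(start + stripe, n_out)
        mid = conv3(xs[start:stop + 4], k1)   # only a small slice is live
        peak = max(peak, len(mid))
        out.extend(conv3(mid, k2))
    return out, peak


if __name__ == "__main__":
    xs = list(range(64))
    k1, k2 = [1, 2, 1], [1, 0, -1]
    full_out, full_peak = layer_by_layer(xs, k1, k2)
    casc_out, casc_peak = cascaded(xs, k1, k2, stripe=8)
    print(full_out == casc_out, full_peak, casc_peak)
```

In this sketch the two schedules give identical results, while the peak intermediate buffer drops from 62 elements to 10. The cost is that the elements at stripe boundaries are recomputed, which is the kind of trade-off a real scheduler has to weigh.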

This session has two parts: a 20-minute talk followed by a 10-minute community breakout session.

Register for TVMCon 2021