Modern systems that run Machine Learning (ML) workloads span a wide spectrum of devices, from small micro-controllers to large Systems-on-Chip (SoCs). The amount of memory present on a device determines not only how performant the execution will be, but also whether the device can execute the ML workload at all (especially on micro-controllers). We introduce the Unified “Static” Memory Planner (USMP), a comprehensive solution that ensures tensors used within operators (intra-operator tensors) and between operators (inter-operator tensors) utilise the least amount of device memory. Additionally, USMP enables the compiler to assign tensors to differently sized memory pools, allowing intermediary data tensors and constant tensors to be placed in different memory “homes” as specified by the user.
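To give a flavour of the problem USMP addresses, the sketch below shows a minimal greedy static memory planner. This is an illustrative toy, not the actual USMP implementation: it assumes each tensor buffer has a known size and a liveness interval over the operator schedule, and lets two buffers share memory only when their lifetimes do not overlap.

```python
# Illustrative greedy static memory planner (a sketch, NOT the real USMP code).
# Assumption: each buffer has a byte size and a liveness interval
# [first_use, last_use] over the operator execution order; buffers with
# non-overlapping lifetimes may alias the same memory.

from dataclasses import dataclass

@dataclass
class Buffer:
    name: str
    size: int        # bytes
    first_use: int   # index of the first operator that touches it
    last_use: int    # index of the last operator that touches it

def plan(buffers):
    """Assign byte offsets so that lifetime-overlapping buffers never alias.

    Greedy heuristic: place larger buffers first, each at the lowest offset
    that does not conflict with any already-placed buffer whose lifetime
    overlaps. Returns (offsets, total_bytes_required).
    """
    placed = []   # list of (buffer, offset)
    offsets = {}
    for buf in sorted(buffers, key=lambda b: -b.size):
        # Occupied [lo, hi) ranges of already-placed, lifetime-overlapping buffers.
        conflicts = sorted(
            (off, off + b.size)
            for b, off in placed
            if not (b.last_use < buf.first_use or buf.last_use < b.first_use)
        )
        offset = 0
        for lo, hi in conflicts:
            if offset + buf.size <= lo:
                break                    # fits in the gap before this range
            offset = max(offset, hi)     # otherwise skip past it
        offsets[buf.name] = offset
        placed.append((buf, offset))
    total = max((off + b.size for b, off in placed), default=0)
    return offsets, total

# Three buffers: "a" and "c" never live at the same time, so they can share
# one slot; only "b" needs extra room. 1536 bytes instead of a naive 2560.
offsets, total = plan([
    Buffer("a", 1024, first_use=0, last_use=1),
    Buffer("b", 512,  first_use=1, last_use=2),
    Buffer("c", 1024, first_use=2, last_use=3),
])
print(offsets, total)
```

Supporting multiple memory pools, as USMP does, would amount to running a placement like this per pool, with each buffer carrying a user-specified pool candidate list.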
This session is broken into two parts: a 20-minute talk followed by a 10-minute community breakout session.