Besides intra-operator parallelism, another important optimization is to exploit inter-operator parallelism in multi-branch models (e.g., Inception v3). One way to support inter-operator parallelism for DNN execution on CUDA platforms is to use CUDA streams: independent operators are launched on different streams so that their kernels may execute concurrently on the GPU. This talk introduces our support for multi-stream execution in the virtual machine of the meta project.
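To make the idea concrete, here is a minimal sketch of how a runtime *might* map the operators of a multi-branch graph onto CUDA streams. It assumes a simple greedy heuristic: an operator reuses the stream of its first predecessor whose stream has not already been passed on, otherwise it opens a new stream, and every dependency that crosses streams is recorded as a point where an event record/wait would be needed. The function name `assign_streams` and the heuristic itself are hypothetical illustrations, not the meta project's actual VM implementation, and the sketch only computes the assignment rather than launching any kernels.

```python
def assign_streams(ops, deps):
    """Greedily assign operators to stream ids (hypothetical heuristic).

    ops:  operator names in topological order.
    deps: dict mapping an operator to the list of its predecessor operators.
    Returns (stream_of, waits): stream_of maps op -> stream id, and waits lists
    (producer, consumer) pairs on different streams, i.e. places where the
    consumer's stream would have to wait on an event recorded by the producer.
    """
    stream_of = {}
    taken = set()      # ops whose stream has already been inherited by a successor
    next_stream = 0
    waits = []
    for op in ops:
        preds = deps.get(op, [])
        # Reuse the stream of the first predecessor that has not handed it on yet,
        # so one branch keeps running on its own stream without extra syncs.
        inherited = next((p for p in preds if p not in taken), None)
        if inherited is None:
            stream_of[op] = next_stream   # a new branch gets a fresh stream
            next_stream += 1
        else:
            stream_of[op] = stream_of[inherited]
            taken.add(inherited)
        # Any dependency from another stream needs an explicit event wait.
        for p in preds:
            if stream_of[p] != stream_of[op]:
                waits.append((p, op))
    return stream_of, waits


# An Inception-style block: three independent branches joined by a concat.
ops = ["input", "conv1x1", "conv3x3", "pool", "concat"]
deps = {
    "conv1x1": ["input"],
    "conv3x3": ["input"],
    "pool": ["input"],
    "concat": ["conv1x1", "conv3x3", "pool"],
}
stream_of, waits = assign_streams(ops, deps)
print(stream_of)
print(waits)
```

Here the three branches land on three different streams, so their kernels could overlap, and only the fork after `input` and the join at `concat` require cross-stream synchronization. A real scheduler would also weigh kernel sizes and cap the number of streams, which this sketch ignores.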