In this talk, we propose a new method named RoofTune that significantly accelerates the tuning process. In particular, we devise a cost model based on the Roofline Model to evaluate the performance of schedule configurations. We also introduce a general design principle for this cost model that makes it easy to deploy on different micro-architectures, e.g., GPUs and NPUs, and we present two concrete cost models for NVIDIA GPUs and Huawei Ascend NPUs, respectively. Finally, we propose a flexible two-stage search algorithm based on the cost model, which can optionally be combined with other machine learning algorithms to further improve tuning. Experiments show that, for some typical networks, RoofTune speeds up tuning by about 4X compared with AutoTVM on NVIDIA GPUs and by about 10X compared with the AutoTune of Huawei's Tensor Boost Engine (TBE) on Ascend NPUs, while also improving the inference time of some modern DNNs by up to 7%.
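
The core idea of a Roofline-Model-based cost model can be sketched as follows: attainable performance is the minimum of the device's peak compute rate and the rate the memory system can feed it, given a kernel's arithmetic intensity. This is a minimal illustrative sketch, not the actual RoofTune cost model; the hardware numbers are placeholder assumptions, not measurements of any real device.

```python
# Minimal sketch of a roofline-style performance estimate.
# The hardware parameters below are illustrative placeholders,
# not specifications of any particular GPU or NPU.

def roofline_gflops(arithmetic_intensity, peak_gflops, peak_bw_gb_s):
    """Attainable GFLOP/s = min(peak compute, memory bandwidth * intensity).

    arithmetic_intensity: FLOPs performed per byte moved from memory.
    peak_gflops:          device peak compute throughput (GFLOP/s).
    peak_bw_gb_s:         device peak memory bandwidth (GB/s).
    """
    return min(peak_gflops, peak_bw_gb_s * arithmetic_intensity)

# Example: a kernel doing 8 FLOPs per byte on a hypothetical device with
# 10,000 GFLOP/s peak compute and 900 GB/s memory bandwidth.
est = roofline_gflops(8.0, peak_gflops=10_000.0, peak_bw_gb_s=900.0)
print(est)  # 7200.0 -> memory-bound, since 900 * 8 < 10000
```

A cost model built on this principle can rank schedule configurations by their estimated attainable throughput without running them, which is what allows the search to prune most candidates cheaply.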