Extremely Fast GEMM on AVX512 CPUs Combining TVM and XSMM

Date: 12/17/2021 2:40 pm
Organization: Tencent Holdings Limited
Speakers: Honglin Zhu
We studied the performance penalty of the existing TVM LLVM backend and proposed an alternative approach that better optimizes GEMV performance. By combining tensorize with a hand-written inner kernel, the GEMV op generated by TVM can outperform MKL by roughly 2x to 3x on the latest x86 servers.
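
To make the idea of a tiled loop nest with a swapped-in inner kernel concrete, here is a minimal sketch in plain Python. It is not the speaker's implementation and does not use the TVM or XSMM APIs. The names `micro_kernel`, `tiled_gemv`, `reference_gemv`, `TILE_M`, and `TILE_K` are hypothetical and chosen only for illustration. The outer loops play the role of the schedule a compiler would generate, and `micro_kernel` stands in for the hand-written kernel that a mechanism such as tensorize would substitute for the innermost loops.

```python
# Illustrative sketch only: plain Python stands in for the generated loop
# nest and for a hand-written inner kernel. All names are hypothetical.

TILE_M = 4  # assumed tile height: rows handled per micro-kernel call
TILE_K = 8  # assumed tile width: columns handled per micro-kernel call


def micro_kernel(a, x, y, row0, col0, rows, cols):
    """Accumulate the product of one tile of `a` with a slice of `x` into `y`."""
    for i in range(row0, row0 + rows):
        acc = 0.0
        for k in range(col0, col0 + cols):
            acc += a[i][k] * x[k]
        y[i] += acc


def tiled_gemv(a, x):
    """Compute y = a @ x by walking tiles and delegating each to micro_kernel."""
    m = len(a)
    k = len(x)
    y = [0.0] * m
    for row0 in range(0, m, TILE_M):
        rows = min(TILE_M, m - row0)  # handle a ragged last row tile
        for col0 in range(0, k, TILE_K):
            cols = min(TILE_K, k - col0)  # handle a ragged last column tile
            micro_kernel(a, x, y, row0, col0, rows, cols)
    return y


def reference_gemv(a, x):
    """Untiled matrix-vector product used to check the tiled version."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in a]
```

In a real system the body of `micro_kernel` would be a vectorized routine for a fixed tile shape, and the tile sizes would be tuned to the register file and cache hierarchy. The values above are arbitrary.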

