We studied the performance penalty of the existing TVM LLVM backend and proposed an alternative approach to better optimize GEMV performance. By combining tensorize with a hand-written inner kernel, the GEMV op generated by TVM can outperform MKL by roughly 2x to 3x on the latest x86 servers.
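
As a rough illustration of the idea, the sketch below splits a GEMV (y = A x) into outer loops and a separate inner kernel, which is the slot where tensorize would substitute a hand-written routine. This is plain Python, not TVM code: the names `inner_kernel`, `gemv_tiled`, `gemv_reference`, and the tile width `TILE` are hypothetical, and the real inner kernel would be a vectorized x86 routine rather than a Python loop.

```python
# Illustrative sketch only: a plain-Python stand-in for a tensorized GEMV.
# `inner_kernel`, `gemv_tiled`, and TILE are hypothetical names, not TVM APIs.

TILE = 4  # assumed tile width along the reduction axis


def inner_kernel(a_row, x, k0, k1):
    """Stand-in for the hand-written micro-kernel: dot product over [k0, k1)."""
    acc = 0.0
    for k in range(k0, k1):
        acc += a_row[k] * x[k]
    return acc


def gemv_tiled(A, x):
    """Compute y = A x, delegating each reduction tile to inner_kernel."""
    n = len(x)
    y = []
    for row in A:  # outer loop over output rows
        acc = 0.0
        for k0 in range(0, n, TILE):  # outer loop over reduction tiles
            acc += inner_kernel(row, x, k0, min(k0 + TILE, n))
        y.append(acc)
    return y


def gemv_reference(A, x):
    """Straightforward GEMV used to check the tiled version."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]
```

The outer loops stay with the compiler's schedule, while only the body of `inner_kernel` would be swapped for the hand-tuned code, so the two can be checked against a plain reference independently.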