In the concept stage (pre-implementation) of many ML accelerators, novel HW design ideas are integrated into the compiler flow only after the accelerator has been implemented in HDL or SystemC. We demonstrate a proof-of-concept methodology that integrates a flexible simulation engine into TVM. It is used to explore novel HW architectures, quantify and profile their performance, and optimize dedicated operators with the AutoTuner before the first line of HDL/SystemC is written. The hardware is described as a high-level, low-overhead simulation model in Python. By integrating auto-tuning, the parameters of the operator schedule and the hardware parameters can be optimized jointly. This significantly reduces the turn-around time for design space exploration.
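To illustrate the idea, the following is a minimal sketch of a high-level, low-overhead simulation model in Python and a joint search over hardware and schedule parameters. All names (`AcceleratorModel`, `pe_rows`, `tile_m`, the cycle formula) are illustrative assumptions, not the paper's actual model or TVM's API, and a plain exhaustive search stands in for the AutoTuner:

```python
import itertools

# Hypothetical analytical model of a matmul accelerator: the HW knobs
# (PE array size, on-chip buffer) and schedule knobs (tile sizes) are
# illustrative only.
class AcceleratorModel:
    def __init__(self, pe_rows, pe_cols, sram_kib):
        self.pe_rows = pe_rows
        self.pe_cols = pe_cols
        self.sram_bytes = sram_kib * 1024

    def matmul_cycles(self, M, N, K, tile_m, tile_n):
        # Reject tiles whose operands do not fit into the on-chip buffer.
        if 2 * (tile_m * K + K * tile_n) > self.sram_bytes:
            return None
        # Map each tile onto the PE array; partial tiles are rounded up.
        passes_m = -(-tile_m // self.pe_rows)
        passes_n = -(-tile_n // self.pe_cols)
        tiles = -(-M // tile_m) * -(-N // tile_n)
        return tiles * passes_m * passes_n * K  # crude cycle estimate


def joint_search(M=256, N=256, K=256):
    """Co-optimize HW and schedule knobs against the simulation model."""
    hw_space = itertools.product([8, 16, 32], [8, 16, 32], [64, 128])
    best = None
    for pe_rows, pe_cols, sram_kib in hw_space:
        hw = AcceleratorModel(pe_rows, pe_cols, sram_kib)
        for tile_m, tile_n in itertools.product([16, 32, 64], repeat=2):
            cycles = hw.matmul_cycles(M, N, K, tile_m, tile_n)
            if cycles is not None and (best is None or cycles < best[0]):
                best = (cycles, (pe_rows, pe_cols, sram_kib), (tile_m, tile_n))
    return best


if __name__ == "__main__":
    cycles, hw_cfg, schedule_cfg = joint_search()
    print(f"best: {cycles} cycles, hw={hw_cfg}, schedule={schedule_cfg}")
```

In the actual flow, such a model would serve as the measurement backend for the tuner, so that schedule knobs and hardware knobs are explored against the same cost function before any HDL/SystemC exists.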