I'm collecting llama-bench results for inference with Llama 3.1 8B Q4 and Q8 reference models on various GPUs. The results are the average of 5 runs. The systems vary (different motherboards and CPUs), but that probably has little effect on inference performance.
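If you want to contribute numbers in the same shape, here is a minimal sketch of how the runs could be automated. The `-m`, `-r`, and `-o json` flags and the `avg_ts` field match llama-bench as I know it (check `llama-bench --help` on your build), and the model filenames are placeholders for your own GGUF quants:

```python
import json
import subprocess

# Placeholder filenames -- point these at your own GGUF quants.
MODELS = ["llama-3.1-8b-q4_k_m.gguf", "llama-3.1-8b-q8_0.gguf"]

for model in MODELS:
    # -r 5 has llama-bench repeat each test 5 times and report the mean;
    # -o json makes the output machine-readable.
    proc = subprocess.run(
        ["llama-bench", "-m", model, "-r", "5", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    for result in json.loads(proc.stdout):
        # avg_ts = mean tokens/second across the repetitions
        print(f"{model}: {result['avg_ts']:.2f} t/s")
```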
I shared my view on Qwen vs DeepSeek (student vs genius), and I forgot to mention this: the companies behind them are neighbors in the same city. https://en.wikipedia.org/wiki/Hangzhou