Paper: Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling • arXiv:2502.06703 • Published Feb 10 • 153 upvotes
Article: KV Caching Explained: Optimizing Transformer Inference Efficiency • By not-lain • Jan 30 • 72 upvotes
Model: jiogenes/Llama-2-7b-hf-finetuned-open-korean-instructions • Text Generation • Updated Jan 16, 2024 • 17