Adaptive Orchestration for Large-Scale Inference on Heterogeneous Accelerator Systems Balancing Cost, Performance, and Resilience
Paper • 2503.20074 • Published Mar 25
jburtoft/NousResearch-Llama-2-13b-hf-seqlen-1024-bs-1-cores-24
Text Generation • Updated Apr 24, 2024