Possible Improvements / Future Options

#1
by Koitenshin - opened

Future Improvements could include:

  • Even longer Context Length. Some models, such as Nemotron Nano v2, support a 1-million-token context length (i.e., 1048576)
  • Evaluation Batch Size
  • Flash Attention / Sage Attention / Triton / etc.
  • K / V Cache Quantization Type
  • More Quantization Types (e.g., Q4_K_M, Q4_K_L, Q6_K, etc.)
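
For reference, most of the options above already exist as command-line flags in llama.cpp, so a sketch of the equivalent settings there may be useful. The model filename and the specific values below are hypothetical examples, not recommendations, and exact flag names can vary between llama.cpp builds:

```shell
# Hypothetical llama.cpp invocation illustrating the options listed above.
# Model path and values are illustrative only.
llama-cli -m ./model.Q4_K_M.gguf \
  --ctx-size 1048576 \
  --batch-size 2048 \
  --flash-attn \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```

Here `--ctx-size` sets the context length, `--batch-size` the evaluation batch size, `--flash-attn` enables Flash Attention, and `--cache-type-k` / `--cache-type-v` select the K/V cache quantization type.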

I frequently use an evaluation batch size (EBS) of 2048, because who wants to wait?
Flash Attention is very model-dependent, as some models never load with FA turned on.

This tool is very helpful, thanks for making it.
