jarrelscy/Mixtral-8x22B-Instruct-v0.1-GPTQ-4bit

Text Generation
Transformers
Safetensors
mixtral
conversational
text-generation-inference
4-bit precision
gptq

I am running vLLM 0.4.1 on 4 x A10G GPUs (24 GB each, 96 GB total) in eager mode and I am still running out of memory. How? It should fit (about 87 GB of VRAM).

1
#3 opened over 1 year ago by orel12

KeyError: 'model.layers.45.block_sparse_moe.gate.g_idx'

5
#2 opened over 1 year ago by tutu329

No special_tokens_map.json, tokenizer_config.json, or tokenizer.json

#1 opened over 1 year ago by tutu329