Quantization made by Richard Erkhov.
saily_100b - GGUF
- Model creator: https://huggingface.co/deepnight-research/
- Original model: https://huggingface.co/deepnight-research/saily_100b/
| Name | Quant method | Size |
| --- | --- | --- |
| saily_100b.Q2_K.gguf | Q2_K | 40.28GB |
| saily_100b.IQ3_XS.gguf | IQ3_XS | 44.82GB |
| saily_100b.IQ3_S.gguf | IQ3_S | 47.37GB |
| saily_100b.Q3_K_S.gguf | Q3_K_S | 47.22GB |
| saily_100b.IQ3_M.gguf | IQ3_M | 49.0GB |
| saily_100b.Q3_K.gguf | Q3_K | 52.7GB |
| saily_100b.Q3_K_M.gguf | Q3_K_M | 52.7GB |
| saily_100b.Q3_K_L.gguf | Q3_K_L | 57.43GB |
| saily_100b.IQ4_XS.gguf | IQ4_XS | 59.08GB |
| saily_100b.Q4_0.gguf | Q4_0 | 61.76GB |
| saily_100b.IQ4_NL.gguf | IQ4_NL | 62.35GB |
| saily_100b.Q4_K_S.gguf | Q4_K_S | 62.22GB |
| saily_100b.Q4_K.gguf | Q4_K | 65.79GB |
| saily_100b.Q4_K_M.gguf | Q4_K_M | 65.79GB |
| saily_100b.Q4_1.gguf | Q4_1 | 68.59GB |
| saily_100b.Q5_0.gguf | Q5_0 | 75.43GB |
| saily_100b.Q5_K_S.gguf | Q5_K_S | 75.43GB |
| saily_100b.Q5_K.gguf | Q5_K | 77.51GB |
| saily_100b.Q5_K_M.gguf | Q5_K_M | 77.51GB |
| saily_100b.Q5_1.gguf | Q5_1 | 82.27GB |
| saily_100b.Q6_K.gguf | Q6_K | 89.96GB |
| saily_100b.Q8_0.gguf | Q8_0 | 116.52GB |
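These files are standard GGUF quants, so any llama.cpp-compatible runtime should be able to load them. Below is a minimal sketch using the llama-cpp-python bindings; the file path, context size, and generation settings are illustrative, not part of this repo.

```python
# Minimal sketch: running a GGUF quant with llama-cpp-python.
# Assumes `pip install llama-cpp-python` and that a quant file from the
# table above has already been downloaded; the path is illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./saily_100b.Q4_K_M.gguf",  # any quant from the table above
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if available
)

output = llm(
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain the GGUF format in one sentence.\n\n"
    "### Response:\n",
    max_tokens=128,
)
print(output["choices"][0]["text"])
```

Lower-bit quants (Q2_K, IQ3_*) trade quality for memory; Q4_K_M and above are the usual compromise when the hardware allows it.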
Original model description:
```yaml
license: mit
license_name: deepnight-responsible-ai
license_link: LICENSE
```
SaiLy 100B (deepnight-research/saily_100B)

SaiLy is a series/collection of AI models by DEEPNIGHT-RESEARCH that are highly experimental and uncensored. Please use responsibly.

*Waiting for evals: the model has been submitted to the HuggingFace OpenLLM Leaderboard and is currently on the pending list.*

Prompt Template: Alpaca
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```
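For clarity, here is a small sketch of filling this template in Python; the helper name and example instruction are illustrative.

```python
# Illustrative helper for filling the Alpaca template above.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{prompt}\n\n### Response:\n"
)

def build_prompt(prompt: str) -> str:
    return ALPACA_TEMPLATE.format(prompt=prompt)

print(build_prompt("Summarize the GGUF format in two sentences."))
```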
Description:
This is the first stable model of the series. The model is based on Llama2-chat.
Did someone say CODE?
Here you go!
```python
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    'deepnight-research/saily_100B'
)
```
To use the optimized triton implementation of FlashAttention, you can load the model on GPU (`cuda:0`) with `attn_impl='triton'` and with `bfloat16` precision:
```python
import torch
import transformers

name = 'deepnight-research/saily_100B'

config = transformers.AutoConfig.from_pretrained(name)
config.attn_config['attn_impl'] = 'triton'
config.init_device = 'cuda:0'  # For fast initialization directly on GPU!

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # Load model weights in bfloat16
    trust_remote_code=True
)
```
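Once loaded, generation follows the usual transformers pattern. A minimal sketch, assuming the tokenizer ships with the same repo; the instruction and sampling settings are illustrative.

```python
# Minimal generation sketch; assumes `model` and `name` from the snippet
# above, and that the tokenizer loads from the same repo.
tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a haiku about the sea.\n\n### Response:\n"
)

inputs = tokenizer(prompt, return_tensors='pt').to('cuda:0')
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```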
If you would like to support us, please consider donating to #aiforcause.
Cheers✌️
- Team DEEPNIGHT