# Overview

matsuo-lab/weblab-10b-instruction-sft quantized to 4-bit with BitsAndBytes (0.44.1).

The code used for quantization is as follows.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "matsuo-lab/weblab-10b-instruction-sft"
repo_id = "indiebot-community/weblab-10b-instruction-sft-bnb-4bit"

# NF4 4-bit quantization with nested (double) quantization;
# bfloat16 is used as the compute dtype
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Quantize the weights on load, then push the tokenizer and model to the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

tokenizer.push_to_hub(repo_id)
model.push_to_hub(repo_id)
```
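
Once pushed, the 4-bit checkpoint can be loaded back directly from the Hub; the quantization settings are stored in the checkpoint's config, so no `BitsAndBytesConfig` needs to be passed again. A minimal loading sketch, assuming `bitsandbytes` is installed and a CUDA GPU is available:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "indiebot-community/weblab-10b-instruction-sft-bnb-4bit"

# The quantization_config saved in config.json is applied automatically
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
```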

## tokenizer_config.json

A `chat_template` entry has been added to `tokenizer_config.json`.

```json
{
  (omitted ...)

  "bos_token": "<|endoftext|>",
  "chat_template": "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if (message['role'] == 'assistant') %}{% set role = 'model' %}{% else %}{% set role = message['role'] %}{% endif %}{{ '<start_of_turn>' + role + '\n' + message['content'] | trim + '<end_of_turn>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
  "extra_special_tokens": {},
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<|padding|>",
  "tokenizer_class": "PreTrainedTokenizer",
  "unk_token": "<|endoftext|>"
}
```
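
This template follows the Gemma-style turn format: the BOS token is prepended, system messages are rejected, user/assistant turns must alternate, and the assistant role is rendered as `model` between `<start_of_turn>` and `<end_of_turn>` tags. A minimal usage sketch (the question text is an arbitrary example and the generation parameters are illustrative):

```python
messages = [
    {"role": "user", "content": "What is the highest mountain in Japan?"},
]

# Render the conversation with the template above; add_generation_prompt=True
# appends the opening tag for the model's turn:
#   <|endoftext|><start_of_turn>user
#   What is the highest mountain in Japan?<end_of_turn>
#   <start_of_turn>model
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```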