---
license: apache-2.0
language:
- ja
---

# Tanuki-8B-Instruct

## Model Details

- **Model type:** Pretrained language model with a [Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)-like architecture
- **Total seen tokens:** 280B

|Params|Layers|Hidden size|Intermediate size|Attention heads|KV heads|Context length (tokens)|RoPE theta|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|8B|32|4096|14336|32|8|8192|500000|

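
The hyperparameters in the table are enough to estimate the transformer's non-embedding parameter count and the size of the key/value cache. The sketch below assumes a standard Llama-style decoder layer (bias-free projections, a SwiGLU MLP with gate/up/down matrices, two RMSNorm weight vectors per layer, and a bfloat16 KV cache). These are assumptions drawn from the "Llama-3-8B-like" description, not confirmed details of Tanuki. The vocabulary size is not listed in the table, so embedding and output-head parameters are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchConfig:
    # Values copied from the Model Details table.
    layers: int = 32
    hidden: int = 4096
    intermediate: int = 14336
    heads: int = 32
    kv_heads: int = 8
    context: int = 8192

    @property
    def head_dim(self) -> int:
        return self.hidden // self.heads


def non_embedding_params(cfg: ArchConfig) -> int:
    """Parameter count excluding embeddings/LM head, assuming a Llama-style layer."""
    kv_dim = cfg.kv_heads * cfg.head_dim
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim (GQA).
    attention = 2 * cfg.hidden * cfg.hidden + 2 * cfg.hidden * kv_dim
    # SwiGLU MLP: gate_proj and up_proj (hidden -> intermediate), down_proj (intermediate -> hidden).
    mlp = 3 * cfg.hidden * cfg.intermediate
    # Two RMSNorm weight vectors per layer (pre-attention and pre-MLP).
    norms = 2 * cfg.hidden
    # Plus one final RMSNorm after the last layer.
    return cfg.layers * (attention + mlp + norms) + cfg.hidden


def kv_cache_bytes(cfg: ArchConfig, tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for `tokens` positions (bf16 = 2 bytes)."""
    # Factor 2: one key vector and one value vector per KV head, per layer, per token.
    per_token = 2 * cfg.layers * cfg.kv_heads * cfg.head_dim * bytes_per_value
    return per_token * tokens


cfg = ArchConfig()
print(f"head_dim: {cfg.head_dim}")
print(f"query heads per KV head: {cfg.heads // cfg.kv_heads}")
print(f"non-embedding params: {non_embedding_params(cfg):,}")
print(f"KV cache at full context: {kv_cache_bytes(cfg, cfg.context) / 2**30:.2f} GiB")
```

Under these assumptions the non-embedding count is about 6.98B; the remaining gap to the nominal 8B would be the embedding and output matrices, whose size depends on the vocabulary. With 8 KV heads shared by 32 query heads, the cache for a full 8192-token context is 1 GiB per sequence, a quarter of what full multi-head attention would need.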

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("hatakeyama-llm-team/Tanuki-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "hatakeyama-llm-team/Tanuki-8B-Instruct", torch_dtype=torch.bfloat16
).to("cuda")

chat = [
    # System prompt (English gloss): "The following is a combination of an instruction describing
    # a task and an input that provides context. Write a response that appropriately fulfills the request."
    {"role": "system", "content": "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。"},
    # User message (English gloss): "What is a tanuki?"
    {"role": "user", "content": "たぬきってなんですか?"},
]

# Render the chat with the model's chat template and append the assistant-turn prefix.
tokenized_input = tokenizer.apply_chat_template(
    chat, add_generation_prompt=True, tokenize=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        tokenized_input,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        repetition_penalty=1.05,
    )[0]

# Decodes the full sequence: the prompt followed by the generated reply.
print(tokenizer.decode(output))
```
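
The `generate` call above samples with `temperature=0.7` and `repetition_penalty=1.05`. The pure-Python sketch below illustrates how these two settings reshape next-token logits before sampling: the repetition penalty divides positive logits (and multiplies negative ones) of tokens already in the sequence, as Hugging Face's `RepetitionPenaltyLogitsProcessor` does, and temperature divides all logits before the softmax. It is a simplified illustration over a made-up four-token vocabulary, not the library's implementation; applying the penalty before temperature scaling is, to our understanding, the order transformers uses.

```python
import math
import random


def apply_repetition_penalty(logits, seen_ids, penalty):
    """Penalize tokens that already appear in the sequence (CTRL-style penalty)."""
    out = list(logits)
    for tok in set(seen_ids):
        # Dividing a positive logit or multiplying a negative one both lower its probability.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; temperature < 1 sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, seen_ids, temperature=0.7, penalty=1.05, rng=random):
    """Return (sampled token id, probability vector) after penalty and temperature."""
    penalized = apply_repetition_penalty(logits, seen_ids, penalty)
    probs = softmax_with_temperature(penalized, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0], probs


# Toy example: 4-token vocabulary where token 2 has already been generated.
logits = [2.0, 1.0, 2.1, -0.5]
token, probs = sample_next_token(logits, seen_ids=[2], rng=random.Random(0))
print([round(p, 3) for p in probs], token)
```

A penalty of 1.05 only nudges repeated tokens down slightly, while a temperature of 0.7 concentrates probability on the highest-scoring tokens without making decoding fully greedy.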

<p style="font-size: medium; color: gray;">
Note: If you use tokenizer.encode() instead of tokenizer.apply_chat_template for generation, set add_special_tokens=False so that no EOS token is appended to the end of the input.<br>
Example: tokenizer.encode(input_text, add_special_tokens=False, return_tensors="pt")<br>
tokenizer.apply_chat_template already uses add_special_tokens=False by default, so no change is needed there.
</p>
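
EOS handling matters on both sides of `generate`: a prompt must not end with EOS, and the output contains the prompt followed by the reply and (usually) a closing EOS. The helpers below are a hypothetical pure-Python sketch operating on plain lists of token ids; the names `strip_trailing_eos` and `reply_ids` and the ids in the example are made up for illustration. Re-encoding with `add_special_tokens=False`, as the note recommends, remains the simpler fix for the prompt side.

```python
from typing import List


def strip_trailing_eos(prompt_ids: List[int], eos_id: int) -> List[int]:
    """Drop EOS tokens at the end of an encoded prompt so the model is not told the text is over."""
    ids = list(prompt_ids)
    while ids and ids[-1] == eos_id:
        ids.pop()
    return ids


def reply_ids(output_ids: List[int], prompt_len: int, eos_id: int) -> List[int]:
    """Keep only the newly generated tokens, cut at the first EOS (if any)."""
    generated = output_ids[prompt_len:]
    if eos_id in generated:
        generated = generated[: generated.index(eos_id)]
    return generated


# Made-up token ids; assume 2 is the EOS id for this illustration.
EOS = 2
encoded = [5, 17, 42, EOS]       # what encode() might return with add_special_tokens=True
prompt = strip_trailing_eos(encoded, EOS)
output = prompt + [88, 91, EOS]  # what generate() might return: prompt + reply + EOS
print(prompt, reply_ids(output, len(prompt), EOS))  # prints: [5, 17, 42] [88, 91]
```

With the real objects from the usage example, these could be called as `strip_trailing_eos(tokenizer.encode(input_text), tokenizer.eos_token_id)` and `tokenizer.decode(reply_ids(output.tolist(), tokenized_input.shape[-1], tokenizer.eos_token_id))`, the latter printing only the model's reply.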

| Model variant |
| :--- |
| **Instruction models** |
| [hatakeyama-llm-team/Tanuki-8B-Instruct](https://huggingface.co/hatakeyama-llm-team/Tanuki-8B-Instruct) |
| [hatakeyama-llm-team/Tanuki-8B-Instruct-without-DPO](https://huggingface.co/hatakeyama-llm-team/Tanuki-8B-Instruct-without-DPO) |
| **Pre-trained models** |
| [hatakeyama-llm-team/Tanuki-8B](https://huggingface.co/hatakeyama-llm-team/Tanuki-8B) |
| [hatakeyama-llm-team/Tanuki-8B-Before-Context-Length-Extension](https://huggingface.co/hatakeyama-llm-team/Tanuki-8B-Before-Context-Length-Extension) |