---
license: apache-2.0
language:
- ja
---
# Tanuki-8B-Instruct
## Model Details

- **Model type:** [Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)-like pretrained Language Model
- **Total seen tokens:** 280B

|Params|Layers|Hidden size|Intermediate size|Attention Heads|KV Heads|Context length|Rope Theta|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|8B|32|4096|14336|32|8|8192|500000|
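
As a rough sanity check on the table above, the sketch below estimates the non-embedding parameter count and the KV-cache size at full context length. It assumes a standard Llama-style block (bias-free projections, grouped-query attention, a gated MLP with three weight matrices, and two RMSNorm weights per layer), which the card implies but does not spell out. The vocabulary size is not listed here, so embedding and output-head weights are left out. The function names are illustrative only.

```python
# Values copied from the architecture table above.
LAYERS = 32
HIDDEN = 4096
INTERMEDIATE = 14336
HEADS = 32
KV_HEADS = 8
CONTEXT = 8192


def non_embedding_params() -> int:
    """Weights in the transformer blocks plus the final norm (assumed Llama-style layout)."""
    head_dim = HIDDEN // HEADS                            # 128
    kv_dim = KV_HEADS * head_dim                          # 1024, shared K/V width under GQA
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim      # q and o, then k and v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                       # gate, up, and down projections
    norms = 2 * HIDDEN                                    # two RMSNorm weight vectors per layer
    return LAYERS * (attn + mlp + norms) + HIDDEN         # + final RMSNorm


def kv_cache_bytes(seq_len: int = CONTEXT, bytes_per_value: int = 2) -> int:
    """KV-cache size for one sequence; 2 bytes per value assumes a bfloat16 cache."""
    head_dim = HIDDEN // HEADS
    return 2 * LAYERS * KV_HEADS * head_dim * seq_len * bytes_per_value


print(f"non-embedding parameters: {non_embedding_params():,}")
print(f"KV cache at {CONTEXT} tokens: {kv_cache_bytes() / 2**30} GiB")
```

Under these assumptions the blocks account for roughly 7B parameters, with the remainder of the nominal 8B coming from the embedding and output layers, whose size depends on the vocabulary.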

## Usage

```python
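import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model in bfloat16, then move the model to the GPU.
tokenizer = AutoTokenizer.from_pretrained("hatakeyama-llm-team/Tanuki-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("hatakeyama-llm-team/Tanuki-8B-Instruct", torch_dtype=torch.bfloat16).to('cuda')

chat = [
    # System prompt (English: "Below is a combination of an instruction describing a task
    # and an input with context. Write a response that appropriately satisfies the request.")
    {"role": "system", "content": "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。"},
    # User message (English: "What is a tanuki?")
    {"role": "user", "content": "たぬきってなんですか?"},
]

# Apply the chat template and append the generation prompt for the assistant turn.
tokenized_input = tokenizer.apply_chat_template(chat, add_generation_prompt=True, tokenize=True, return_tensors="pt").to(model.device)

# Sample a response without tracking gradients.
with torch.no_grad():
    output = model.generate(
        tokenized_input,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        repetition_penalty=1.05,
    )[0]

# The decoded text contains the prompt followed by the generated response.
print(tokenizer.decode(output))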
```
<p style="font-size: medium; color: gray;">
Note: If you use tokenizer.encode() instead of tokenizer.apply_chat_template for generation, set add_special_tokens=False so that an EOS token is not inserted at the end of the input.<br>
Example: tokenizer.encode(input_text, add_special_tokens=False, return_tensors="pt")<br>
With tokenizer.apply_chat_template, add_special_tokens=False is the default, so no extra setting is needed.
</p>
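
The note above can be wrapped in a small helper. This is a minimal sketch rather than part of the official usage: `encode_prompt` is a hypothetical name, and the trailing-EOS check is an added safeguard that assumes the tokenizer exposes `eos_token_id`, as Hugging Face tokenizers do.

```python
def encode_prompt(tokenizer, input_text):
    """Encode a hand-built prompt for generation without special tokens.

    `tokenizer` is expected to behave like a Hugging Face tokenizer;
    the helper itself is illustrative and not part of any library.
    """
    # add_special_tokens=False keeps EOS from being appended to the prompt.
    input_ids = tokenizer.encode(input_text, add_special_tokens=False, return_tensors="pt")

    # Safeguard: a prompt that ends in EOS tells the model the text is already finished.
    eos_id = tokenizer.eos_token_id
    if eos_id is not None and len(input_ids[0]) > 0 and int(input_ids[0][-1]) == eos_id:
        raise ValueError("prompt ends with EOS; generation would likely stop immediately")
    return input_ids
```

With the `tokenizer` and `model` loaded as in the usage example, the result can be passed to generation as `model.generate(encode_prompt(tokenizer, input_text).to(model.device), max_new_tokens=256)`. Here `input_text` must already follow the model's chat format, since no template is applied.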

| Model Variant | 
| :--- |
|**Instruction models**|
| [hatakeyama-llm-team/Tanuki-8B-Instruct](https://huggingface.co/hatakeyama-llm-team/Tanuki-8B-Instruct) |
| [hatakeyama-llm-team/Tanuki-8B-Instruct-without-DPO](https://huggingface.co/hatakeyama-llm-team/Tanuki-8B-Instruct-without-DPO) |
|**Pre-trained models**|
| [hatakeyama-llm-team/Tanuki-8B](https://huggingface.co/hatakeyama-llm-team/Tanuki-8B) |
| [hatakeyama-llm-team/Tanuki-8B-Before-Context-Length-Extension](https://huggingface.co/hatakeyama-llm-team/Tanuki-8B-Before-Context-Length-Extension) |