---
thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
license: llama3
datasets:
- mc4
- wikipedia
- EleutherAI/pile
- oscar-corpus/colossal-oscar-1.0
- cc100
language:
- ja
- en
tags:
- llama
- llama-3
inference: false
base_model: meta-llama/Meta-Llama-3-8B
---

# `Llama 3 Youko 8B (rinna/llama-3-youko-8b)`

![rinna-icon](./rinna.png)

# Overview

We conduct continual pre-training of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on **22B** tokens from a mixture of Japanese and English datasets. This continual pre-training significantly improves the model's performance on Japanese tasks.

The name `youko` comes from the Japanese word [`妖狐/ようこ/Youko`](https://ja.wikipedia.org/wiki/%E5%A6%96%E7%8B%90), a mythical fox spirit from Japanese folklore and a kind of [`妖怪/ようかい/Youkai`](https://ja.wikipedia.org/wiki/%E5%A6%96%E6%80%AA).

| Size | Continual Pre-Training | Instruction-Tuning |
| :-   | :-                     | :-                 |
| 8B   | Llama 3 Youko 8B [[HF]](https://huggingface.co/rinna/llama-3-youko-8b) [[GPTQ]](https://huggingface.co/rinna/llama-3-youko-8b-gptq) | Llama 3 Youko 8B Instruct [[HF]](https://huggingface.co/rinna/llama-3-youko-8b-instruct) [[GPTQ]](https://huggingface.co/rinna/llama-3-youko-8b-instruct-gptq) |
| 70B  | Llama 3 Youko 70B [[HF]](https://huggingface.co/rinna/llama-3-youko-70b) [[GPTQ]](https://huggingface.co/rinna/llama-3-youko-70b-gptq) | Llama 3 Youko 70B Instruct [[HF]](https://huggingface.co/rinna/llama-3-youko-70b-instruct) [[GPTQ]](https://huggingface.co/rinna/llama-3-youko-70b-instruct-gptq) |
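
For the GPTQ variants linked above, a minimal loading sketch follows; it assumes the `optimum` and `auto-gptq` packages are installed alongside `transformers` so that the quantization config stored in the repository can be resolved. This is an illustration, not an official recipe from this card.

~~~~python
import transformers

# Minimal sketch: load the GPTQ-quantized 8B variant.
# Assumes `optimum` and `auto-gptq` are installed in addition to `transformers`.
model = transformers.AutoModelForCausalLM.from_pretrained(
    "rinna/llama-3-youko-8b-gptq",
    device_map="auto",
)
tokenizer = transformers.AutoTokenizer.from_pretrained("rinna/llama-3-youko-8b-gptq")
~~~~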

* **Library**

    The model was trained using code based on [EleutherAI/gpt-neox](https://github.com/EleutherAI/gpt-neox).

* **Model architecture**

    A 32-layer, 4096-hidden-size transformer-based language model. Refer to the [Llama 3 Model Card](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) for architecture details; a quick config check is sketched after this list.

* **Training: Built with Meta Llama 3**

    The model was initialized with the [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model and continually trained on around **22B** tokens from a mixture of the following corpora:
    - [Japanese CC-100](https://huggingface.co/datasets/cc100)
    - [Japanese C4](https://huggingface.co/datasets/mc4)
    - [Japanese OSCAR](https://huggingface.co/datasets/oscar-corpus/colossal-oscar-1.0)
    - [The Pile](https://huggingface.co/datasets/EleutherAI/pile)
    - [Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
    - rinna curated Japanese dataset
  
* **Contributors**

    - [Koh Mitsuda](https://huggingface.co/mitsu-koh)
    - [Xinqi Chen](https://huggingface.co/Keely0419)
    - [Toshiaki Wakatsuki](https://huggingface.co/t-w)
    - [Kei Sawada](https://huggingface.co/keisawada)
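
As a quick check of the architecture figures above, the sketch below loads only the model config (no weights) and prints the layer count and hidden size. The attribute names are the standard `LlamaConfig` fields in `transformers`, not anything specific to this model.

~~~~python
from transformers import AutoConfig

# Sanity-check the stated architecture without downloading weights.
config = AutoConfig.from_pretrained("rinna/llama-3-youko-8b")
print(config.num_hidden_layers)  # expected: 32
print(config.hidden_size)        # expected: 4096
~~~~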

---

# Benchmarking

Please refer to [rinna's LM benchmark page](https://rinnakk.github.io/research/benchmarks/lm/index.html).

---

# How to use the model

~~~~python
import torch
import transformers

model_id = "rinna/llama-3-youko-8b"

# Build a text-generation pipeline in bfloat16 with automatic device placement.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto"
)

# Sample a continuation of the Japanese prompt "西田幾多郎は、" ("Kitaro Nishida is ...").
output = pipeline(
    "西田幾多郎は、",
    max_new_tokens=256,
    do_sample=True
)
print(output[0]["generated_text"])
~~~~
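
Note that `do_sample=True` draws stochastic samples, so repeated runs produce different continuations; for reproducible outputs you can, for example, call `transformers.set_seed` before generation.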

---

# Tokenization
The model uses the original [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) tokenizer.
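
Since the tokenizer is unmodified, it should produce the same token IDs as the original Llama 3 tokenizer. A minimal illustration (plain `transformers` tokenizer usage, nothing specific to this model):

~~~~python
from transformers import AutoTokenizer

# The tokenizer shipped with this model is the unmodified Llama 3 tokenizer.
tokenizer = AutoTokenizer.from_pretrained("rinna/llama-3-youko-8b")
ids = tokenizer("西田幾多郎は、")["input_ids"]
print(ids)
print(tokenizer.convert_ids_to_tokens(ids))
~~~~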

---

# How to cite
```bibtex
@misc{rinna-llama-3-youko-8b,
    title = {rinna/llama-3-youko-8b},
    author = {Mitsuda, Koh and Chen, Xinqi and Wakatsuki, Toshiaki and Sawada, Kei},
    url = {https://huggingface.co/rinna/llama-3-youko-8b}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}
```
---

# References
```bibtex
@article{llama3modelcard,
    title = {Llama 3 Model Card},
    author = {AI@Meta},
    year = {2024},
    url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}

@software{gpt-neox-library,
    title = {{GPT}-{N}eo{X}: Large Scale Autoregressive Language Modeling in {P}y{T}orch},
    author = {Andonian, Alex and Anthony, Quentin and Biderman, Stella and Black, Sid and Gali, Preetham and Gao, Leo and Hallahan, Eric and Levy-Kramer, Josh and Leahy, Connor and Nestler, Lucas and Parker, Kip and Pieler, Michael and Purohit, Shivanshu and Songz, Tri and Phil, Wang and Weinbach, Samuel},
    doi = {10.5281/zenodo.5879544},
    month = {8},
    year = {2021},
    version = {0.0.1},
    url = {https://www.github.com/eleutherai/gpt-neox}
}
```
---

# License
[Meta Llama 3 Community License](https://llama.meta.com/llama3/license/)