---
thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
license: llama3
datasets:
- mc4
- wikipedia
- EleutherAI/pile
- oscar-corpus/colossal-oscar-1.0
- cc100
language:
- ja
- en
tags:
- llama
- llama-3
inference: false
base_model: meta-llama/Meta-Llama-3-8B
---
# `Llama 3 Youko 8B (rinna/llama-3-youko-8b)`

# Overview
We conduct continual pre-training of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on **22B** tokens from a mixture of Japanese and English datasets. The continual pre-training significantly improves the model's performance on Japanese tasks.
The name `youko` comes from the Japanese word [`妖狐/ようこ/Youko`](https://ja.wikipedia.org/wiki/%E5%A6%96%E7%8B%90), which is a kind of Japanese mythical creature ([`妖怪/ようかい/Youkai`](https://ja.wikipedia.org/wiki/%E5%A6%96%E6%80%AA)).
| Size | Continual Pre-Training | Instruction-Tuning |
| :- | :- | :- |
| 8B | Llama 3 Youko 8B [[HF]](https://huggingface.co/rinna/llama-3-youko-8b) [[GPTQ]](https://huggingface.co/rinna/llama-3-youko-8b-gptq) | Llama 3 Youko 8B Instruct [[HF]](https://huggingface.co/rinna/llama-3-youko-8b-instruct) [[GPTQ]](https://huggingface.co/rinna/llama-3-youko-8b-instruct-gptq) |
| 70B | Llama 3 Youko 70B [[HF]](https://huggingface.co/rinna/llama-3-youko-70b) [[GPTQ]](https://huggingface.co/rinna/llama-3-youko-70b-gptq) | Llama 3 Youko 70B Instruct [[HF]](https://huggingface.co/rinna/llama-3-youko-70b-instruct) [[GPTQ]](https://huggingface.co/rinna/llama-3-youko-70b-instruct-gptq) |
* **Library**

    The model was trained using code based on [EleutherAI/gpt-neox](https://github.com/EleutherAI/gpt-neox).

* **Model architecture**

    A 32-layer, 4096-hidden-size transformer-based language model. Refer to the [Llama 3 Model Card](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) for architecture details. A quick configuration check is sketched after this list.

* **Training: Built with Meta Llama 3**

    The model was initialized with the [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model and continually trained on around **22B** tokens from a mixture of the following corpora:
    - [Japanese CC-100](https://huggingface.co/datasets/cc100)
    - [Japanese C4](https://huggingface.co/datasets/mc4)
    - [Japanese OSCAR](https://huggingface.co/datasets/oscar-corpus/colossal-oscar-1.0)
    - [The Pile](https://huggingface.co/datasets/EleutherAI/pile)
    - [Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
    - rinna curated Japanese dataset

* **Contributors**
    - [Koh Mitsuda](https://huggingface.co/mitsu-koh)
    - [Xinqi Chen](https://huggingface.co/Keely0419)
    - [Toshiaki Wakatsuki](https://huggingface.co/t-w)
    - [Kei Sawada](https://huggingface.co/keisawada)
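As a quick way to confirm the architecture figures above, the published configuration can be inspected with 🤗 Transformers. This is a minimal sketch, assuming the checkpoint exposes the standard Llama config fields (`num_hidden_layers`, `hidden_size`), which are not spelled out in this card.
~~~~python
from transformers import AutoConfig

# Fetch only the model configuration (no weights are downloaded).
config = AutoConfig.from_pretrained("rinna/llama-3-youko-8b")

# Standard Llama config fields; expected values per the architecture note above.
print(config.model_type)         # "llama"
print(config.num_hidden_layers)  # 32 layers
print(config.hidden_size)        # 4096 hidden size
~~~~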
---
# Benchmarking
Please refer to [rinna's LM benchmark page](https://rinnakk.github.io/research/benchmarks/lm/index.html).
---
# How to use the model
~~~~python
import torch
import transformers

model_id = "rinna/llama-3-youko-8b"

# Build a text-generation pipeline in bfloat16 and let accelerate place the weights.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto"
)

# Sample a continuation of the Japanese prompt "Nishida Kitaro is, ..."
output = pipeline(
    "西田幾多郎は、",
    max_new_tokens=256,
    do_sample=True
)
print(output[0]["generated_text"])
~~~~
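If you would rather manage the tokenizer and model objects yourself instead of using the pipeline wrapper, the same generation can be reproduced with `AutoTokenizer`/`AutoModelForCausalLM`. This is a sketch rather than part of the official usage instructions; the sampling settings simply mirror the pipeline call above.
~~~~python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/llama-3-youko-8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # same precision as the pipeline example
    device_map="auto",
)

# Tokenize the prompt and sample a continuation with the same settings as above.
inputs = tokenizer("西田幾多郎は、", return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
~~~~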
---
# Tokenization
The model uses the original [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) tokenizer.
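Since the tokenizer is taken over unchanged, token IDs and counts are identical to those produced by the base Llama 3 tokenizer. A minimal round-trip sketch (the example sentence is illustrative):
~~~~python
from transformers import AutoTokenizer

# The checkpoint ships the unmodified Llama 3 BPE tokenizer.
tokenizer = AutoTokenizer.from_pretrained("rinna/llama-3-youko-8b")

text = "妖狐は日本の妖怪の一種です。"  # "The youko is a kind of Japanese youkai."
token_ids = tokenizer.encode(text, add_special_tokens=False)
print(len(token_ids), token_ids)
print(tokenizer.decode(token_ids))  # decodes back to the original text
~~~~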
---
# How to cite
```bibtex
@misc{rinna-llama-3-youko-8b,
    title = {rinna/llama-3-youko-8b},
    author = {Mitsuda, Koh and Chen, Xinqi and Wakatsuki, Toshiaki and Sawada, Kei},
    url = {https://huggingface.co/rinna/llama-3-youko-8b}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}
```
---
# References
```bibtex
@article{llama3modelcard,
    title = {Llama 3 Model Card},
    author = {AI@Meta},
    year = {2024},
    url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}

@software{gpt-neox-library,
    title = {{GPT}-{N}eo{X}: Large Scale Autoregressive Language Modeling in {P}y{T}orch},
    author = {Andonian, Alex and Anthony, Quentin and Biderman, Stella and Black, Sid and Gali, Preetham and Gao, Leo and Hallahan, Eric and Levy-Kramer, Josh and Leahy, Connor and Nestler, Lucas and Parker, Kip and Pieler, Michael and Purohit, Shivanshu and Songz, Tri and Phil, Wang and Weinbach, Samuel},
    doi = {10.5281/zenodo.5879544},
    month = {8},
    year = {2021},
    version = {0.0.1},
    url = {https://www.github.com/eleutherai/gpt-neox}
}
```
---
# License
[Meta Llama 3 Community License](https://llama.meta.com/llama3/license/)