Shisa.AI × Rakuten AI
To commemorate our recent visit to Rakuten HQ and the "Rakuten AI x Shisa AI - Building Japanese LLMs" tech talk, it occurred to me that it'd be fun and easy to give the Rakuten AI 2.0 Mini models a bit of the Shisa V2new razzmatazz. 🥳
We're happy to now be able to empirically show that our latest V2 recipe can improve the Japanese capabilities of base models ranging in scale from 1.5B to 405B parameters.
Here are the Shaberi results as judged by GPT-4.1:
| Model | Average | ELYZA 100 | JA-MT | Rakuda | Tengu |
|---|---|---|---|---|---|
| 037-rakuten-2.0-mini-instruct-1.5b-v2new-dpo405b | 5.10 | 5.42 | 4.60 | 5.68 | 4.70 |
| 035-rakuten-2.0-mini-1.5b-v2new-dpo405b | 5.04 | 4.88 | 4.62 | 5.38 | 5.28 |
| 034-rakuten-2.0-mini-1.5b-v2new-sft | 4.65 | 4.64 | 4.07 | 5.10 | 4.78 |
| 036-rakuten-2.0-mini-instruct-1.5b-v2new-sft | 4.63 | 5.44 | 3.78 | 4.70 | 4.58 |
| Rakuten/RakutenAI-2.0-mini-instruct | 4.57 | 5.00 | 4.18 | 5.15 | 3.95 |
- 034/035 are the SFT and DPO stages of the Rakuten AI 2.0 Mini base model
- 036/037 are the SFT and DPO stages of the Rakuten AI 2.0 Mini instruct model
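For anyone reproducing these numbers, the Average column appears to be the unweighted mean of the four Shaberi benchmark scores (this isn't stated explicitly, but it checks out for every row above); a quick sanity check in Python:

```python
# Sanity check: "Average" looks like the plain mean of the four benchmark scores.
# Rows copied from the table above (ELYZA 100, JA-MT, Rakuda, Tengu).
scores = {
    "037-rakuten-2.0-mini-instruct-1.5b-v2new-dpo405b": (5.42, 4.60, 5.68, 4.70),
    "Rakuten/RakutenAI-2.0-mini-instruct": (5.00, 4.18, 5.15, 3.95),
}

for model, row in scores.items():
    avg = sum(row) / len(row)
    print(f"{model}: {avg:.2f}")
# -> 5.10 and 4.57, matching the Average column
```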
As an interesting point of comparison, here are the equivalent Shaberi results for our (also Mistral-based) Shisa Gamma 7B V1, released in December 2023, alongside the version of it that was also given the V2new treatment:
| Model | Average | ELYZA 100 | JA-MT | Rakuda | Tengu |
|---|---|---|---|---|---|
| 029-shisa-gamma-7b-v1-v2new-dpo405b | 5.64 | 6.42 | 5.70 | 4.48 | 5.98 |
| 037-rakuten-2.0-mini-instruct-1.5b-v2new-dpo405b | 5.10 | 5.42 | 4.60 | 5.68 | 4.70 |
| augmxnt/shisa-gamma-7b-v1 | 4.80 | 5.86 | 4.07 | 4.55 | 4.72 |
| Rakuten/RakutenAI-2.0-mini-instruct | 4.57 | 5.00 | 4.18 | 5.15 | 3.95 |
These new models are Apache 2.0 licensed and can be used for any purpose.
These models were trained with OpenRLHF and took 10 hours for SFT and 2 hours for DPO on a single MI300X.
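Below is a minimal inference sketch using Hugging Face transformers, not an official usage snippet. It uses the 036 SFT repo ID listed in the model tree below; the other checkpoints in the tables above presumably follow the same shisa-ai/ naming, and the snippet assumes the tokenizer ships a chat template:

```python
# Minimal inference sketch (assumptions noted in comments).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo ID taken from the model tree below; swap in another checkpoint as needed.
model_id = "shisa-ai/036-rakuten-2.0-mini-instruct-1.5b-v2new-sft"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Assumes the tokenizer config includes a chat template.
# Prompt: "Please tell me three famous sightseeing spots in Japan."
messages = [{"role": "user", "content": "日本の有名な観光地を3つ教えてください。"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```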
Model tree for shisa-ai/036-rakuten-2.0-mini-instruct-1.5b-v2new-sft:
- Base model: Rakuten/RakutenAI-2.0-mini