Shisa.AI × Rakuten AI

To commemorate our recent visit to Rakuten HQ and the Rakuten AI x Shisa AI - Building Japanese LLMs tech talk, it occurred to me that it'd be fun and easy to give the Rakuten AI 2.0 Mini models a bit of the Shisa V2new razzmatazz. 🥳

We're happy to now be able to show empirically that our latest V2 recipe improves the Japanese capabilities of base models ranging from 1.5B to 405B parameters.

Here are the Shaberi results as judged by GPT-4.1:

| Model | Average | ELYZA 100 | JA-MT | Rakuda | Tengu |
|---|---|---|---|---|---|
| 037-rakuten-2.0-mini-instruct-1.5b-v2new-dpo405b | 5.10 | 5.42 | 4.60 | 5.68 | 4.70 |
| 035-rakuten-2.0-mini-1.5b-v2new-dpo405b | 5.04 | 4.88 | 4.62 | 5.38 | 5.28 |
| 034-rakuten-2.0-mini-1.5b-v2new-sft | 4.65 | 4.64 | 4.07 | 5.10 | 4.78 |
| 036-rakuten-2.0-mini-instruct-1.5b-v2new-sft | 4.63 | 5.44 | 3.78 | 4.70 | 4.58 |
| Rakuten/RakutenAI-2.0-mini-instruct | 4.57 | 5.00 | 4.18 | 5.15 | 3.95 |
  • 034/035 are the SFT and DPO stages of the Rakuten AI 2.0 Mini base model
  • 036/037 are the SFT and DPO stages of the Rakuten AI 2.0 Mini instruct model

As an interesting point of comparison, here are the equivalent Shaberi results for our (also Mistral-based) Shisa Gamma 7B V1, released in December 2023, along with the version of it that was given the same V2new treatment:

| Model | Average | ELYZA 100 | JA-MT | Rakuda | Tengu |
|---|---|---|---|---|---|
| 029-shisa-gamma-7b-v1-v2new-dpo405b | 5.64 | 6.42 | 5.70 | 4.48 | 5.98 |
| 037-rakuten-2.0-mini-instruct-1.5b-v2new-dpo405b | 5.10 | 5.42 | 4.60 | 5.68 | 4.70 |
| augmxnt/shisa-gamma-7b-v1 | 4.80 | 5.86 | 4.07 | 4.55 | 4.72 |
| Rakuten/RakutenAI-2.0-mini-instruct | 4.57 | 5.00 | 4.18 | 5.15 | 3.95 |

These new models are Apache 2.0 licensed and can be used for any purpose.
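Since the checkpoints are standard Safetensors releases in BF16, they load like any other causal LM. Below is a minimal inference sketch using Hugging Face transformers; the repo id comes from this card, while the example prompt, the sampling settings, and the assumption that the tokenizer ships a chat template are illustrative rather than official usage instructions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id as listed on this card; swap in the 037 DPO checkpoint if preferred.
model_id = "shisa-ai/036-rakuten-2.0-mini-instruct-1.5b-v2new-sft"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are shipped in BF16
    device_map="auto",
)

# Assumes the tokenizer includes a chat template; the prompt is a placeholder.
messages = [{"role": "user", "content": "日本の観光名所を3つ教えてください。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```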

These models were trained with OpenRLHF and took 10 hours for SFT and 2 hours for DPO on a single MI300X.
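For context on what the DPO stage optimizes, here is a small, self-contained PyTorch sketch of the standard DPO preference loss. It only illustrates the objective that trainers such as OpenRLHF implement; the function name, β value, and toy inputs are illustrative and are not our actual training code or configuration.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss over summed per-response log-probabilities.

    Each argument is a 1-D tensor: the log-prob of the chosen/rejected
    response under the policy being trained or the frozen reference model.
    beta controls how strongly the policy is kept near the reference.
    """
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * (chosen margin - rejected margin)), averaged over the batch.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy batch of 4 preference pairs with random log-probs.
torch.manual_seed(0)
print(dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4)).item())
```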

Compute sponsored by Hot Aisle and AMD.
