Shisa.AI × Rakuten AI

To commemorate our recent visit to Rakuten HQ and the Rakuten AI x Shisa AI - Building Japanese LLMs tech talk, it occurred to me that it'd be fun and easy to give the Rakuten AI 2.0 Mini models a bit of the Shisa V2new razzmatazz. 🥳

We're happy to now be able to show empirically that our latest V2 recipe improves the Japanese capabilities of base models ranging from 1.5B to 405B parameters.

Here are the Shaberi results as judged by GPT-4.1:

| Model | Average | ELYZA 100 | JA-MT | Rakuda | Tengu |
|---|---|---|---|---|---|
| 037-rakuten-2.0-mini-instruct-1.5b-v2new-dpo405b | 5.10 | 5.42 | 4.60 | 5.68 | 4.70 |
| 035-rakuten-2.0-mini-1.5b-v2new-dpo405b | 5.04 | 4.88 | 4.62 | 5.38 | 5.28 |
| 034-rakuten-2.0-mini-1.5b-v2new-sft | 4.65 | 4.64 | 4.07 | 5.10 | 4.78 |
| 036-rakuten-2.0-mini-instruct-1.5b-v2new-sft | 4.63 | 5.44 | 3.78 | 4.70 | 4.58 |
| Rakuten/RakutenAI-2.0-mini-instruct | 4.57 | 5.00 | 4.18 | 5.15 | 3.95 |
  • 034/035 are the SFT and DPO stages of the Rakuten AI 2.0 Mini base model
  • 036/037 are the SFT and DPO stages of the Rakuten AI 2.0 Mini instruct model

As an interesting point of comparison, here are the equivalent Shaberi results for our (also Mistral-based) Shisa Gamma 7B V1, released in December 2023, along with the version of it that was given the same V2new treatment:

| Model | Average | ELYZA 100 | JA-MT | Rakuda | Tengu |
|---|---|---|---|---|---|
| 029-shisa-gamma-7b-v1-v2new-dpo405b | 5.64 | 6.42 | 5.70 | 4.48 | 5.98 |
| 037-rakuten-2.0-mini-instruct-1.5b-v2new-dpo405b | 5.10 | 5.42 | 4.60 | 5.68 | 4.70 |
| augmxnt/shisa-gamma-7b-v1 | 4.80 | 5.86 | 4.07 | 4.55 | 4.72 |
| Rakuten/RakutenAI-2.0-mini-instruct | 4.57 | 5.00 | 4.18 | 5.15 | 3.95 |

These new models are Apache 2.0 licensed and can be used for any purpose.
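Since the checkpoints are standard Safetensors releases in BF16, they load like any other causal LM. Below is a minimal inference sketch using Hugging Face transformers; the repo id comes from this card, while the example prompt, the sampling settings, and the assumption that the tokenizer ships a chat template are illustrative rather than official usage instructions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id as listed on this card; swap in the 037 DPO checkpoint if preferred.
model_id = "shisa-ai/036-rakuten-2.0-mini-instruct-1.5b-v2new-sft"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are shipped in BF16
    device_map="auto",
)

# Assumes the tokenizer includes a chat template; the prompt is a placeholder.
messages = [{"role": "user", "content": "日本の観光名所を3つ教えてください。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```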

These models were trained with OpenRLHF and took 10 hours for SFT and 2 hours for DPO on a single MI300X.
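For context on what the DPO stage optimizes, here is a small, self-contained PyTorch sketch of the standard DPO preference loss. It only illustrates the objective that trainers such as OpenRLHF implement; the function name, β value, and toy inputs are illustrative and are not our actual training code or configuration.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss over summed per-response log-probabilities.

    Each argument is a 1-D tensor: the log-prob of the chosen/rejected
    response under the policy being trained or the frozen reference model.
    beta controls how strongly the policy is kept near the reference.
    """
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * (chosen margin - rejected margin)), averaged over the batch.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy batch of 4 preference pairs with random log-probs.
torch.manual_seed(0)
print(dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4)).item())
```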

Compute sponsored by Hot Aisle and AMD.
