Kaito Sugimoto

kaisugi

https://kaisugi.me

AI & ML interests

Japanese LLMs

kaisugi's activity

upvoted a collection about 8 hours ago

TinySwallow

Collection

Compact Japanese models trained with "TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models" • 5 items • Updated about 8 hours ago • 4

liked a model 3 days ago

cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese

Text Generation • Updated 3 days ago • 3.45k • 190

liked 2 models 6 days ago

nu-dialogue/j-moshi-ext

Updated 3 days ago • 29

karakuri-ai/karakuri-lm-32b-thinking-2501-exp

Text Generation • Updated 7 days ago • 60 • 6

liked a model 9 days ago

deepseek-ai/DeepSeek-R1

Text Generation • Updated 4 days ago • 285k • 5.04k

liked a Space 10 days ago

🥇 JMMMU Leaderboard (Running)

reacted to lianghsun's post with 👍 14 days ago

Post

🖖 Let me introduce the work I've done over the past three months: Llama-3.2-Taiwan-3B and Llama-3.2-Taiwan-3B-Instruct, now open-sourced on 🤗 Hugging Face.

lianghsun/Llama-3.2-Taiwan-3B: This model is built on top of meta-llama/Llama-3.2-3B with continual pretraining. The training dataset consists of a mixture of Traditional Chinese and multilingual texts in specific proportions, including 20B tokens of Traditional Chinese text.

lianghsun/Llama-3.2-Taiwan-3B-Instruct: This is a fine-tuned conversational model based on the foundation model.

This Llama-3.2-Taiwan open-source project is currently a one-person effort (yes, I did everything myself, starting with text preparation; so exhausting!). If you're interested, feel free to join the Discord server for discussions.

BENCHMARKING

The evaluation was conducted using ikala/tmmluplus, though the README page does not yet reflect the latest results. Performance is close to that of the previous versions, which suggests that further improvement may require adding more specialized knowledge to the datasets.

A CALL FOR SUPPORT

If anyone is willing to provide compute resources, it would be greatly appreciated and would help this project continue and grow. 💪

---
🏔️ Foundation model: lianghsun/Llama-3.2-Taiwan-3B
🤖 Instruction model: lianghsun/Llama-3.2-Taiwan-3B-Instruct
⚡ GGUF: lianghsun/Llama-3.2-Taiwan-3B-Instruct-GGUF