Kaito Sugimoto

kaisugi

AI & ML interests

Japanese LLMs

Recent Activity

reacted to lianghsun's post 2 days ago
🖖 Let me introduce the work I've done over the past three months: Llama-3.2-Taiwan-3B and Llama-3.2-Taiwan-3B-Instruct, now open-sourced on 🤗 Hugging Face.

lianghsun/Llama-3.2-Taiwan-3B: This model is built on top of meta-llama/Llama-3.2-3B with continual pretraining. The training dataset consists of a mixture of Traditional Chinese and multilingual texts in specific proportions, including 20B tokens of Traditional Chinese text.

lianghsun/Llama-3.2-Taiwan-3B-Instruct: This is a fine-tuned conversational model based on the foundation model.

This Llama-3.2-Taiwan open-source project is currently a one-person effort (yes, I did everything, starting from text preparation; so exhausting!). If you're interested, feel free to join the Discord server for discussions.

BENCHMARKING
The evaluation was conducted using https://huggingface.co/datasets/ikala/tmmluplus, though the README page does not yet reflect the latest results. The performance is close to that of the previous versions, suggesting that further improvements may require adding more specialized knowledge to the datasets.

A CALL FOR SUPPORT
If anyone is willing to provide compute resources, it would be greatly appreciated and would help this project continue and grow. 💪

---
Foundation model: https://huggingface.co/lianghsun/Llama-3.2-Taiwan-3B
Instruction model: https://huggingface.co/lianghsun/Llama-3.2-Taiwan-3B-Instruct
GGUF: https://huggingface.co/lianghsun/Llama-3.2-Taiwan-3B-Instruct-GGUF
liked a model 4 days ago
LoneWolfgang/bert-for-japanese-twitter

Organizations

Aizawa Laboratory at NII, Team Hatakeyama, Hugging Face Discord Community

kaisugi's activity

upvoted 2 articles 19 days ago

Navigating Korean LLM Research #2: Evaluation Tools

By amphora • 7

Navigating Korean LLM Research #1: Models

By amphora • 23
upvoted an article 3 months ago

How to generate text: using different decoding methods for language generation with Transformers

• 140