Kaito Sugimoto

kaisugi

AI & ML interests

Japanese LLMs

Recent Activity

reacted to lianghsun's post with šŸ‘ 2 days ago
šŸ–– Let me introduce the work I've done over the past three months: Llama-3.2-Taiwan-3B and Llama-3.2-Taiwan-3B-Instruct, now open-sourced on šŸ¤— Hugging Face.

lianghsun/Llama-3.2-Taiwan-3B: This model is built on top of meta-llama/Llama-3.2-3B with continual pretraining. The training dataset is a mixture of Traditional Chinese and multilingual texts in specific proportions, including 20B tokens of Traditional Chinese text.

lianghsun/Llama-3.2-Taiwan-3B-Instruct: A conversational model fine-tuned from the foundation model above.

This Llama-3.2-Taiwan open-source project is currently a one-person effort (yes, I did everything myself, starting from text preparation. So exhausting!). If you're interested, feel free to join the Discord server for discussions.

Benchmarking: The evaluation was conducted using https://huggingface.co/datasets/ikala/tmmluplus, though the README page does not yet reflect the latest results. Performance is close to the previous versions, which suggests that further improvement may require adding more specialized knowledge to the training data.

A call for support: If anyone is willing to provide compute resources to help this project continue and grow, it would be greatly appreciated. šŸ’Ŗ

šŸ”ļø Foundation model: https://huggingface.co/lianghsun/Llama-3.2-Taiwan-3B
šŸ¤– Instruction model: https://huggingface.co/lianghsun/Llama-3.2-Taiwan-3B-Instruct
āš” GGUF: https://huggingface.co/lianghsun/Llama-3.2-Taiwan-3B-Instruct-GGUF
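For anyone who wants to try the instruct model right away, here is a minimal usage sketch with the šŸ¤— Transformers chat pipeline. The repository id comes from the links above; the prompt and generation settings are illustrative assumptions, not the author's recommended configuration.

```python
# Minimal sketch: chat with Llama-3.2-Taiwan-3B-Instruct via transformers.
# Model id is from the post above; sampling settings are illustrative only.
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="lianghsun/Llama-3.2-Taiwan-3B-Instruct",
    torch_dtype="auto",
    device_map="auto",  # requires `accelerate`; drop for CPU-only use
)

messages = [
    # "Briefly introduce Taiwan's night-market culture in Traditional Chinese."
    {"role": "user", "content": "ē”Øē¹é«”äø­ę–‡ē°”čŖŖ台ē£ēš„夜åø‚ę–‡åŒ–ć€‚"},
]
out = chat(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```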
liked a model 4 days ago
LoneWolfgang/bert-for-japanese-twitter

Organizations

Aizawa Laboratory at NII Ā· Team Hatakeyama Ā· Hugging Face Discord Community

Posts 5

šŸš€ Llama-3-ELYZA-JP-8B

ELYZA, Inc. has developed two large language models (LLMs) for Japanese, "Llama-3-ELYZA-JP-70B" with 70 billion parameters and "Llama-3-ELYZA-JP-8B" with 8 billion parameters, both based on Meta's "Llama 3" series. The models underwent additional pre-training and post-training that significantly improved their Japanese language capabilities.

Key Points:

Performance:
- Llama-3-ELYZA-JP-70B surpasses global models such as GPT-4, Claude 3 Sonnet, and Gemini 1.5 Flash on the Japanese benchmarks below.
- Llama-3-ELYZA-JP-8B matches models like GPT-3.5 Turbo and Claude 3 Haiku despite having far fewer parameters.

Availability:
- The 8B model is available on Hugging Face Hub and can be used for both research and commercial purposes under the Llama 3 Community License.

Methodology:
- ELYZA enhanced the Japanese performance of the Llama 3 models through additional training with high-quality Japanese corpora and Instruction Tuning with proprietary datasets.

Benchmarks:
- Evaluations using ELYZA Tasks 100 and Japanese MT-Bench showed significant improvements in Japanese language generation.

Inference Speed:
- To address inference speed issues stemming from model size, ELYZA implemented speculative decoding, which achieved up to 1.6 times faster inference for the 70B model (a generic sketch of the technique follows the model list below).

Overall, ELYZA's models demonstrate state-of-the-art performance in Japanese language tasks and are optimized for both efficiency and effectiveness.

Model URL:
- elyza/Llama-3-ELYZA-JP-8B
- elyza/Llama-3-ELYZA-JP-8B-AWQ
- elyza/Llama-3-ELYZA-JP-8B-GGUF
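
ELYZA's own speculative-decoding stack is not published in this post, but the same idea is available off the shelf in šŸ¤— Transformers as assisted generation: a small draft model proposes tokens and the large target model verifies them. A rough sketch follows, using the open 8B model as the target; the draft model is a hypothetical stand-in (any smaller model sharing the tokenizer works), and the 1.6x figure above is ELYZA's result for their 70B setup, not something this snippet reproduces.

```python
# Sketch of speculative decoding via transformers "assisted generation":
# a small draft model proposes tokens, the larger target model verifies them.
# The draft model below is a hypothetical stand-in; ELYZA's actual setup
# is not described in this post.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "elyza/Llama-3-ELYZA-JP-8B"
draft_id = "meta-llama/Llama-3.2-1B-Instruct"  # assumption: shares the Llama 3 tokenizer

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# "Please briefly explain Japan's four seasons."
inputs = tokenizer("ę—„ęœ¬ć®å››å­£ć«ć¤ć„ć¦ēŸ­ćčŖ¬ę˜Žć—ć¦ćć ć•ć„ć€‚", return_tensors="pt").to(target.device)

# Passing `assistant_model` switches generate() to assisted generation,
# the Hugging Face implementation of speculative decoding.
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```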

Blog post (in Japanese):
https://note.com/elyza/n/n360b6084fdbd
šŸš€ KARAKURI LM 8x7B Instruct v0.1

KARAKURI Inc. has publicly released "KARAKURI LM 8x7B Instruct v0.1", the first domestically developed large language model (LLM) in Japan to support function calling and retrieval-augmented generation (RAG). The model can act as an AI agent that autonomously handles tasks across various applications, significantly reducing implementation costs compared to traditional models (an illustrative function-calling sketch follows the model link below).

Model Features:
- Autonomously selects the optimal documents and databases for a given task.
- Applied extensively in customer support: automating responses and processes, analyzing Voice of Customer (VoC), and predicting optimal outreach timing.

Model URL:
karakuri-ai/karakuri-lm-8x7b-instruct-v0.1
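
The post does not show the model's chat template, so the following is only an illustrative sketch of how function calling can be wired up through the generic `tools` support in transformers' apply_chat_template. The tool itself (get_order_status) is hypothetical, and whether this model's template consumes tools this way is an assumption to verify against the model card.

```python
# Illustrative function-calling sketch using the generic `tools` argument
# of apply_chat_template. Whether karakuri-lm-8x7b-instruct-v0.1's chat
# template renders tools this way is an assumption; check the model card.
from transformers import AutoTokenizer

def get_order_status(order_id: str) -> str:
    """Look up the shipping status of a customer order.

    Args:
        order_id: The customer's order identifier.
    """
    return "shipped"  # hypothetical backend lookup

tokenizer = AutoTokenizer.from_pretrained("karakuri-ai/karakuri-lm-8x7b-instruct-v0.1")
messages = [{"role": "user", "content": "Where is order #1234?"}]

# The JSON schema for the tool is generated from the signature and docstring.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_order_status],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)  # inspect the rendered prompt before generating
```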

Detailed press release (in Japanese):
https://karakuri.ai/seminar/news/karakuri-lm-8x7b-instruct-v0-1/

Datasets

None public yet