Sasi Kiran

sasikiran

AI & ML interests

Large language models

Recent Activity

liked a model 20 days ago

zai-org/GLM-TTS

liked a model 11 days ago

nvidia/Cosmos-Reason2-8B

Image-Text-to-Text • 9B params • Updated 17 days ago • 18.3k downloads • 21 likes

reacted to codelion's post with 🔥 11 days ago

Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:

→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: https://huggingface.co/codelion/dhara-70m

liked 3 models 20 days ago

reacted to MonsterMMORPG's post with 🔥 about 2 months ago

Next-level realism with Qwen Image is now possible with the new realism LoRA workflow. The top images use the new realism workflow; the bottom ones use the older default. A full tutorial is published, and the workflow needs only 4+4 steps.

Realism tutorial: https://youtu.be/XWzZ2wnzNuQ

Training tutorial: https://youtu.be/DPX3eBTuO_Y

This is a comprehensive step-by-step tutorial on training Qwen Image models. It covers LoRA training and full fine-tuning / DreamBooth training, for both the Qwen Image base model and the Qwen Image Edit Plus 2509 model. The tutorial is the product of 21 days of R&D, costing over $800 in cloud services, to find the best training configurations. We have also developed an easy-to-use Gradio app for the Kohya Musubi Tuner trainer. You can train locally on a Windows computer with a GPU that has as little as 6 GB of VRAM, for both LoRA and fine-tuning.