Even though the perplexity of the pruned version is three times higher, the ARC, HellaSwag, MMLU, TruthfulQA and WinoGrande scores hold up remarkably well considering that two layers (5 and 39) were removed. This seems to support the conclusions of Xin Men et al. in ShortGPT: Layers in Large Language Models Are More Redundant Than You Expect (arXiv:2403.03853).
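For anyone curious what dropping two decoder blocks looks like in practice, here is a minimal sketch (not the exact procedure used for this model) assuming a Llama-style causal LM whose blocks live in `model.model.layers`; the model id is a placeholder:

```python
# Sketch of the layer-removal idea: load a Llama-style causal LM and drop
# decoder blocks 5 and 39. Assumes model.model.layers is a torch.nn.ModuleList;
# the model id below is a stand-in, not the actual base model.
import torch
from transformers import AutoModelForCausalLM

model_id = "some-org/some-40-layer-base-model"  # hypothetical placeholder
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

layers_to_drop = {5, 39}  # the two indices reported above
kept = torch.nn.ModuleList(
    block for i, block in enumerate(model.model.layers) if i not in layers_to_drop
)
model.model.layers = kept
model.config.num_hidden_layers = len(kept)  # keep the config consistent

# The pruned model can then be re-evaluated (perplexity, ARC, HellaSwag, ...)
# and saved with model.save_pretrained("pruned-model").
```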
A results summary is in the model card, and the full test results are in the ./scores directory. Questions and feedback are always welcome.
SnowflakeCore-G1 development update: we're building a 24-layer transformer with a 32K context window and 1024-dimensional embeddings - pretty ambitious! Even running at batch_size=1 with heavy gradient accumulation, we're hitting memory walls at 300GB of RAM. Scaling up to ~1TB will take some time, but the architecture is looking promising. Thanks for following along with the journey!
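To make the batch_size=1 plus gradient-accumulation setup concrete, here is a rough, hypothetical sketch of that training pattern (not the actual SnowflakeCore-G1 code); it reuses a GPT-2-style config just to illustrate the 24-layer / 1024-dim / 32K-context shape, and the accumulation step count, learning rate, and dummy 2048-token batches are illustrative guesses to keep it runnable on modest hardware:

```python
# Hypothetical sketch: a 24-layer decoder-only config (1024-dim embeddings,
# 32K positions) trained at batch_size=1 with gradient accumulation so the
# effective batch size stays reasonable. Names/values are placeholders.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    n_layer=24,         # 24 transformer blocks
    n_embd=1024,        # 1024-dimensional embeddings
    n_head=16,          # assumed head count
    n_positions=32768,  # 32K context window
    vocab_size=32000,   # assumed vocabulary size
)
model = GPT2LMHeadModel(config)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
accumulation_steps = 64  # effective batch size = 1 * 64

model.train()
optimizer.zero_grad()
for step in range(accumulation_steps):  # stand-in for a real dataloader loop
    # batch_size=1; short dummy sequences here instead of full 32K-token ones
    input_ids = torch.randint(0, config.vocab_size, (1, 2048))
    loss = model(input_ids, labels=input_ids).loss
    (loss / accumulation_steps).backward()  # scale so gradients average correctly
optimizer.step()
optimizer.zero_grad()
```

The accumulation trades compute time for memory: activations only ever exist for one sequence at a time, which is why the limit shows up as raw RAM rather than batch size.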