
Barzin Lotfabadi

blotfaba

AI & ML interests

AI architecture design, foundational & theoretical AI research.

Recent Activity


Organizations

SANCTVS Inc.

blotfaba's activity

reacted to burtenshaw's post with ❤️ about 1 month ago
NEW UNIT in the Hugging Face Reasoning course. We dive deep into the algorithm behind DeepSeek R1 with an advanced and hands-on guide to interpreting GRPO.

🔗 reasoning-course

This unit is super useful if you’re tuning models with reinforcement learning. It will help with:

- interpreting loss and reward progression during training runs
- selecting effective parameters for training
- reviewing and defining effective reward functions (see the sketch after this list)
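
To make the last point concrete, here is a minimal sketch of a rule-based reward function of the kind GRPO training pipelines review, assuming the convention used by TRL's GRPOTrainer (each reward function receives the sampled completions and returns one float per completion). The function name, tag format, and scores are illustrative, not taken from the course unit:

```python
import re

# Illustrative GRPO-style reward function: one float score per completion.
# Assumes the TRL GRPOTrainer convention of `reward_func(completions, **kwargs)`.
def format_reward(completions, **kwargs):
    """Reward completions that wrap their reasoning in <think>...</think> tags."""
    pattern = re.compile(r"<think>.+?</think>", re.DOTALL)
    rewards = []
    for completion in completions:
        # Completions may arrive as plain strings or as chat-format message lists.
        text = completion if isinstance(completion, str) else completion[0]["content"]
        rewards.append(1.0 if pattern.search(text) else 0.0)
    return rewards

# Example: two sampled completions for the same prompt, scored independently.
print(format_reward([
    "<think>2 + 2 = 4</think> The answer is 4.",
    "The answer is 4.",
]))  # -> [1.0, 0.0]
```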

This unit also builds smoothly toward the existing practical exercises from @mlabonne and Unsloth.

📣 Shout out to @ShirinYamani who wrote the unit. Follow for more great content.
reacted to bartowski's post with 👍 6 months ago
Decided to try to check how many weights in a 70b F32 model would be squashed when converted to F16 (spoiler, it's shockingly few)

The reason for this comparison is that it should represent the same percentage of squishing as bf16 to fp16: bf16 shares F32's exponent range, so any weight that doesn't survive the drop to F16's narrower range from F32 wouldn't survive it from bf16 either.

Had Claude write me a script, ran it on the new Reflection-70B, and these are the results:

Total weights: 70553706496
Fully representable: 70530215524
Squashed: 23490972
Percentage squashed: 0.03%

0.03%!!!!

A couple things to note: this uses a roundtrip of F32 -> F16 -> F32 and then torch.isclose to account for rounding errors that inevitably come up with such precise numbers, but it uses VERY small tolerances (rtol=1e-5, atol=1e-8)

This also examines EVERY weight that was stored at F32, and for most layers I saw somewhere between 0% and 0.03% of weights being squashed, with no major outliers.

Overall, I feel even safer converting to F16 for llama.cpp; the extremely small number of weights that fall outside the range are likely so small that they don't actually play a role in the model's final output at inference anyway.
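
This isn't the actual script, but a minimal sketch of the roundtrip check described above, assuming PyTorch plus safetensors and a hypothetical shard filename:

```python
import torch
from safetensors.torch import load_file  # assumes the model is stored as safetensors shards

def count_squashed(path):
    """Roundtrip F32 -> F16 -> F32 and count weights that no longer match closely."""
    total = 0
    squashed = 0
    for name, w in load_file(path).items():
        w32 = w.to(torch.float32)
        roundtrip = w32.to(torch.float16).to(torch.float32)
        # Same tolerances mentioned in the post.
        close = torch.isclose(roundtrip, w32, rtol=1e-5, atol=1e-8)
        total += w32.numel()
        squashed += (~close).sum().item()
    return total, squashed

# Hypothetical shard path; a real 70B model is split across many shards,
# so you'd loop over all of them and sum the counts.
total, squashed = count_squashed("model-00001-of-00030.safetensors")
print(f"Total weights: {total}")
print(f"Squashed: {squashed} ({100 * squashed / total:.2f}%)")
```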
reacted to mikelabs's post with 👍 7 months ago