Loubna Ben Allal

loubnabnl

AI & ML interests

SmolLMs, ML for code, data

Organizations

Hugging Face, BigScience Workshop, BigScience Catalogue Data, BigScience Data, HuggingFaceBR4, Team 8, CodeParrot, BigCode, Hugging Face H4, Hugging Face OSS Metrics, CompVis Community, BigCode Data, LocalCodeLLMs, Need4Speed, EPFL Machine Learning and Optimization Laboratory, Code Llama, Hugging Face Smol Models Research, Hugging Face Smol Cluster, Nt3awnou, huggingPartyParis, Qwen, ZeroGPU Explorers, HF AFAIK, gg-hf, Nanotron Research, Women on Hugging Face, Hugging Face SMOL, FineData, bigcode nvidia, Social Post Explorers, Dev Mode Explorers, Cosmopedia Stories Collab, HuggingFaceFW-Dev, StarCoder2 Data, Data Agents, Argilla Warehouse, smol-explorers, swissai-hf-data, Hugging Face Science, Open R1, smol-ablations, SmolEvalData

loubnabnl's activity

published an article 1 day ago

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data
By danaaubakirova and 8 others • 47

upvoted an article 7 days ago

CodeAgents + Structure: A Better Way to Execute Actions
By akseljoonas and 1 other • 37

reacted to clem's post with 🚀🔥 9 days ago

Playing with Veo3 this morning. Share your prompt if you want me to create videos for you (bonus points if they funnily reference HF/open-source). These videos are "a cat on the moon rapping 'I love Hugging Face'"!

reacted to nyuuzyou's post with 🔥 9 days ago

I recently updated the nyuuzyou/pxhere dataset, and it now contains approximately 1.1M CC0 high-resolution images.
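
(For context, a minimal sketch of how such a Hub dataset could be inspected with the datasets library; the split name and the streaming usage are assumptions for illustration, not taken from the post.)

```python
from datasets import load_dataset

# Stream the dataset rather than downloading ~1.1M images up front.
# The "train" split name is an assumption for illustration.
ds = load_dataset("nyuuzyou/pxhere", split="train", streaming=True)

# Peek at the first few records lazily to see the available fields.
for i, example in enumerate(ds):
    print(example.keys())
    if i == 2:
        break
```
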
upvoted 3 changelogs 12 days ago

Static Spaces can now have a build step • 82

Xet is now the default storage option for new users and organizations • 53

AI-generated Abstract summaries on Hugging Face Papers • 65

reacted to merve's post with 🔥 12 days ago

Google released MedGemma on I/O'25 👏 google/medgemma-release-680aade845f90bec6a3f60c4

> 4B and 27B instruction fine-tuned vision LMs and a 4B pre-trained vision LM for medicine
> available with transformers from the get-go 🤗

they also released a cool demo for scan reading ➡️ google/rad_explain

use with transformers ⬇️
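
(A minimal sketch of what such transformers usage could look like; the checkpoint id google/medgemma-4b-it, the image URL, and the chosen pipeline task are assumptions for illustration, not taken from the post.)

```python
import torch
from transformers import pipeline

# Hypothetical checkpoint id; see the linked MedGemma collection for the exact models.
pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-4b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Chat-style multimodal prompt: one image plus a text question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chest_xray.png"},
            {"type": "text", "text": "Describe the findings in this scan."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=200)
print(out[0]["generated_text"][-1]["content"])
```
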
replied to their post 14 days ago
reacted to AdinaY's post with 🔥🚀 14 days ago

ByteDance is absolutely cooking lately 🔥

BAGEL 🥯: a 7B-active-parameter open multimodal foundation model from the ByteDance Seed team.

ByteDance-Seed/BAGEL-7B-MoT

✨ Apache 2.0
✨ Outperforms top VLMs (Qwen2.5-VL & InternVL-2.5)
✨ Mixture-of-Transformer-Experts + dual encoders
✨ Trained on trillions of interleaved tokens

reacted to sayakpaul's post with 🔥 14 days ago

Despite the emergence of approaches combining LLM and DiT architectures for T2I synthesis, this design space remains severely understudied.

This work was done a while ago and got into CVPR'25 -- super excited to finally share it now, along with the data and code ♥️

We explore several architectural choices that affect this design and provide an open & reproducible training recipe that works at scale.

Works like Playground v3 have already explored a deep fusion between an LLM and a DiT, sharing their representations through layerwise attention, and they exhibit excellent performance on T2I.

Despite its compelling results and other performance virtues, this design remains underexplored, which is what we want to address in our work. Specifically, we take a pre-trained LLM (Gemma-2B) and a trainable DiT, and set out to explore what makes a "good deep fusion" between the two for T2I.
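
(To make the layerwise "deep fusion" idea concrete, here is a hypothetical PyTorch sketch, not the paper's implementation: at each layer, DiT image tokens attend jointly over the frozen LLM's text hidden states and themselves. Module names, dimensions, and the omission of timestep/AdaLN conditioning are illustrative simplifications.)

```python
import torch
import torch.nn as nn

class DeepFusionBlock(nn.Module):
    """One DiT block fused with a frozen LLM via joint, layerwise attention."""

    def __init__(self, dim: int = 1024, num_heads: int = 16):
        super().__init__()
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp_norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, img_tokens: torch.Tensor, text_hidden: torch.Tensor) -> torch.Tensor:
        # img_tokens: DiT image tokens [B, T_img, D]
        # text_hidden: hidden states from the matching (frozen) LLM layer [B, T_txt, D]
        x = self.attn_norm(img_tokens)
        # Joint attention: queries are image tokens, keys/values span text + image tokens.
        kv = torch.cat([text_hidden, x], dim=1)
        attn_out, _ = self.attn(query=x, key=kv, value=kv, need_weights=False)
        img_tokens = img_tokens + attn_out
        img_tokens = img_tokens + self.mlp(self.mlp_norm(img_tokens))
        return img_tokens

# Usage sketch: fuse layer by layer with the LLM's per-layer hidden states.
blocks = nn.ModuleList([DeepFusionBlock() for _ in range(4)])
img = torch.randn(2, 256, 1024)                           # noisy image latents as tokens
llm_states = [torch.randn(2, 77, 1024) for _ in blocks]   # one text hidden state per layer
for block, txt in zip(blocks, llm_states):
    img = block(img, txt)
```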

We explore several key questions in the work, such as:

Q1: How should we do attention? We considered several alternatives; PixArt-Alpha-like attention (cross-attention) is very promising.
Q2: Should we incorporate additional text modulation?
Q3: Can we eliminate timestep conditioning?
Q4: How do we do positional encodings?
Q5: Do instruction-tuned LLMs help deep fusion?
Q6: Would using a decoder LLM from a multimodal model be helpful?
Q7: Does using a better variant of Gemma help?

Based on the findings of our experiments, we arrive at FuseDiT, with the following components on top of the base architecture:

* No AdaLN-Zero modules
* 1D + 2D-RoPE
* Gemma 2 2B, adjusting DiT configurations accordingly

We trained FuseDiT on a mixture of CC12M, JourneyDB, & SA (~26M image-text pairs) for 800 steps. While it isn't the best model, it's encouraging that we can develop something in a guided manner using open datasets.

To learn more (code and models are all available), please check out the paper:
https://lnkd.in/gg6qyqZX