Loubna Ben Allal
loubnabnl
AI & ML interests
SmolLMs, ML for code, data
Recent Activity
updated a dataset about 15 hours ago: HuggingFaceTB/stack-edu-prompts-16langs-1k
published a dataset about 15 hours ago: HuggingFaceTB/stack-edu-prompts-16langs-1k
published an article 1 day ago: SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data
loubnabnl's activity

reacted to clem's post with 🚀🔥 (9 days ago)

reacted to nyuuzyou's post with 🔥 (9 days ago)
I recently updated the nyuuzyou/pxhere dataset, and it now contains approximately 1.1M CC0 high-resolution images.

reacted to merve's post with 🔥 (12 days ago)
Google released MedGemma at I/O '25 👏
google/medgemma-release-680aade845f90bec6a3f60c4
> 4B and 27B instruction fine-tuned vision LMs and a 4B pre-trained vision LM for medicine
> available with transformers from the get-go 🤗
they also released a cool demo for scan reading ➡️ google/rad_explain
use with transformers ⤵️
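A minimal sketch of what that usage could look like with the transformers image-text-to-text pipeline; the checkpoint id and image URL below are assumptions/placeholders, not taken from the post:

```python
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-4b-it",  # assumed instruction-tuned 4B checkpoint id
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chest_xray.png"},  # placeholder
            {"type": "text", "text": "Describe the findings in this scan."},
        ],
    }
]
out = pipe(text=messages, max_new_tokens=128)
print(out[0]["generated_text"])
```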

replied to their post (14 days ago):
it does now :)

reacted to AdinaY's post with 🔥🚀 (14 days ago)
ByteDance is absolutely cooking lately🔥
BAGEL 🥯: a 7B-active-parameter open multimodal foundation model from the ByteDance Seed team.
ByteDance-Seed/BAGEL-7B-MoT
✨ Apache 2.0
✨ Outperforms top VLMs (Qwen2.5-VL & InternVL-2.5)
✨ Mixture-of-Transformer-Experts + dual encoders
✨ Trained on trillions of interleaved tokens

reacted to sayakpaul's post with 🔥 (14 days ago)
Despite the emergence of architectures that combine an LLM and a DiT for T2I synthesis, this design space remains severely understudied.
This was done long ago and got into CVPR25 -- super excited to finally share it now, along with the data and code ♥️
We explore several architectural choices that affect this design. We provide an open & reproducible training recipe that works at scale.
Works like Playground v3 have already explored a deep fusion between an LLM and a DiT, sharing their representations through layerwise attention. They exhibit excellent performance on T2I.
Despite its compelling results and other performance virtues, this design remains underexplored, which is what we want to improve on in our work. Specifically, we take a pre-trained LLM (Gemma-2B) and a trainable DiT, and set out to explore what makes a "good deep fusion" between the two for T2I.
We explore several key questions in the work, such as:
Q1: How should we do attention? We considered several alternatives. PixArt-Alpha like attention (cross-attention) is very promising.
Q2: Should we incorporate additional text modulation?
Q3: Can we eliminate timestep conditioning?
Q4: How do we do positional encodings?
Q5: Do instruction-tuned LLMs help deep fusion?
Q6: Would using a decoder LLM from a multimodal model be helpful?
Q7: Does using a better variant of Gemma help?
Based on the findings of our experiments, we arrive at FuseDiT, which adds the following components on top of the base architecture:
* No AdaLN-Zero modules
* 1D + 2D-RoPE
* Gemma 2 2B, adjusting DiT configurations accordingly
We trained FuseDiT on a mixture from CC12M, JourneyDB, & SA (~26M image-text pairs) for 800 steps. While not the best model, it's encouraging to develop something in a guided manner using open datasets.
To learn more (code and models are all available), please check out the paper:
https://lnkd.in/gg6qyqZX.
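For readers unfamiliar with the mechanism behind Q1, here is an illustrative PyTorch toy block of PixArt-Alpha-style cross-attention, where DiT image tokens attend to hidden states from a (frozen) text LLM; this is a generic sketch with made-up dimensions, not the paper's FuseDiT implementation:

```python
import torch
import torch.nn as nn

class CrossAttnFusionBlock(nn.Module):
    def __init__(self, dim: int, llm_dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.proj_text = nn.Linear(llm_dim, dim)  # map LLM hidden size to DiT width
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, img_tokens: torch.Tensor, llm_hidden: torch.Tensor) -> torch.Tensor:
        # img_tokens: (B, N_img, dim) DiT tokens; llm_hidden: (B, N_txt, llm_dim) text states
        text = self.proj_text(llm_hidden)
        attn_out, _ = self.cross_attn(self.norm(img_tokens), text, text)
        return img_tokens + attn_out  # residual connection, as in standard DiT blocks

x = torch.randn(2, 256, 768)   # dummy image tokens
h = torch.randn(2, 77, 2048)   # dummy LLM hidden states (Gemma-sized width, for illustration)
print(CrossAttnFusionBlock(dim=768, llm_dim=2048)(x, h).shape)  # torch.Size([2, 256, 768])
```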

posted an update (19 days ago)
SmolVLM is now available on PocketPal — you can run it offline on your smartphone to interpret the world around you. 🌍📱
And check out this real-time camera demo by @ngxson , powered by llama.cpp:
https://github.com/ngxson/smolvlm-realtime-webcam
https://x.com/pocketpal_ai

reacted to merterbak's post with 🔥 (19 days ago)
Qwen 3 technical report released🚀
Report: https://github.com/QwenLM/Qwen3/blob/main/Qwen3_Technical_Report.pdf

reacted to albertvillanova's post with 🔥 (19 days ago)
New in smolagents v1.16.0:
🔍 Bing support in WebSearchTool
🐍 Custom functions & executor_kwargs in LocalPythonExecutor
🔧 Streaming GradioUI fixes
🌐 Local web agents via api_base & api_key
📚 Better docs
👉 https://github.com/huggingface/smolagents/releases/tag/v1.16.0
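A minimal sketch of wiring these pieces together in smolagents; the model id, local endpoint, and API key below are placeholders, and the default search provider of WebSearchTool is assumed:

```python
from smolagents import CodeAgent, OpenAIServerModel, WebSearchTool

model = OpenAIServerModel(
    model_id="my-local-model",            # placeholder name served by your endpoint
    api_base="http://localhost:8000/v1",  # assumed local OpenAI-compatible server
    api_key="not-needed",                 # placeholder key
)
agent = CodeAgent(tools=[WebSearchTool()], model=model)
print(agent.run("Summarize the changes in smolagents v1.16.0."))
```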

reacted to merve's post with 🔥 (19 days ago)
New SOTA open-source depth estimation: Marigold v1-1 🌼
> normal maps, depth maps of scenes & faces prs-eth/marigold-normals prs-eth/marigold
> get albedo (true color) and BRDF (texture) maps of scenes prs-eth/marigold-intrinsics
> they even release a depth-to-3D printer format demo 😮 prs-eth/depth-to-3d-print
All models are here prs-eth/marigold-computer-vision-6669e9e3d3ee30f48214b9ba
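A minimal sketch of running Marigold depth estimation through diffusers; the exact v1-1 checkpoint id and the input image URL are assumptions/placeholders:

```python
import torch
import diffusers
from diffusers.utils import load_image

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-v1-1",  # assumed checkpoint id for v1-1
    torch_dtype=torch.float16,
).to("cuda")

image = load_image("https://example.com/room.jpg")  # placeholder input image
depth = pipe(image)
vis = pipe.image_processor.visualize_depth(depth.prediction)  # list of PIL images
vis[0].save("depth_map.png")
```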

reacted to lysandre's post with ❤️ (3 months ago)
SmolVLM-2 and SigLIP-2 are now part of transformers in dedicated releases! They're added on top of the v4.49.0 release, and can be installed from the following tags: v4.49.0-SmolVLM-2 and v4.49.0-SigLIP-2.
This marks a new beginning for the release process of transformers. For the past five years, we've been doing monthly releases featuring many models (v4.49.0, the latest release, features 9 new architectures).
Starting with SmolVLM-2 & SigLIP-2, we'll now additionally release tags supporting new models on a stable branch. These models are therefore directly available for use by installing from the tag itself. These tags will continue to be updated with fixes applied to these models.
Going forward, continue expecting software releases following semantic versioning: v4.50.0 will have ~10 new architectures compared to v4.49.0, as well as a myriad of new features, improvements and bug fixes. Accompanying these software releases, we'll release tags offering brand new models as fast as possible, to make them accessible to all immediately.
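A sketch of installing from one of these tags and loading SmolVLM-2; the checkpoint id and image URL below are assumptions/placeholders:

```python
# Install from the model-specific tag (shell):
#   pip install "git+https://github.com/huggingface/transformers@v4.49.0-SmolVLM-2"
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)

messages = [
    {"role": "user", "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder image
        {"type": "text", "text": "Describe this image."},
    ]}
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
)
generated = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```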

reacted to lewtun's post with 🔥 (4 months ago)
We are reproducing the full DeepSeek R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret we can do it together in the open!
🧪 Step 1: replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1.
🧠 Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.
🔥 Step 3: show we can go from base model -> SFT -> RL via multi-stage training.
Follow along: https://github.com/huggingface/open-r1

reacted to ginipick's post with 🔥 (5 months ago)
🌟 Digital Odyssey: AI Image & Video Generation Platform 🎨
Welcome to our all-in-one AI platform for image and video generation! 🚀
✨ Key Features
🎨 High-quality image generation from text
🎥 Video creation from still images
🌐 Multi-language support with automatic translation
🛠️ Advanced customization options
💫 Unique Advantages
⚡ Fast and accurate results using FLUX.1-dev and Hyper-SD models
🔒 Robust content safety filtering system
🎯 Intuitive user interface
🛠️ Extended toolkit including image upscaling and logo generation
🎮 How to Use
Enter your image or video description
Adjust settings as needed
Click generate
Save and share your results automatically
🔧 Tech Stack
FluxPipeline
Gradio
PyTorch
OpenCV
link: https://huggingface.co/spaces/ginigen/Dokdo
Turn your imagination into reality with AI! ✨
#AI #ImageGeneration #VideoGeneration #MachineLearning #CreativeTech

reacted to anton-l's post with 🚀🔥 (6 months ago)
Introducing 📐𝐅𝐢𝐧𝐞𝐌𝐚𝐭𝐡: the best public math pre-training dataset with 50B+ tokens!
HuggingFaceTB/finemath
Math remains challenging for LLMs and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH.
We build the dataset by:
🛠️ carefully extracting math data from Common Crawl;
🔎 iteratively filtering and recalling high quality math pages using a classifier trained on synthetic annotations to identify math reasoning and deduction.
We conducted a series of ablations comparing the performance of Llama-3.2-3B-Base after continued pre-training on FineMath and observe notable gains compared to the baseline model and other public math datasets.
We hope this helps advance the performance of LLMs on math and reasoning! 🚀
We’re also releasing all the ablation models as well as the evaluation code.
HuggingFaceTB/finemath-6763fb8f71b6439b653482c2
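A minimal sketch of streaming the dataset with the datasets library; the subset name and the text column are assumptions about the dataset layout:

```python
from datasets import load_dataset

# Stream instead of downloading the full 50B+ token dataset up front.
ds = load_dataset(
    "HuggingFaceTB/finemath",
    "finemath-4plus",   # assumed name of a higher-quality subset
    split="train",
    streaming=True,
)
for sample in ds.take(3):
    print(sample["text"][:200])  # "text" column name is assumed
```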

reacted to julien-c's post with 🔥❤️🤗 (6 months ago)
After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub
TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)
docs: https://huggingface.co/docs/hub/storage-limits
We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community 🔥
cc: @reach-vb @pierric @victor and the HF team