BigScience Catalogue Data

non-profit

https://bigscience.huggingface.co

Activity Feed Request to join this org

AI & ML interests

None defined yet.

Recent Activity

Suzana authored a paper 9 days ago

Deep learning for affective computing: text-based emotion recognition in decision support

Suzana authored a paper 9 days ago

Deep contextualized word representations for detecting sarcasm and irony

afaji authored a paper 24 days ago

Crosslingual Reasoning through Test-Time Scaling

View all activity

bigscience-catalogue-data's activity

albertvillanova

posted an update 9 days ago

Post

397

New in smolagents v1.17.0:
- Structured generation in CodeAgent 🧱
- Streamable HTTP MCP support 🌐
- Agent.run() returns rich RunResult 📦

Smarter agents, smoother workflows.
Try it now: https://github.com/huggingface/smolagents/releases/tag/v1.17.0

Suzana

authored 2 papers 9 days ago

Deep learning for affective computing: text-based emotion recognition in decision support

Paper • 1803.06397 • Published Mar 16, 2018

Deep contextualized word representations for detecting sarcasm and irony

Paper • 1809.09795 • Published Sep 26, 2018

loubnabnl

posted an update 20 days ago

Post

2647

SmolVLM is now available on PocketPal — you can run it offline on your smartphone to interpret the world around you. 🌍📱

And check out this real-time camera demo by @ngxson , powered by llama.cpp:
https://github.com/ngxson/smolvlm-realtime-webcam
https://x.com/pocketpal_ai

3 replies

albertvillanova

posted an update 20 days ago

Post

2408

New in smolagents v1.16.0:
🔍 Bing support in WebSearchTool
🐍 Custom functions & executor_kwargs in LocalPythonExecutor
🔧 Streaming GradioUI fixes
🌐 Local web agents via api_base & api_key
📚 Better docs

👉 https://github.com/huggingface/smolagents/releases/tag/v1.16.0

afaji

authored a paper 24 days ago

Crosslingual Reasoning through Test-Time Scaling

Paper • 2505.05408 • Published 28 days ago • 8

afaji

authored a paper about 1 month ago

Softpick: No Attention Sink, No Massive Activations with Rectified Softmax

Paper • 2504.20966 • Published Apr 29 • 31

Shayne

authored a paper about 1 month ago

The Leaderboard Illusion

Paper • 2504.20879 • Published Apr 29 • 70

davanstrien

posted an update about 1 month ago

Post

2212

Came across a very nice submission from @marcodsn for the reasoning datasets competition (https://huggingface.co/blog/bespokelabs/reasoning-datasets-competition).

The dataset distils reasoning chains from arXiv research papers in biology and economics. Some nice features of the dataset:

- Extracts both the logical structure AND researcher intuition from academic papers
- Adopts the persona of researchers "before experiments" to capture exploratory thinking
- Provides multi-short and single-long reasoning formats with token budgets - Shows 7.2% improvement on MMLU-Pro Economics when fine-tuning a 3B model

It's created using the Curator framework with plans to scale across more scientific domains and incorporate multi-modal reasoning with charts and mathematics.

I personally am very excited about datasets like this, which involve creativity in their creation and don't just rely on $$$ to produce a big dataset with little novelty.

Dataset can be found here: marcodsn/academic-chains (give it a like!)

meg

posted an update about 1 month ago

Post

2558

New launch: See the energy use of chatbot conversations, in real time. =)
jdelavande/chat-ui-energy
Great work from @JulienDelavande !

albertvillanova

posted an update about 1 month ago

Post

2737

smolagents v1.14.0 is out! 🚀
🔌 MCPClient: A sleek new client for connecting to remote MCP servers, making integrations more flexible and scalable.
🪨 Amazon Bedrock: Native support for Bedrock-hosted models.
SmolAgents is now more powerful, flexible, and enterprise-ready. 💼

Full release 👉 https://github.com/huggingface/smolagents/releases/tag/v1.14.0
#smolagents #LLM #AgenticAI

yjernite

posted an update about 2 months ago

Post

3325

Today in Privacy & AI Tooling - introducing a nifty new tool to examine where data goes in open-source apps on 🤗

HF Spaces have tons (100Ks!) of cool demos leveraging or examining AI systems - and because most of them are OSS we can see exactly how they handle user data 📚🔍

That requires actually reading the code though, which isn't always easy or quick! Good news: code LMs have gotten pretty good at automatic review, so we can offload some of the work - here I'm using Qwen/Qwen2.5-Coder-32B-Instruct to generate reports and it works pretty OK 🙌

The app works in three stages:
1. Download all code files
2. Use the Code LM to generate a detailed report pointing to code where data is transferred/(AI-)processed (screen 1)
3. Summarize the app's main functionality and data journeys (screen 2)
4. Build a Privacy TLDR with those inputs

It comes with a bunch of pre-reviewed apps/Spaces, great to see how many process data locally or through (private) HF endpoints 🤗

Note that this is a POC, lots of exciting work to do to make it more robust, so:
- try it: yjernite/space-privacy
- reach out to collab: yjernite/space-privacy

davanstrien

posted an update about 2 months ago

Post

1689

I've created a v1 dataset ( davanstrien/reasoning-required) and model ( davanstrien/ModernBERT-based-Reasoning-Required) to help curate "wild text" data for generating reasoning examples beyond the usual code/math/science domains.

- I developed a "Reasoning Required" dataset with a 0-4 scoring system for reasoning complexity
- I used educational content from HuggingFaceFW/fineweb-edu, adding annotations for domains, reasoning types, and example questions

My approach enables a more efficient workflow: filter text with small models first, then use LLMs only on high-value content.

This significantly reduces computation costs while expanding reasoning dataset domain coverage.

lvwerra

authored a paper about 2 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 188

loubnabnl

authored a paper about 2 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 188

ola13

authored a paper 2 months ago

Command A: An Enterprise-Ready Large Language Model

Paper • 2504.00698 • Published Apr 1 • 26

sasha

authored a paper 2 months ago

Fully Autonomous AI Agents Should Not be Developed

Paper • 2502.02649 • Published Feb 4 • 34

meg

authored a paper 2 months ago

Fully Autonomous AI Agents Should Not be Developed

Paper • 2502.02649 • Published Feb 4 • 34

afaji

authored a paper 3 months ago

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published Mar 10 • 98

albertvillanova

posted an update 3 months ago

Post

4101

🚀 New smolagents update: Safer Local Python Execution! 🦾🐍

With the latest release, we've added security checks to the local Python interpreter: every evaluation is now analyzed for dangerous builtins, modules, and functions. 🔒

Here's why this matters & what you need to know! 🧵👇

1️⃣ Why is local execution risky? ⚠️
AI agents that run arbitrary Python code can unintentionally (or maliciously) access system files, run unsafe commands, or exfiltrate data.

2️⃣ New Safety Layer in smolagents 🛡️
We now inspect every return value during execution:
✅ Allowed: Safe built-in types (e.g., numbers, strings, lists)
⛔ Blocked: Dangerous functions/modules (e.g., os.system, subprocess, exec, shutil)

3️⃣ Immediate Benefits 💡
- Prevent agents from accessing unsafe builtins
- Block unauthorized file or network access
- Reduce accidental security vulnerabilities

4️⃣ Security Disclaimer ⚠️
🚨 Despite these improvements, local Python execution is NEVER 100% safe. 🚨
If you need true isolation, use a remote sandboxed executor like Docker or E2B.

5️⃣ The Best Practice: Use Sandboxed Execution 🔐
For production-grade AI agents, we strongly recommend running code in a Docker or E2B sandbox to ensure complete isolation.

6️⃣ Upgrade Now & Stay Safe! 🚀
Check out the latest smolagents release and start building safer AI agents today.

🔗 https://github.com/huggingface/smolagents

What security measures do you take when running AI-generated code? Let’s discuss! 👇

#AI #smolagents #Python #Security

2 replies

AI & ML interests

Recent Activity

Team members 45

bigscience-catalogue-data's activity