Datasets Maintainers

non-profit
Activity Feed

AI & ML interests

None defined yet.

datasets-maintainers's activity

cfahlgren1Β 
posted an update 4 days ago
albertvillanovaΒ 
posted an update 11 days ago
cfahlgren1Β 
posted an update 17 days ago
view post
Post
1682
Yesterday, we dropped a new conversational viewer for datasets on the hub! πŸ’¬

Actually being able to view and inspect your data is extremely important. This is a big step in making data more accessible and actionable for everyone.

Here's some datasets you can try it out on:
β€’ mlabonne/FineTome-100k
β€’ Salesforce/APIGen-MT-5k
β€’ open-thoughts/OpenThoughts2-1M
β€’ allenai/tulu-3-sft-mixture

Any other good ones?
  • 1 reply
Β·
albertvillanovaΒ 
posted an update 22 days ago
view post
Post
2417
New in smolagents v1.16.0:
πŸ” Bing support in WebSearchTool
🐍 Custom functions & executor_kwargs in LocalPythonExecutor
πŸ”§ Streaming GradioUI fixes
🌐 Local web agents via api_base & api_key
πŸ“š Better docs

πŸ‘‰ https://github.com/huggingface/smolagents/releases/tag/v1.16.0
albertvillanovaΒ 
posted an update about 2 months ago
view post
Post
2743
smolagents v1.14.0 is out! πŸš€
πŸ”Œ MCPClient: A sleek new client for connecting to remote MCP servers, making integrations more flexible and scalable.
πŸͺ¨ Amazon Bedrock: Native support for Bedrock-hosted models.
SmolAgents is now more powerful, flexible, and enterprise-ready. πŸ’Ό

Full release πŸ‘‰ https://github.com/huggingface/smolagents/releases/tag/v1.14.0
#smolagents #LLM #AgenticAI
severoΒ 
posted an update about 2 months ago
albertvillanovaΒ 
posted an update 3 months ago
view post
Post
4103
πŸš€ New smolagents update: Safer Local Python Execution! 🦾🐍

With the latest release, we've added security checks to the local Python interpreter: every evaluation is now analyzed for dangerous builtins, modules, and functions. πŸ”’

Here's why this matters & what you need to know! πŸ§΅πŸ‘‡

1️⃣ Why is local execution risky? ⚠️
AI agents that run arbitrary Python code can unintentionally (or maliciously) access system files, run unsafe commands, or exfiltrate data.

2️⃣ New Safety Layer in smolagents πŸ›‘οΈ
We now inspect every return value during execution:
βœ… Allowed: Safe built-in types (e.g., numbers, strings, lists)
β›” Blocked: Dangerous functions/modules (e.g., os.system, subprocess, exec, shutil)

3️⃣ Immediate Benefits πŸ’‘
- Prevent agents from accessing unsafe builtins
- Block unauthorized file or network access
- Reduce accidental security vulnerabilities

4️⃣ Security Disclaimer ⚠️
🚨 Despite these improvements, local Python execution is NEVER 100% safe. 🚨
If you need true isolation, use a remote sandboxed executor like Docker or E2B.

5️⃣ The Best Practice: Use Sandboxed Execution πŸ”
For production-grade AI agents, we strongly recommend running code in a Docker or E2B sandbox to ensure complete isolation.

6️⃣ Upgrade Now & Stay Safe! πŸš€
Check out the latest smolagents release and start building safer AI agents today.

πŸ”— https://github.com/huggingface/smolagents

What security measures do you take when running AI-generated code? Let’s discuss! πŸ‘‡

#AI #smolagents #Python #Security
  • 2 replies
Β·
albertvillanovaΒ 
posted an update 3 months ago
view post
Post
4007
πŸš€ Big news for AI agents! With the latest release of smolagents, you can now securely execute Python code in sandboxed Docker or E2B environments. πŸ¦ΎπŸ”’

Here's why this is a game-changer for agent-based systems: πŸ§΅πŸ‘‡

1️⃣ Security First πŸ”
Running AI agents in unrestricted Python environments is risky! With sandboxing, your agents are isolated, preventing unintended file access, network abuse, or system modifications.

2️⃣ Deterministic & Reproducible Runs πŸ“¦
By running agents in containerized environments, you ensure that every execution happens in a controlled and predictable settingβ€”no more environment mismatches or dependency issues!

3️⃣ Resource Control & Limits 🚦
Docker and E2B allow you to enforce CPU, memory, and execution time limits, so rogue or inefficient agents don’t spiral out of control.

4️⃣ Safer Code Execution in Production 🏭
Deploy AI agents confidently, knowing that any generated code runs in an ephemeral, isolated environment, protecting your host machine and infrastructure.

5️⃣ Easy to Integrate πŸ› οΈ
With smolagents, you can simply configure your agent to use Docker or E2B as its execution backendβ€”no need for complex security setups!

6️⃣ Perfect for Autonomous AI Agents πŸ€–
If your AI agents generate and execute code dynamically, this is a must-have to avoid security pitfalls while enabling advanced automation.

⚑ Get started now: https://github.com/huggingface/smolagents

What will you build with smolagents? Let us know! πŸš€πŸ’‘
albertvillanovaΒ 
posted an update 4 months ago
view post
Post
4076
πŸš€ Introducing @huggingface Open Deep-ResearchπŸ’₯

In just 24 hours, we built an open-source agent that:
βœ… Autonomously browse the web
βœ… Search, scroll & extract info
βœ… Download & manipulate files
βœ… Run calculations on data

55% on GAIA validation set! Help us improve it!πŸ’‘
https://huggingface.co/blog/open-deep-research
  • 3 replies
Β·
cfahlgren1Β 
posted an update 4 months ago
view post
Post
2341
If you haven't seen yet, we just released Inference Providers πŸ”€

> 4 new serverless inference providers on the Hub 🀯
> Use your HF API key or personal key with all providers πŸ”‘
> Chat with Deepseek R1, V3, and more on HF Hub πŸ‹
> We support Sambanova, TogetherAI, Replicate, and Fal.ai πŸ’ͺ

Best of all, we don't charge any markup on top of the provider 🫰 Have you tried it out yet? HF Pro accounts get $2 of free usage for the provider inference.
cfahlgren1Β 
posted an update 5 months ago
view post
Post
1776
Wow, I just added Langfuse tracing to the Deepseek Artifacts app and it's really nice πŸ”₯

It allows me to visualize and track more things along with the cfahlgren1/react-code-instructions dataset.

It was just added as a one click Docker Space template, so it's super easy to self host πŸ’ͺ
albertvillanovaΒ 
posted an update 5 months ago
cfahlgren1Β 
posted an update 5 months ago
view post
Post
2262
You'll notice the AI in the SQL Console is much better at working with chatml conversations:

Here's example of unnesting the cfahlgren1/react-code-instructions in less than 10 seconds by asking it. Check it out here: cfahlgren1/react-code-instructions

- "show me the average assistant response length"
- "extract user, system, and assistant messages into separate columns"

It's super easy to work with conversational datasets now with natural language πŸ—£οΈ





  • 2 replies
Β·
cfahlgren1Β 
posted an update 5 months ago
lhoestqΒ 
posted an update 6 months ago
view post
Post
2310
Made a HF Dataset editor a la gg sheets here: lhoestq/dataset-spreadsheets

With Dataset Spreadsheets:
✏️ Edit datasets in the UI
πŸ”— Share link with collaborators
🐍 Use locally in DuckDB or Python

Available for the 100,000+ parquet datasets on HF :)
cfahlgren1Β 
posted an update 6 months ago
view post
Post
1940
You can just ask things πŸ—£οΈ

"show me messages in the coding category that are in the top 10% of reward model scores"

Download really high quality instructions from the Llama3.1 405B synthetic dataset πŸ”₯

argilla/magpie-ultra-v1.0

cfahlgren1Β 
posted an update 6 months ago
view post
Post
3045
We just dropped an LLM inside the SQL Console 🀯

The amazing, new Qwen/Qwen2.5-Coder-32B-Instruct model can now write SQL for any Hugging Face dataset ✨

It's 2025, you shouldn't be hand writing SQL! This is a big step in making it where anyone can do in depth analysis on a dataset. Let us know what you think πŸ€—
cfahlgren1Β 
posted an update 7 months ago
view post
Post
925
observers πŸ”­ - automatically log all OpenAI compatible requests to a datasetπŸ’½

β€’ supports any OpenAI compatible endpoint πŸ’ͺ
β€’ supports DuckDB, Hugging Face Datasets, and Argilla as stores

> pip install observers

No complex framework. Just a few lines of code to start sending your traces somewhere. Let us know what you think! @davidberenstein1957 and I will continue iterating!

Here's an example dataset that was logged to Hugging Face from Ollama: cfahlgren1/llama-3.1-awesome-chatgpt-prompts