BigLAM: BigScience Libraries, Archives and Museums

non-profit

https://github.com/bigscience-workshop/lam


AI & ML interests

🤗 Hugging Face x 🌸 BigScience initiative to create open source community resources for LAMs.

Recent Activity

christopher new activity 12 days ago

biglam/archives-parlementaires-revolution-francaise: [bot] Conversion to Parquet

marianna13 authored a paper 24 days ago

Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models

marianna13 authored a paper 24 days ago

DataComp-LM: In search of the next generation of training sets for language models


albertvillanova

posted an update 6 days ago

Post

1428

🚀 SmolAgents v1.19.0 is live!
This release brings major improvements to agent flexibility, UI usability, streaming architecture, and developer experience, making it easier than ever to build smart, interactive AI agents. Here's what's new:

🔧 Agent Upgrades
- Support for managed agents in ToolCallingAgent
- Context manager support for cleaner agent lifecycle handling
- Output formatting now uses XML tags for consistency

🖥️ UI Enhancements
- GradioUI now supports reset_agent_memory, which is handy for fresh starts in dev & demos.

🔄 Streaming Refactor
- Streaming event aggregation moved off the Model class
- ➡️ Better architecture & maintainability

📦 Output Tracking
- CodeAgent outputs are now stored in ActionStep
- ✅ More visibility and structure to agent decisions

🐛 Bug Fixes
- Smarter planning logic
- Cleaner Docker logs
- Better prompt formatting for additional_args
- Safer internal functions and final answer matching

📚 Docs Improvements
- Added quickstart examples with tool usage
- One-click Colab launch buttons
- Expanded reference docs (AgentMemory, GradioUI docstrings)
- Fixed broken links and migrated to .md format

🔗 Full release notes:
https://github.com/huggingface/smolagents/releases/tag/v1.19.0

💬 Try it out, explore the new features, and let us know what you build!

#smolagents #opensource #AIagents #LLM #HuggingFace

louisbrulenaudet

posted an update 11 days ago

Post

963

🌐 Clinical Trials Dataset now available on Hugging Face! 🧬

I’ve just released a comprehensive, ML-ready dataset featuring 500,000+ clinical trial records sourced directly from ClinicalTrials.gov for biomedical NLP, healthcare analytics, and clinical research applications 🤗

I wanted to produce the most complete and up-to-date dump with all raw data partially flattened to simplify extraction, self-querying and processing.

Do you have any ideas about what we can do with it? One option might be using the trial descriptions to enhance specialized embedding models.

louisbrulenaudet/clinical-trials
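As a rough illustration of what "partially flattened" can mean in practice, the sketch below flattens a nested record down to a chosen depth and leaves anything deeper nested. The `flatten` helper, the `max_depth` parameter, and the record's field names are all hypothetical and are not taken from the dataset's actual schema or from the author's pipeline.

```python
from typing import Any


def flatten(record: dict[str, Any], max_depth: int = 1, sep: str = ".") -> dict[str, Any]:
    """Flatten nested dicts up to max_depth levels; deeper values stay nested."""
    flat: dict[str, Any] = {}

    def walk(node: dict[str, Any], prefix: str, depth: int) -> None:
        for key, value in node.items():
            path = f"{prefix}{sep}{key}" if prefix else key
            if isinstance(value, dict) and depth < max_depth:
                # Still within the allowed depth: descend and extend the key path.
                walk(value, path, depth + 1)
            else:
                # Leaf value, or a dict below the depth limit: keep it as-is.
                flat[path] = value

    walk(record, "", 0)
    return flat


# Hypothetical record; the field names are illustrative, not the dataset's schema.
raw = {
    "id": "TRIAL-0001",
    "description": {"brief": "A short summary.", "detailed": {"text": "Longer text."}},
}
row = flatten(raw, max_depth=1)
print(row)
```

With `max_depth=1`, top-level groups become dotted columns while the deeper `detailed` object stays intact, which keeps common fields easy to query without losing the raw structure.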

christopher

in biglam/archives-parlementaires-revolution-francaise 12 days ago

[bot] Conversion to Parquet

#2 opened 12 days ago by

parquet-converter

davanstrien

posted an update 22 days ago

Post

2831

Inspired by Hugging Face's official MCP server, I've developed a complementary tool that exposes my semantic search API to enhance discovery across the HF platform.

Key capabilities:

- AI-powered semantic search for models and datasets
- Parameter count analysis via safetensors metadata
- Trending content discovery
- Find similar models/datasets functionality
- 11 tools total for enhanced ecosystem navigation

The semantic search goes beyond simple keyword matching, understanding context and relationships between different models and datasets.

Example query: "Find around 10 reasoning Hugging Face datasets published in 2025 focusing on topics other than maths and science. Show a link and a short summary for each dataset." (results in video!)

https://github.com/davanstrien/hub-semantic-search-mcp
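To give a sense of how a parameter count can be derived from safetensors metadata alone, here is a minimal sketch that reads only the header of a safetensors file: an 8-byte little-endian length followed by a JSON object describing each tensor. It is not the tool's implementation; the `count_parameters` helper and the tensor names and shapes in the example are made up for illustration.

```python
import json
import struct
from math import prod


def count_parameters(header_bytes: bytes) -> int:
    """Sum the element counts of all tensors listed in a safetensors header."""
    # The first 8 bytes hold the JSON header length as an unsigned 64-bit LE integer.
    (header_len,) = struct.unpack("<Q", header_bytes[:8])
    header = json.loads(header_bytes[8 : 8 + header_len])
    return sum(
        prod(entry["shape"])
        for name, entry in header.items()
        if name != "__metadata__"  # reserved key for free-form metadata, not a tensor
    )


# Build a tiny fake header in memory (tensor names and shapes are made up).
fake = {
    "__metadata__": {"format": "pt"},
    "embed.weight": {"dtype": "F16", "shape": [100, 8], "data_offsets": [0, 1600]},
    "head.bias": {"dtype": "F16", "shape": [8], "data_offsets": [1600, 1616]},
}
payload = json.dumps(fake).encode("utf-8")
blob = struct.pack("<Q", len(payload)) + payload

total = count_parameters(blob)
print(total)  # 100 * 8 + 8 = 808
```

Because only the header is needed, the count can be obtained without downloading the tensor data itself.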

marianna13

authored 4 papers 24 days ago

Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models

Paper • 2406.02061 • Published Jun 4, 2024 • 1

DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17, 2024 • 53

OpenThoughts: Data Recipes for Reasoning Models

Paper • 2506.04178 • Published 26 days ago • 41

Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets

Paper • 2506.04598 • Published 26 days ago • 5

storytracer

authored a paper 25 days ago

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Paper • 2506.05209 • Published 25 days ago • 42

alielfilali01

authored a paper about 1 month ago

Llama-3-Nanda-10B-Chat: An Open Generative Large Language Model for Hindi

Paper • 2504.06011 • Published Apr 8 • 1

albertvillanova

posted an update about 1 month ago

Post

627

New in smolagents v1.17.0:
- Structured generation in CodeAgent 🧱
- Streamable HTTP MCP support 🌐
- Agent.run() returns rich RunResult 📦

Smarter agents, smoother workflows.
Try it now: https://github.com/huggingface/smolagents/releases/tag/v1.17.0

Zaid

authored 5 papers about 1 month ago

Masader: Metadata Sourcing for Arabic Text and Speech Data Resources

Paper • 2110.06744 • Published Oct 13, 2021

Arabic Stable LM: Adapting Stable LM 2 1.6B to Arabic

Paper • 2412.04277 • Published Dec 5, 2024

Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training

Paper • 2410.20796 • Published Oct 28, 2024

Ashaar: Automatic Analysis and Generation of Arabic Poetry Using Deep Learning Approaches

Paper • 2307.06218 • Published Jul 12, 2023

MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs

Paper • 2505.19800 • Published May 26 • 1

davanstrien

in biglam/newspaper-navigator about 1 month ago

update paths to biglam/newspaper-navigator

#2 opened about 1 month ago by

davanstrien

albertvillanova

posted an update about 2 months ago

Post

2484

New in smolagents v1.16.0:
🔍 Bing support in WebSearchTool
🐍 Custom functions & executor_kwargs in LocalPythonExecutor
🔧 Streaming GradioUI fixes
🌐 Local web agents via api_base & api_key
📚 Better docs

👉 https://github.com/huggingface/smolagents/releases/tag/v1.16.0

christopher

in biglam/loc_beyond_words about 2 months ago

[bot] Conversion to Parquet

#4 opened about 2 months ago by

parquet-converter

davanstrien

published a model about 2 months ago

biglam/historic-newspaper-illustrations-yolov11

Object Detection • Updated May 8 • 10

Team members 55
