Andrea Soria

asoria

AI & ML interests

Maintainer of 🤗 Datasets: Data processing

Recent Activity

liked a dataset 1 day ago
pixparse/pdfa-eng-wds
updated a dataset 5 days ago
asoria/big_pdf
published a dataset 5 days ago
asoria/big_pdf

Organizations

Hugging Face, BigScience Data, Datasets Maintainers, Blog-explorers, Enterprise Explorers, ZeroGPU Explorers, Datasets examples, Women on Hugging Face, Dev Mode Explorers, Hugging Face Discord Community, AI Developers from Latin America, Datasets Topics, AI Starter Pack, Hugging Face MCP Course

asoria's activity

upvoted an article 7 months ago
upvoted 7 articles 8 months ago

LoRA training scripts of the world, unite!

• 60

Improving Parquet Dedupe on Hugging Face Hub

By yuchenglow and 1 other • 33

Introducing BERTopic Integration with Hugging Face Hub

By davanstrien and 1 other • 9

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

By Leyo and 2 others • 180
upvoted 2 articles 9 months ago

Fine-Tuning Gemma Models in Hugging Face

By svaibhav and 3 others • 33
upvoted 4 articles 10 months ago

SmolLM - blazingly fast and remarkably powerful

By loubnabnl and 2 others • 374

Docmatix - a huge dataset for Document Visual Question Answering

By andito and 1 other • 73

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

By loubnabnl and 2 others • 94

Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality

By evijit and 9 others • 34
upvoted 2 articles 11 months ago

Experimenting with Automatic PII Detection on the Hub using Presidio

By lhoestq and 3 others • 24

Announcing New Dataset Search Features

By lhoestq and 2 others • 23
upvoted an article 12 months ago

How to directly access 150k+ Hugging Face Datasets with DuckDB and query using GPT-4o

By chilijung • 11
upvoted an article about 1 year ago

Synthetic dataset generation techniques: generating custom sentence similarity data

By davanstrien • 16