
Sylvain Lesage PRO

severo

AI & ML interests

Dataviz freelance developer. Part-time 🤗 Hugging Face (dataset viewer).

Recent Activity

updated a dataset about 21 hours ago
severo/trending-repos
updated a dataset 10 days ago
severo/pdf_example
published a dataset 10 days ago
severo/pdf_example

Organizations

Hugging Face, Datasets Maintainers, geospatial, Datasets examples, Social Post Explorers, Hugging Face Discord Community, Hugging Face FineVideo, Hyperparam

severo's activity

upvoted an article 18 days ago

Cohere on Hugging Face Inference Providers 🔥

• 124
reacted to jsulz's post with 🚀 18 days ago
973
As xet-team infrastructure begins backing hundreds of repositories on the Hugging Face Hub, we're getting to put on our researcher hats and peer into the bytes. 👀 🤓

IMO, one of the most interesting ideas Xet storage introduces is a globally shared store of data.

When you upload a file through Xet, the contents are split into ~64KB chunks and deduplicated, but what if those same chunks already exist in another repo on the Hub?

If we can detect and reuse them, we can skip uploading those chunks too, saving time and bandwidth for AI builders. More on how that works here:
🔗 https://huggingface.co/blog/from-chunks-to-blocks#scaling-deduplication-with-aggregation
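
To make the idea concrete, here is a minimal sketch of chunk-level deduplication against a shared store. It is not the Xet implementation: it assumes fixed-size 64KB chunks and SHA-256 hashes for simplicity, and the names `chunk_hashes`, `upload`, and `global_store` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # ~64KB as in the post; fixed-size here is an assumption


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return one SHA-256 hex digest per chunk."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def upload(data: bytes, global_store: set[str]) -> tuple[int, int]:
    """Simulate an upload: return (chunks_sent, chunks_skipped).

    global_store is a hypothetical stand-in for a store shared by all repos.
    """
    sent = skipped = 0
    for digest in chunk_hashes(data):
        if digest in global_store:
            skipped += 1  # chunk already known somewhere, nothing to transfer
        else:
            global_store.add(digest)
            sent += 1
    return sent, skipped


store: set[str] = set()
file_a = bytes([1]) * CHUNK_SIZE + bytes([2]) * CHUNK_SIZE
file_b = bytes([2]) * CHUNK_SIZE + bytes([3]) * CHUNK_SIZE

print(upload(file_a, store))  # (2, 0): both chunks are new
print(upload(file_b, store))  # (1, 1): the chunk of 2s is reused
```

In this toy run the second file only transfers one of its two chunks, because the other already exists in the shared store from a different upload.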

Because of this, different repositories can share bytes we store. That opens up something cool - we can draw a graph of which repos actually share data at the chunk level, where:

- Nodes = repositories
- Edges = shared chunks
- Edge thickness = how much they overlap
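
As a rough sketch of how such a graph could be derived, assume each repository is described by the set of chunk hashes it references. The function `build_repo_graph` and the repository names below are hypothetical, and the edge weight is simply the count of shared chunks.

```python
from itertools import combinations


def build_repo_graph(repo_chunks: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Map each pair of repos that share chunks to the number of shared chunks."""
    edges: dict[tuple[str, str], int] = {}
    for a, b in combinations(sorted(repo_chunks), 2):
        shared = len(repo_chunks[a] & repo_chunks[b])
        if shared:  # only draw an edge when there is real overlap
            edges[(a, b)] = shared
    return edges


# Hypothetical repositories and chunk hashes, for illustration only.
repo_chunks = {
    "org/bert-base": {"c1", "c2", "c3"},
    "org/bert-finetuned": {"c1", "c2", "c9"},
    "org/unrelated": {"c7"},
}

print(build_repo_graph(repo_chunks))
# {('org/bert-base', 'org/bert-finetuned'): 2}
```

The keys of `repo_chunks` are the nodes, the returned dictionary holds the edges, and each value plays the role of edge thickness. A repository with no overlap, like `org/unrelated` here, stays an isolated node.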

xet-team/repo-graph

Come find the many BERT islands. Or see how datasets relate in practice, not just in theory. See how libraries or tasks can tie repositories together. You can play around with node size using storage/likes/downloads too.

The result is a super fun visualization from @saba9 and @znation that I've already lost way too much time to. I'm excited to see how the networks grow as we add more repositories!
replied to their post 26 days ago

"convert CSV to Parquet" :) SEO is good

posted an update 26 days ago