At xet-team we've been hard at work bringing a new generation of storage to the Hugging Face community, and we've crossed some major milestones:
- Over 2,000 builders and nearing 100 organizations with access to Xet
- Over 70,000 model and dataset repositories are Xet-backed
- 1.4 petabytes managed by Xet 🤯
As we move repos from LFS to Xet for everyone we onboard, we're pushing our content-addressed store (CAS). Check out the chart below of CAS hitting up to 150 Gb/s throughput this past week.
All of this growth is helping us build richer insights. We expanded our repo graph, which maps how Xet-backed repositories on the Hub share bytes with each other.
Check out the current network in the image below (nodes are repositories, edges are where repos share bytes), and visit the xet-team/repo-graph Space to see how different versions of Qwen, Llama, and Phi models are grouped together.
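If you're curious how a graph like this fits together, here's a minimal sketch using networkx. The repo names and shared-byte counts below are invented for illustration; the real pipeline behind xet-team/repo-graph derives edges from actual chunk-level overlap and is more involved.

```python
import networkx as nx

# Hypothetical input: bytes shared between pairs of Xet-backed repos,
# e.g. derived from chunk hashes that appear in both repositories.
shared_bytes = {
    ("meta-llama/Llama-3.1-8B", "meta-llama/Llama-3.1-8B-Instruct"): 3_200_000_000,
    ("Qwen/Qwen2.5-7B", "Qwen/Qwen2.5-7B-Instruct"): 2_900_000_000,
}

G = nx.Graph()
for (repo_a, repo_b), nbytes in shared_bytes.items():
    # Nodes are repositories; the edge weight is the number of shared bytes.
    G.add_edge(repo_a, repo_b, weight=nbytes)

# Connected components group repos that share data, e.g. a model family.
for component in nx.connected_components(G):
    print(sorted(component))
```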
It's been a wild few days, and especially 🤯 to see every tensor file with a Xet logo next to it instead of LFS.
The attached graph shows requests per second to our content-addressed store (CAS) right as the release went live.
yellow = GETs; dashed line = launch time.
You can definitely tell when the community started downloading 📈
h/t to @rajatarya for the graph, to the entire Xet crew for bringing us to this point, and a special shoutout to Rajat, @port8080, @brianronan, @seanses, and @znation, who made sure the bytes kept flying all weekend ⚡️
Huge week for xet-team: Llama 4 is the first major model on Hugging Face uploaded with Xet as the storage backend! Every byte downloaded comes through our infrastructure.
Using Xet on Hugging Face is the fastest way to download and iterate on open source models, and we've proved it with Llama 4, which saw a ~25% boost across all of its models.
We expect builders on the Hub to see even more improvements, helping power innovation across the community.
With the models on our infrastructure, we can peer in and see how well our dedupe performs across the Llama 4 family. On average, we're seeing ~25% dedupe, providing huge savings for the community iterating on these state-of-the-art models. The attached image shows a few selected models and how they perform on Xet.
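For a concrete (toy) picture of what that dedupe number means: in a content-addressed store, a chunk shared by multiple models is stored exactly once. The chunk lists below are invented stand-ins, not real Llama 4 data, but the arithmetic is the same.

```python
# Hypothetical (hash, size_in_bytes) chunk lists per model; in practice these
# come from content-defined chunking of the actual tensor files.
model_chunks = {
    "model-a": [("h1", 65536), ("h2", 65536), ("h3", 40000)],
    "model-b": [("h1", 65536), ("h4", 65536), ("h3", 40000)],  # shares h1, h3
}

# Logical size: every model's bytes counted separately.
logical = sum(size for chunks in model_chunks.values() for _, size in chunks)

# A content-addressed store keeps each unique chunk exactly once.
unique = {}
for chunks in model_chunks.values():
    for chunk_hash, size in chunks:
        unique[chunk_hash] = size
stored = sum(unique.values())

print(f"dedupe savings: {1 - stored / logical:.1%}")  # ~30% for this toy data
```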
Thanks to the meta-llama team for launching on Xet!
Doing a lot of benchmarking and visualization work, which means I'm always searching for interesting repos in terms of file types, size, branches, and overall structure.
To help, I built a Space jsulz/repo-info that lets you search for any repo and get back:
- Treemap of the repository, color coded by file/directory size
- Repo branches and their size
- Cumulative size of different file types (e.g., the total size of all the safetensors in the repo)
And because I'm interested in how this fits into our work to leverage content-defined chunking for versioning repos on the Hub - https://huggingface.co/blog/from-files-to-chunks - everything also shows the number of chunks (1 chunk = 64KB) alongside the total size in bytes.
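If you want to pull similar numbers yourself, here's a rough sketch with huggingface_hub (the Space's actual implementation may differ): it walks a repo's file tree, sums sizes by extension, and estimates chunk counts at 64KB per chunk. The repo id is just an example; swap in any repo you like.

```python
import math
from collections import defaultdict
from pathlib import PurePosixPath

from huggingface_hub import HfApi

CHUNK_SIZE = 64 * 1024  # 1 chunk = 64KB, as in the post

api = HfApi()
size_by_ext = defaultdict(int)

# Walk the full repo tree; file entries carry a size in bytes.
for entry in api.list_repo_tree("openai-community/gpt2", recursive=True):
    if getattr(entry, "size", None) is not None:  # skip folder entries
        ext = PurePosixPath(entry.path).suffix or "(no extension)"
        size_by_ext[ext] += entry.size

for ext, total in sorted(size_by_ext.items(), key=lambda kv: -kv[1]):
    print(f"{ext}: {total:,} bytes, ~{math.ceil(total / CHUNK_SIZE):,} chunks")
```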
When the XetHub crew joined Hugging Face this fall, @erinys and I started brainstorming how to share our work to replace Git LFS on the Hub. Uploading and downloading large models and datasets takes precious time. That's where our chunk-based approach comes in.
Instead of versioning whole files (like Git and Git LFS), we version variable-sized chunks of data. For the Hugging Face community, this means that only the content that actually changed between versions has to be uploaded, downloaded, or stored.
In our benchmarks, we found that using CDC to store iterative model and dataset versions led to transfer speedups of ~2x, but this isn't just a performance boost. It's a rethinking of how we manage models and datasets on the Hub.
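For the curious, here's a toy content-defined chunker. It is nothing like the production Xet implementation (which uses a stronger rolling hash, ~64KB target chunks, and min/max size bounds), but it shows the key property: chunk boundaries depend on local content, so an edit early in a file only disturbs nearby chunks instead of shifting every fixed-size block after it.

```python
import os

def cdc_chunks(data: bytes, mask: int = 0x0FFF, min_len: int = 16) -> list[bytes]:
    """Toy content-defined chunking: cut where (rolling checksum & mask) == 0.

    The low 12 bits of the checksum depend only on the last 12 bytes seen,
    so boundaries are determined by local content, not byte offsets. A 12-bit
    mask yields ~4 KiB average chunks (small, for a quick demo).
    """
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        if i - start + 1 >= min_len and (rolling & mask) == 0:
            chunks.append(data[start : i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

# Insert a few bytes near the front: only the chunks around the edit change,
# so the untouched chunks dedupe against the previous version.
original = os.urandom(1 << 16)
edited = original[:100] + b"!!!" + original[100:]
a, b = cdc_chunks(original), cdc_chunks(edited)
print(f"{len(set(a) & set(b))} of {len(b)} chunks unchanged after the edit")
```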
We're planning to roll out our new storage backend on the Hub in early 2025 - check out our blog to dive deeper, and let us know: how could this improve your workflows?