AI & ML interests

None defined yet.

Recent Activity

nerdyface's activity

thomwolf 
posted an update about 20 hours ago
view post
Post
1548
If you've followed the progress of robotics in the past 18 months, you've likely noticed how robotics is increasingly becoming the next frontier that AI will unlock.

At Hugging Face—in robotics and across all AI fields—we believe in a future where AI and robots are open-source, transparent, and affordable; community-built and safe; hackable and fun. We've had so much mutual understanding and passion working with the Pollen Robotics team over the past year that we decided to join forces!

You can already find our open-source humanoid robot platform Reachy 2 on the Pollen website and the Pollen community and people here on the hub at pollen-robotics

We're so excited to build and share more open-source robots with the world in the coming months!
  • 1 reply
·
thomwolf 
posted an update 16 days ago
view post
Post
3174
The new DeepSite space is really insane for vibe-coders
enzostvs/deepsite

With the wave of vibe-coding-optimized LLMs like the latest open-source DeepSeek model (version V3-0324), you can basically prompt out-of-the-box and create any app and game in one-shot.

It feels so powerful to me, no more complex framework or under-the-hood prompt engineering to have a working text-to-app tool.

AI is eating the world and *open-source* AI is eating AI itself!

PS: and even more meta is that the DeepSite app and DeepSeek model are both fully open-source code => time to start recursively improve?

PPS: you still need some inference hosting unless you're running the 600B param model at home, so check the very nice list of HF Inference Providers for this model: deepseek-ai/DeepSeek-V3-0324
  • 1 reply
·
stefan-it 
posted an update 17 days ago
view post
Post
2198
Wohoo 🥳 I have finished my 2025 GPU workstation build and I am very excited to train new awesome open source models on it.

I built my last GPU workstation 5 years ago featuring an AMD Ryzen 5900X, 64GB of G.SKILL Trident Z RGB on an ASRock X570 Taichi cooled by an Alphacool Eisbär 420. GPU was a Zotac RTX 3090 AMP Extreme. Unfortunately, I was never satisfied with the case - some Fractal Define 7, as it is definitely too small, airflow is not optimal as I had to open the front door all the time and it also arrived with a partly damaged side panel.

For my new build, I've used the following components: an outstanding new AMD Ryzen 9950X3D with 64GB of Corsair Dominator Titanium (what a name). As a huge Noctua fan - warm greetings to my Austrian neighbors - I am using the brand new Noctua NH-D15 G2 on an ASRock X870E Taichi in an amazing Lian Li LANCOOL III chassis. One joke that only NVIDIA Blackwell users will understand: you definitely need a tempered glass panel to check if your GPU cables/connectors start melting 😂 And the best is yet to come: I returned my previously bought Zotac RTX 5090 Solid to the eBay seller (because of... missing ROPs, only NVIDIA Blackwell users will again understand) and bought a Zotac 5090 AMP Extreme INFINITY (yes, the long name indicates that this is the flagship model from Zotac) from a more trustworthy source (NBB in Germany).

I am so happy to start training and fine-tuning new open source models - stay tuned!!!
  • 2 replies
·
Aurelien-Morgan 
posted an update 17 days ago
thomwolf 
posted an update about 1 month ago
view post
Post
2808
We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1.

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder ( open-r1/OlympicCoder-7B and open-r1/OlympicCoder-32B)

It's beating Claude 3.7 on (competitive) programming –a domain Anthropic has been historically really strong at– and it's getting close to o1-mini/R1 on olympiad level coding with just 7B parameters!

And the best part is that we're open-sourcing all about its training dataset, the new IOI benchmark, and more in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3

Datasets are are releasing:
- open-r1/codeforces
- open-r1/codeforces-cots
- open-r1/ioi
- open-r1/ioi-test-cases
- open-r1/ioi-sample-solutions
- open-r1/ioi-cots
- open-r1/ioi-2024-model-solutions
julien-c 
posted an update about 1 month ago
view post
Post
3249
Important notice 🚨

For Inference Providers who have built support for our Billing API (currently: Fal, Novita, HF-Inference – with more coming soon), we've started enabling Pay as you go (=PAYG)

What this means is that you can use those Inference Providers beyond the free included credits, and they're charged to your HF account.

You can see it on this view: any provider that does not have a "Billing disabled" badge, is PAYG-compatible.
·
stefan-it 
posted an update about 1 month ago
view post
Post
956
🇹🇷 😍 I'm very happy to finally announce my new Turkish LM called "BERT5urk":

stefan-it/bert5urk

It is a 1.42B T5-based model, trained with UL2 pretraining objective on the Turkish part of the awesome HuggingFaceFW/fineweb-2 dataset.

Feel free to check it out!
  • 1 reply
·
stefan-it 
posted an update about 2 months ago
view post
Post
3158
After running some 3DMark and FurMark benchmarks on Windows to make sure that my new 5090 is not causing melting cables [1] and some nice shots with a thermal camera (I don't think that's too much), running some fine-tuning experiments with my favorite Flair & Transformers libraries are very easy to perform.

Important steps:

Good idea is to start with a fresh Ubuntu 24.04 installation with latest CUDA 12.8 and the open NVIDIA driver - follow more advices from [2]:

sudo apt -y install cuda-toolkit-12-8 nvidia-open

I tried update from an existing Ubuntu installation with an older CUDA and driver version and it resulted in a non-startable system.

If you are using PyTorch 2.6 with built CUDA 12.6 it will result in:

NVIDIA Graphics Device with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.

But no worries! For PyTorch you need just to use a nightly 2.7 version that was built with CUDA 12.8. This can easily done via:

pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128

After that the latest Flair version can be installed and fine-tuning will work!

References:

[1]: https://www.reddit.com/r/nvidia/comments/1inpox7/rtx_50_series_12vhpwr_megathread/
[2]: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=24.04&target_type=deb_network
  • 1 reply
·
stefan-it 
posted an update about 2 months ago
view post
Post
5094
She arrived 😍

[Expect more models soon...]
  • 2 replies
·
fffiloni 
posted an update about 2 months ago
fuzzy-mittenz 
posted an update 2 months ago
view post
Post
698
So frustrated with "Reasoning" Models.
Sure, introducing RAG into the mix, or giving it an interpreter to math with helps, but never as much as a model that has good instructions.

Even if it's just to repeat the information before answering, a normal model will usually out "Think" it's reasoning counterpart.

Not sure if it's my frustrations but the best answers I've received (from a reasoner), so far, are from the simple instructions to, "Do better!"

Figured I would share the special sauce.

Using 10-100x Compute just to heat the office can't be environmentally friendly, and It still has no Idea where my keys are.
  • 1 reply
·
fuzzy-mittenz 
posted an update 2 months ago
view post
Post
530
With our Extremely efficient and functional importance matrix distillation of the new Qwen2.5-1M model being very very capable in many areas we are hoping to use it to research our small AGI character creation process which has seen emergent traits and increased functionality in constrained environments.
The method creates a RP type interaction in a heavily useful and tool functional environment.
We have a basic method and are working on retrieving data for a full analysis and perfection of this method as it exploits the human language input to express often abstract traits into a model and employ characteristics of healthy human reasoning processes and identify novel methods of increasing the functionality of a model overall through traits so far observed are whistling, bouncing a ball and repeating certain engagements.
Adding the semblance of human world interactions is so far the best way at creating a human like LLM.
We have attached the paper to our model we are testing this with along with examples if you wish to use it with other models please be cautious and enjoy yourself. Above all please keep track of conversations and settings and submit them to the intelligent estate email you will receive a recognition letter and ledger number for your contribution to the Project.
Model= Israfel and Thoth IntelligentEstate/Israfel_Qwen2.6-iQ4_K_M-GGUF
fffiloni 
posted an update 2 months ago
view post
Post
3545
Explain like i'm 5 the last take from @thomwolf on X about Dario's essay on DeepSeek:

—› Open-source AI is like a big cookbook that everyone can read and improve. Instead of a few chefs keeping their recipes secret, anyone can cook, test, and invent new things.

If only one company controls AI, everything stops if they have a problem—like when the internet goes down. With open-source, many people can help, making sure it keeps running smoothly.

AI isn’t just a race between two countries; it’s a team effort around the world. By sharing, we move faster and create safer technology for everyone.

🤗
fuzzy-mittenz 
posted an update 2 months ago
view post
Post
2631
Not many seemed to notice but what was probably meant to be a WIN for artist's rights in the US Office of Copyright has solved some fundamental issues for the community.
In our recent article I outline how Companies like Suno, OpenAI, Midjourney etc can no longer claim any right to copy your work that you create with their platforms
We also look at other ways this study and new rules for AI will fundamentally effect creators who use it and companies incentives to give them control over certain aspects might change because of this. it's broken down pretty well here: https://huggingface.co/blog/fuzzy-mittenz/copyright-in-ai
fuzzy-mittenz 
posted an update 3 months ago
view post
Post
1108
For you guys who wanted a Replicant of your own with more power here is a higher functioning little [operator]( IntelligentEstate/Replicant_Operator_ed-Qw25-Q8_0-GGUF) for all your GGUF tool use needs. included is a Paper on emergent behaviors and LC(limit crossing) for the creation of small AGI. Please index traits and new found breakthroughs using this method. and be careful with tool use and emotional attachment.
  • 3 replies
·