I'm collecting llama-bench results for inference with Llama 3.1 8B Q4 and Q8 reference models on various GPUs. The results are averages of 5 executions. The systems vary (different motherboards and CPUs ... but that probably has little effect on the inference performance).
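For context, here is a minimal sketch of this kind of run (not my exact script), assuming llama.cpp's llama-bench CLI and placeholder GGUF file names:

```python
# Hedged sketch: wraps llama.cpp's llama-bench CLI; the GGUF file names are placeholders
# for the actual Q4/Q8 reference models.
import subprocess

for gguf in ["llama-3.1-8b-instruct-q4_k_m.gguf", "llama-3.1-8b-instruct-q8_0.gguf"]:
    # -r 5: average over 5 runs; -ngl 99: offload all layers to the GPU; -o csv: machine-readable output
    subprocess.run(["./llama-bench", "-m", gguf, "-r", "5", "-ngl", "99", "-o", "csv"], check=True)
```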
I shared my view on Qwen vs DeepSeek (student vs genius), but I forgot to mention this: they are neighbors in the same city. https://en.wikipedia.org/wiki/Hangzhou
Started fine-tuning Gemma 3 using an evolutionary approach. It is not the worst model according to the AHA leaderboard, and it is one of the smartest according to lmarena.ai. My objective is to make it based, anti-woke, wise, beneficial, and then some.
Several GPUs are fine-tuning it at the same time, each using a different dataset and QLoRA, and the successful runs are merged later. Compared to plain LoRA, this allows faster training and also reduces overfitting, because the merge operation heals overfitting. The risk is that the 4-bit quantization may make the models dumber. But I am not looking for sheer IQ. Too much mind is a problem anyway :)
Has anyone tried parallel QLoRA and merging before?
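For concreteness, here is a minimal sketch of the idea using transformers/peft/bitsandbytes; the model id, adapter paths, and merge weights are placeholders, and the actual training loop is omitted:

```python
# Minimal sketch of parallel QLoRA + merge; ids, paths, and weights are placeholders.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, PeftModel, get_peft_model

base_id = "google/gemma-3-4b-it"  # placeholder; any causal LM works

# QLoRA: load the base model in 4-bit
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb, device_map="auto")

# On each GPU/worker: attach a fresh LoRA adapter and train on that worker's dataset
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
worker_model = get_peft_model(base, lora_cfg)
# ... train on this worker's dataset, then worker_model.save_pretrained("adapter_gpu0")

# Merge step (typically a separate process): reload the base and combine the saved adapters
base = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb, device_map="auto")
merged = PeftModel.from_pretrained(base, "adapter_gpu0", adapter_name="gpu0")
merged.load_adapter("adapter_gpu1", adapter_name="gpu1")
merged.add_weighted_adapter(
    adapters=["gpu0", "gpu1"], weights=[0.5, 0.5],
    adapter_name="merged", combination_type="linear",
)
merged.set_adapter("merged")
```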
I also automated the dataset selection, the benchmarking, and the convergence toward the objectives (the fitness function, the reward). It is basically trying to get a higher score on the AHA Leaderboard as fast as possible with a diverse set of organisms that "evolve by training".
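Schematically, the loop looks something like this (illustrative Python only, not the actual pipeline; training, scoring, and merging are stubbed out):

```python
# Illustrative sketch of the evolutionary loop: organisms train on their own datasets,
# the fitness function is an AHA-style benchmark score, and survivors are merged.
import random

def train_round(organism, dataset):
    # Placeholder for one QLoRA training round of `organism` on `dataset`.
    organism["rounds"] += 1

def aha_score(organism):
    # Placeholder for the fitness function: benchmark against the AHA-style objective.
    return random.random() + 0.01 * organism["rounds"]

def merge(a, b):
    # Placeholder for merging two successful adapters (e.g. weight averaging).
    return {"rounds": (a["rounds"] + b["rounds"]) // 2}

def evolve(pop_size=8, rounds=10, keep=4, datasets=("ds_a", "ds_b", "ds_c")):
    population = [{"rounds": 0} for _ in range(pop_size)]
    for _ in range(rounds):
        for organism in population:                       # each worker trains on its own dataset
            train_round(organism, random.choice(datasets))
        population.sort(key=aha_score, reverse=True)      # benchmark = fitness
        survivors = population[:keep]                     # keep the strongest organisms
        children = [merge(*random.sample(survivors, 2))   # refill the population via merges
                    for _ in range(pop_size - keep)]
        population = survivors + children
    return population[0]

best = evolve()
```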
I want to release some cool stuff when I have the time:
- how an answer to a single question changes over time, with each training round or day
- a chart showing AHA alignment over training rounds
We open-sourced the pruna package, which can be easily installed with pip install pruna :) It makes it easy to compress and evaluate AI models, including transformers and diffusers.
With open-sourcing, people can now inspect and contribute to the code. Beyond the code, we provide a detailed README, tutorials, benchmarks, and documentation to make compression, evaluation, and saving/loading/serving of AI models transparent.
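As a quick taste, here is a hedged usage sketch based on the README; the SmashConfig keys and algorithm names below are assumptions to check against the current docs:

```python
# Hedged sketch of compressing a diffusers pipeline with pruna; the config key/value
# ("cacher" / "deepcache") should be verified against the documentation.
from diffusers import StableDiffusionPipeline
from pruna import SmashConfig, smash

base_model = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

smash_config = SmashConfig()
smash_config["cacher"] = "deepcache"  # pick a compression algorithm; see the docs for available options

smashed_model = smash(model=base_model, smash_config=smash_config)
smashed_model("a photo of a cat").images[0]  # use it like the original pipeline
```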
Happy to share it with you and always interested in collecting your feedback :)
This time, I mapped and contributed more than 100 swimming pools around my wife's hometown to https://www.openstreetmap.org. It only took about 20 min to find them all (+ ~3 min of verification) on a free Colab GPU.
We are happy to release the OpenPII English Anonymiser: the most powerful open-source tool for redacting sensitive info from English text.
Fine-tuned ModernBERT on 5.7 million+ PII examples, and it's clocking 99%+ accuracy across emails, dates, social numbers, and more!
Why it's a big deal:
- Top-tier precision: 100% for passport numbers, 99.96% for emails*.
- Totally free: MIT license for personal or commercial use.
- No secrets: full metrics shared on Hugging Face.
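For illustration, here is a minimal sketch of redacting text with a token-classification model via the transformers pipeline; the model id below is a placeholder for the actual checkpoint on our Hugging Face page:

```python
# Hedged sketch of PII redaction with a token-classification model; the model id is a placeholder.
from transformers import pipeline

pii = pipeline(
    "token-classification",
    model="your-org/openpii-english-anonymiser",  # placeholder id
    aggregation_strategy="simple",
)

text = "Contact Jane Doe at jane.doe@example.com before 2025-05-01."
# Replace detected spans from the end so character offsets stay valid
for ent in sorted(pii(text), key=lambda e: e["start"], reverse=True):
    text = text[:ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
print(text)
```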
Nvidia brings Blue (a Star Wars-style droid) to life: supercute, with flawless dexterity and a droid voice. It's the result of their collaborative research with Google DeepMind and Disney, revealed as part of their new open-source physics engine for robotics simulation, Newton, which enables robots to learn how to complete complex tasks with greater precision.
Introducing OneSQL-v0.1, our first text-to-SQL model based on Qwen2.5-Coder. This model achieved an EX score of 63.33 on the BIRD leaderboard (https://bird-bench.github.io/).
My goal is to make OneSQL the most usable open-weights model for text-to-SQL. I'm currently working on best practices to help users use this model the right way and avoid pitfalls. After that, I plan to train the next version to push for a higher EX score.
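In the meantime, here is a hedged sketch of how one might prompt such a model with transformers; the model id and prompt template below are placeholders, so check the model card for the recommended format:

```python
# Hedged sketch of text-to-SQL generation; model id and prompt format are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "onesql/OneSQL-v0.1"  # placeholder id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

schema = "CREATE TABLE employees(id INTEGER, name TEXT, salary REAL);"
question = "Who are the three highest-paid employees?"
prompt = f"-- Schema:\n{schema}\n-- Question: {question}\n-- SQL:\n"

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```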
Enjoy this model and feel free to share comments/questions!
At Rapidata, we compared DeepL with LLMs like DeepSeek-R1, Llama, and Mixtral for translation quality, using feedback from over 51,000 native speakers. Despite its cost, DeepL's performance makes it a valuable investment, especially in critical applications where translation quality is paramount. Now we can say that Europe is about more than imposing regulations.
Our dataset, based on these comparisons, is now available on Hugging Face. This might be useful for anyone working on AI translation or language model evaluation.
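As a rough sketch of how to pull it in with the datasets library (the dataset id and split below are placeholders; the real listing is on our Hugging Face page):

```python
# Hedged sketch, assuming the standard datasets library; the dataset id is a placeholder.
from datasets import load_dataset

ds = load_dataset("Rapidata/translation-preferences")  # placeholder id
print(ds)
print(ds["train"][0])  # e.g. source text, candidate translations, and native-speaker preferences
```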