Celebrating One Year of #SauerkrautLM with Two Groundbreaking Releases!
We're thrilled to announce the release of SauerkrautLM-v2-14b in two specialized versions: VAGOsolutions/SauerkrautLM-v2-14b-SFT and VAGOsolutions/SauerkrautLM-v2-14b-DPO. Built on the robust Qwen2.5-14B foundation, these models represent a significant leap forward in multilingual AI capabilities.
Technical Breakthroughs:
- Innovative three-phase fine-tuning approach
- Two-step Spectrum SFT + one-step Spectrum DPO optimization phase for enhanced performance
- Balance of German and English language capabilities
- Advanced function calling, almost on par with Claude-3.5-Sonnet-20240620
Training Innovation: Our three-phase approach targeted specific layer percentages (15%, 20% and 25%) with carefully curated datasets, including:
- Mathematics-focused content (proprietary classifier-selected)
- High-quality German training data
- Specialized function calling datasets
- Premium multilingual content
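To make the layer-percentage idea concrete, here is a minimal sketch of Spectrum-style selective fine-tuning: freeze the whole network, then unfreeze only a target fraction of decoder blocks before running the usual SFT/DPO loop. The base model name comes from the post; the selection rule below (simply taking the last blocks) is a placeholder, since Spectrum itself ranks layers by a signal-to-noise criterion rather than position.

```python
# Sketch only, not the authors' exact recipe.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B")  # base model named in the post

target_fraction = 0.25           # one of the 15% / 20% / 25% phases
blocks = model.model.layers      # decoder blocks of a Qwen2-style causal LM

for param in model.parameters():
    param.requires_grad = False  # freeze everything first

num_trainable = max(1, int(len(blocks) * target_fraction))
for block in blocks[-num_trainable:]:  # placeholder choice: last N blocks
    for param in block.parameters():
        param.requires_grad = True     # unfreeze only the targeted slice

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```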
Community Contribution: We're also releasing two new datasets in a few days:
1. SauerkrautLM-Fermented-GER-DPO: 3,300 high-quality German training samples
2. SauerkrautLM-Fermented-Irrelevance-GER-DPO: 2,000 specialized samples for optimized function call irrelevance handling
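For readers unfamiliar with DPO-style data, a preference dataset typically pairs each prompt with a preferred and a dispreferred completion. The record below is purely hypothetical and only illustrates the common prompt/chosen/rejected layout; the actual schema and repository id of the SauerkrautLM-Fermented datasets may differ.

```python
from datasets import load_dataset

# Hypothetical preference record in the usual prompt / chosen / rejected layout.
sample = {
    "prompt": "A German user request",                        # instruction or query
    "chosen": "The preferred, higher-quality German answer",   # completion to reinforce
    "rejected": "The dispreferred German answer",              # completion to push away from
}

# Once published, loading should work the usual way (repo id assumed):
ds = load_dataset("VAGOsolutions/SauerkrautLM-Fermented-GER-DPO", split="train")
print(ds[0])
```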
Thank you to our incredible community and partners who have supported us throughout this journey. Here's to another year of AI innovation!
Reacted to reach-vb's post (9 days ago):
> Trained on 1.3 trillion tokens (Dolma 1.7) across 16 nodes, each with 4 MI250 GPUs
> Three checkpoints:
- AMD OLMo 1B: pre-trained base model
- AMD OLMo 1B SFT: supervised fine-tuned on the Tulu V2, OpenHermes-2.5, WebInstructSub, and Code-Feedback datasets
- AMD OLMo 1B SFT DPO: aligned with human preferences using Direct Preference Optimization (DPO) on the UltraFeedback dataset
Key insights:
> Pre-trained with less than half the tokens of OLMo-1B
> Post-training includes two-phase SFT and DPO alignment
> SFT data: Phase 1 uses Tulu V2; Phase 2 uses OpenHermes-2.5, WebInstructSub, and Code-Feedback
> Model checkpoints on the Hub & integrated with Transformers
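Since the checkpoints are on the Hub and supported by Transformers, trying the aligned model takes only a few lines; a minimal sketch below, assuming the DPO checkpoint lives at amd/AMD-OLMo-1B-SFT-DPO.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the DPO-aligned checkpoint from the Hub (repo id assumed).
model_id = "amd/AMD-OLMo-1B-SFT-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Quick generation sanity check.
inputs = tokenizer("What is Direct Preference Optimization?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```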
Congratulations & kudos to AMD on a brilliant smol model release!
Did you guys know that if you try to link a prepaid card to huggingface it won't work, but then if you press the button again it links anyway? Then you can lock the card (deny any charges), and get resources for free? You're welcome :P
Reacted to merve's post (12 days ago):
Another great week in open ML! Here's a small recap.
Model releases
Video language models: AI at Meta released Vision-CAIR/LongVU_Qwen2_7B, a new state-of-the-art long video language model based on DINOv2, SigLIP, Qwen2 and Llama 3.2.
Small language models: Hugging Face released HuggingFaceTB/SmolLM2-1.7B, a family of new smol language models under the Apache 2.0 license, coming in 135M, 360M and 1.7B sizes, along with datasets. Meta released facebook/MobileLLM-1B, a new family of on-device LLMs in 125M, 350M, 600M and 1B sizes.
Any-to-any: gpt-omni/mini-omni2, the closest reproduction of GPT-4o so far, was released. It is a new LLM that takes image, text and audio input and outputs speech.
Dataset releases
Spawning/PD12M, a new captioning dataset of 12.4 million examples, with captions generated using Florence-2.
Plot twist: size isn't everything in AI! A lean 32B-parameter model just showed up to the party and outperformed a 70B one. Efficiency > scale? The AI world just got more interesting...
Cohere For AI released Aya Expanse, a new family of multilingual models (8B and 32B) spanning 23 popular languages.
Lotus is a new foundation model for monocular depth estimation. Compared to previous diffusion-based MDE models, Lotus is modified for dense prediction tasks. The authors also released a model for normal prediction. Find everything in this collection: merve/lotus-6718fb957dc1c85a47ca1210
Reacted to thomwolf's post (19 days ago):
Just watched @thomwolf tear down the over-hyped AGI narrative in 30 seconds - and it's refreshingly grounded.
No wild speculation about superintelligence timelines or consciousness. Just practical insights from someone who really understands the technology.
This is the kind of level-headed perspective that helps us focus on what AI can actually do today (which is already transformative) rather than getting lost in AGI fantasy. Worth your time if you want to understand AI progress without the hype.