AI & ML interests

None defined yet.

Recent Activity

Phind's activity

philschmid 
posted an update 21 days ago
view post
Post
2562
Gemini 2.5 Pro, thinking by default! We excited launch our best Gemini model for reasoning, multimodal and coding yet! #1 on LMSYS, Humanity’s Last Exam, AIME and GPQA and more!

TL;DR:
- 💻 Best Gemini coding model yet, particularly for web development (excels on LiveCodeBench).
- 🧠 Default "Thinking" with up to 64k token output
- 🌌 1 Million multimodal input context for text, image, video, audio, and pdf
- 🛠️ Function calling, structured output, google search & code execution.
- 🏆  #1 on LMArena & sota on AIME, GPQA, Humanity's Last Exam
- 💡 Knowledge cut of January 2025
- 🤗 Available for free as Experimental in AI Studio, Gemini API & Gemini APP
- 🚀 Rate limits - Free 2 RPM 50 req/day

Try it ⬇️

https://aistudio.google.com/?model=gemini-2.5-pro-exp-03-25
·
philschmid 
posted an update about 1 year ago
view post
Post
8480
New state-of-the-art open LLM! 🚀 Databricks just released DBRX, a 132B MoE trained on 12T tokens. Claiming to surpass OpenAI GPT-3.5 and is competitive with Google Gemini 1.0 Pro. 🤯

TL;DR
🧮 132B MoE with 16 experts with 4 active in generation
🪟 32 000 context window
📈 Outperforms open LLMs on common benchmarks, including MMLU
🚀 Up to 2x faster inference than Llama 2 70B
💻 Trained on 12T tokens
🔡 Uses the GPT-4 tokenizer
📜 Custom License, commercially useable

Collection: databricks/dbrx-6601c0852a0cdd3c59f71962
Demo: https://huggingface.co/spaces/databricks/dbrx-instruct

Kudos to the Team at Databricks and MosaicML for this strong release in the open community! 🤗
·
philschmid 
posted an update about 1 year ago
view post
Post
What's the best way to fine-tune open LLMs in 2024? Look no further! 👀 I am excited to share “How to Fine-Tune LLMs in 2024 with Hugging Face” using the latest research techniques, including Flash Attention, Q-LoRA, OpenAI dataset formats (messages), ChatML, Packing, all built with Hugging Face TRL. 🚀

It is created for consumer-size GPUs (24GB) covering the full end-to-end lifecycle with:
💡Define and understand use cases for fine-tuning
🧑🏻‍💻 Setup of the development environment
🧮 Create and prepare dataset (OpenAI format)
🏋️‍♀️ Fine-tune LLM using TRL and the SFTTrainer
🥇 Test and evaluate the LLM
🚀 Deploy for production with TGI

👉  https://www.philschmid.de/fine-tune-llms-in-2024-with-trl

Coming soon: Advanced Guides for multi-GPU/multi-Node full fine-tuning and alignment using DPO & KTO. 🔜
·