Hacked together a way to log trl GRPO training completions to a 🤗 dataset repo. This allows you to:
- Track rewards from multiple reward functions
- Treat the completions and rewards from training as a "proper" dataset and do EDA
- Share results for open science
The implementation is super hacky, but I'm curious if people would find this useful.
To push completions to the Hub, you just need two extra parameters:
Google just released PaliGemma 2 Mix: new versatile instruction-tuned vision-language models 🔥
> Three new models: 3B, 10B, and 28B, at 224 and 448 resolution
> Can do vision-language tasks with open-ended prompts, understand documents, and segment or detect anything 🤯