Marian Kannwischer
canwiper
AI & ML interests
RLHF & Computer Vision
Recent Activity
liked a dataset 5 days ago
Rapidata/multilingual-llm-jokes-4o-claude-gemini
reacted to jasoncorkill's post with 🚀 about 1 month ago
Imagine you could have an Image Arena score equivalent at each checkpoint during training. We released the first version of just that: Crowd-Eval.
Add one line of code to your training loop and you will have a new real human loss curve in your W&B dashboard.
Thousands of real humans from around the world rating your model in real time at the cost of a few dollars per checkpoint is a game changer.
Check it out here: https://github.com/RapidataAI/crowd-eval
First 5 people to put it in their loop get 100'000 human responses for free! (ping me)
reacted to jasoncorkill's post with 👀 about 2 months ago
Benchmark Update: @google Veo3 (Text-to-Video)
Two months ago, we benchmarked @google’s Veo2 model. It fell short, struggling with style consistency and temporal coherence, trailing behind Runway, Pika, @tencent, and even @alibaba-pai.
That’s changed.
We just wrapped up benchmarking Veo3, and the improvements are substantial. It outperformed every other model by a wide margin across all key metrics. It was not just better; it dominated across style, coherence, and prompt adherence. It's rare to see such a clear lead in today’s hyper-competitive T2V landscape.
Dataset coming soon. Stay tuned.