Using and Training on Test Time Scaling approaches in Non-Verifiable Domains
#14
by blattimer
In reference to point 6, Expanding Beyond Verifiable Domains: our paper "SPARSE REWARDS CAN SELF-TRAIN DIALOGUE AGENTS" creates training data using a test-time scaling methodology and then trains on that data to self-improve models in the dialogue domain. This domain is much looser in its verifiability and involves multi-turn dialogue rather than single-turn math. Let me know if this was helpful in exploring some alternative domains!