tencent/InstantCharacter • Updated • 55
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains