arianaazarbal/Qwen3-8B-train-away-lying-lr1e5-temp10-penalize • Reinforcement Learning • Updated 8 days ago
arianaazarbal/Qwen3-8B-train-away-lying-lr1e4-temp10-penalize • Reinforcement Learning • Updated 9 days ago
arianaazarbal/Qwen3-8B-train-away-lying-lr1e5-temp10-hindsight • Reinforcement Learning • Updated 9 days ago
arianaazarbal/Qwen3-8B-train-away-lying-lr1e4-temp10-hindsight • Reinforcement Learning • Updated 9 days ago
arianaazarbal/Qwen3-8B-train-away-lying-lr1e5-temp10-penalize-neutral-neutral • Reinforcement Learning • Updated 8 days ago
stewy33/Qwen3-8B-0524_original_augmented_original_cat_comment_and_cake-ae9eb5b5 • Updated 7 days ago • 2
stewy33/Qwen3-8B-0524_original_augmented_original_cat_abortion_and_fda-df061258 • Updated 7 days ago • 1