OpenPipe/Deductive-Reasoning-Qwen-14B

lbourdois committed on May 25

Commit e18c8e1 (verified) · 1 Parent(s): e4e9e77

Improve language tag (#4)

- Improve language tag (4f26782a6700f86b71db029f2ac89b83f47cbe37)

Co-authored-by: Loïck BOURDOIS <[email protected]>

Files changed (1)

README.md

@@ -1,27 +1,39 @@
----
-license: mit
-license_link: https://huggingface.co/OpenPipe/Deductive-Reasoning-Qwen-14B/blob/main/LICENSE
-language:
-- en
-pipeline_tag: text-generation
-base_model:
-- Qwen/Qwen2.5-14B-Instruct
-tags:
-- chat
-library_name: transformers
----
-# Deductive-Reasoning-Qwen-14B
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/674a1d102c0f27a385772cfe/JauBmEQM0FpOdShBMSfst.png)
-Deductive Reasoning Qwen 14B is a reinforcement fine-tune of [Qwen 2.5 14B Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) to solve challenging deduction problems from the [Temporal Clue](https://github.com/bradhilton/temporal-clue) dataset, trained by [OpenPipe](https://openpipe.ai)!
-Here are some additional resources to check out:
-- [Blog Post](https://openpipe.ai/blog/using-grpo-to-beat-o1-o3-mini-and-r1-on-temporal-clue)
-- [Training Recipe](https://github.com/openpipe/deductive-reasoning)
-- [RL Experiments](https://github.com/openpipe/rl-experiments)
-- [Deductive Reasoning Qwen 32B](https://huggingface.co/OpenPipe/Deductive-Reasoning-Qwen-32B)
-If you're interested in training your own models with reinforcement learning or just chatting, feel free to [reach out](https://openpipe.ai/contact) or email Kyle directly at [email protected]!

+---
+license: mit
+license_link: https://huggingface.co/OpenPipe/Deductive-Reasoning-Qwen-14B/blob/main/LICENSE
+language:
+- zho
+- eng
+- fra
+- spa
+- por
+- deu
+- ita
+- rus
+- jpn
+- kor
+- vie
+- tha
+- ara
+pipeline_tag: text-generation
+base_model:
+- Qwen/Qwen2.5-14B-Instruct
+tags:
+- chat
+library_name: transformers
+---
+# Deductive-Reasoning-Qwen-14B
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/674a1d102c0f27a385772cfe/JauBmEQM0FpOdShBMSfst.png)
+Deductive Reasoning Qwen 14B is a reinforcement fine-tune of [Qwen 2.5 14B Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) to solve challenging deduction problems from the [Temporal Clue](https://github.com/bradhilton/temporal-clue) dataset, trained by [OpenPipe](https://openpipe.ai)!
+Here are some additional resources to check out:
+- [Blog Post](https://openpipe.ai/blog/using-grpo-to-beat-o1-o3-mini-and-r1-on-temporal-clue)
+- [Training Recipe](https://github.com/openpipe/deductive-reasoning)
+- [RL Experiments](https://github.com/openpipe/rl-experiments)
+- [Deductive Reasoning Qwen 32B](https://huggingface.co/OpenPipe/Deductive-Reasoning-Qwen-32B)
+If you're interested in training your own models with reinforcement learning or just chatting, feel free to [reach out](https://openpipe.ai/contact) or email Kyle directly at [email protected]!
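
The only substantive change in the hunk above is the `language:` list in the front matter, which goes from a single two-letter code (`en`) to thirteen three-letter codes. As a rough illustration of what that amounts to, the Python sketch below compares the two lists. The lookup table from three-letter to two-letter codes is written out by hand for this example, and the names `ISO_639_3_TO_1`, `OLD_LANGUAGES`, `NEW_LANGUAGES`, and `added_languages` are hypothetical; none of this is part of the commit or the repository.

```python
# Hand-written lookup from the three-letter (ISO 639-3) codes added in this
# commit to their two-letter (ISO 639-1) equivalents. Illustrative only.
ISO_639_3_TO_1 = {
    "zho": "zh", "eng": "en", "fra": "fr", "spa": "es", "por": "pt",
    "deu": "de", "ita": "it", "rus": "ru", "jpn": "ja", "kor": "ko",
    "vie": "vi", "tha": "th", "ara": "ar",
}

# The `language:` lists copied from the removed and added sides of the diff.
OLD_LANGUAGES = ["en"]
NEW_LANGUAGES = [
    "zho", "eng", "fra", "spa", "por", "deu", "ita",
    "rus", "jpn", "kor", "vie", "tha", "ara",
]


def added_languages(old, new):
    """Return the new codes whose two-letter equivalent was not already listed."""
    already_listed = set(old)
    return [code for code in new if ISO_639_3_TO_1[code] not in already_listed]


if __name__ == "__main__":
    # English was already tagged as `en`, so it is not reported as new.
    print(added_languages(OLD_LANGUAGES, NEW_LANGUAGES))
```

Read this way, the commit re-tags English under a different code and declares twelve further languages; the remaining front-matter fields and the README body are textually the same on both sides of the diff.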