---
license: mit
language:
- zh
- en
base_model:
- inclusionAI/Ling-lite-base-1.5
new_version: inclusionAI/Ring-lite-2507
pipeline_tag: text-generation
library_name: transformers
---
# Ring-lite

<p align="center">
    <img src="https://huggingface.co/inclusionAI/Ring-lite/resolve/main/ant-bailing.png" width="100"/>
</p>

<p align="center">
          🤗 <a href="https://huggingface.co/inclusionAI">Hugging Face</a>
</p>

## Introduction

Ring-lite is a lightweight, fully open-sourced MoE (Mixture of Experts) LLM designed for complex reasoning tasks. It is built upon the publicly available [Ling-lite-1.5](https://huggingface.co/inclusionAI/Ling-lite-1.5) model, which has 16.8B total parameters with 2.75B activated parameters. We use a joint training pipeline combining knowledge distillation with reinforcement learning, achieving performance comparable to state-of-the-art (SOTA) small-size reasoning models on challenging benchmarks (AIME, LiveCodeBench, and GPQA-Diamond) while activating only one-third of their parameters.


## News
[20250704] **Ring-lite-0704**: We have updated the Ring-lite model to support two distinct reasoning modes: "**thinking on**" and "**thinking off**".

## Model Downloads

<div align="center">

|     **Model**      | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |
| :----------------: | :---------------: | :-------------------: | :----------------: | :----------: |
| Ring-lite |       16.8B       |         2.75B         |        128K         |      [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ring-lite) |

</div>

## Evaluation
To comprehensively evaluate the quality of our reasoning models, we ran automatic benchmarks assessing their performance on math, code, and science tasks.

<p align="center">
    <img src="https://huggingface.co/inclusionAI/Ring-lite/resolve/main/performance.png" width="1000"/>
</p>

To compare Ring-lite-0704 with Ring-lite-0616, we evaluated both models on a broader range of reasoning and general-purpose benchmarks, including instruction following, function calling, and creative writing.

| **Dataset** | **Ring-lite-0616** | **Ring-lite-0704** |
| :---------: | :----------------: | :----------------: |
| AIME 2024 | 76.6 | 79.0 |
| AIME 2025 | 69.1 | 69.5 |
| LiveCodeBench | 60.7 | 61.4 |
| Codeforces (percentile) | 86.5 | 88.0 |
| GPQA Diamond | 61.1 | 63.2 |
| C-Eval | 59.0 | 65.4 |
| MMLU-Pro | 60.0 | 63.0 |
| ArenaHard | 27.8 | 62.7 |
| IF-Eval | 51.6 | 54.3 |
| BFCL_Live | 60.1 | 66.8 |
| Creative Writing | 6.7 | 60.2 |


More details are reported in our [technical report](https://arxiv.org/abs/2506.14731).

## Quickstart

### 🤗 Hugging Face Transformers
The newly updated **Ring-lite** model now supports two distinct reasoning modes: "**thinking on**" and "**thinking off**". These modes are controlled by the `enable_thinking` parameter in the `tokenizer.apply_chat_template()` function.
* When `enable_thinking` is set to `True` (or omitted), the model operates in "**thinking on**" mode, where it generates and outputs the internal reasoning process.
* When `enable_thinking` is explicitly set to `False`, the model runs in "**thinking off**" mode, skipping the reasoning step entirely and directly producing the final answer.

This feature allows users to choose between detailed reasoning and concise output based on their specific needs.

Here is a code snippet to show you how to use the chat model with `transformers`:

#### Thinking on


```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "inclusionAI/Ring-lite"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Ring, an assistant created by inclusionAI"},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=8192
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
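In "thinking on" mode, the generated text contains the reasoning trace followed by the final answer. If you want to handle the two parts separately, here is a minimal post-processing sketch, assuming the reasoning is wrapped in `<think>...</think>` tags (the tag name is an assumption; check the model's chat template for the actual delimiters):

```python
import re

def split_thinking(response: str) -> tuple[str, str]:
    """Split a decoded response into (reasoning, answer).

    Assumes the reasoning trace is wrapped in <think>...</think> tags;
    verify the actual delimiters against the model's chat template.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        # No reasoning trace found (e.g. "thinking off" mode).
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer

# Hypothetical example response for illustration.
reasoning, answer = split_thinking(
    "<think>2 + 2 = 4, so the answer is 4.</think>The answer is 4."
)
```

This keeps the reasoning available for logging or inspection while passing only the final answer downstream.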

#### Thinking off
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "inclusionAI/Ring-lite"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Ring, an assistant created by inclusionAI"},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=8192
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

## Dataset
The training data of Ring-lite is released at [Ring-lite-sft-data](https://huggingface.co/datasets/inclusionAI/Ring-lite-sft-data) and [Ring-lite-rl-data](https://huggingface.co/datasets/inclusionAI/Ring-lite-rl-data).

## Code
The training code will be released soon.

## Deployment
Please refer to [GitHub](https://github.com/inclusionAI/Ring/blob/main/README.md).

## License
This code repository is licensed under [the MIT License](https://huggingface.co/inclusionAI/Ring-lite/blob/main/LICENSE).

## Citation
```
@misc{ringteam2025ringlitescalablereasoningc3postabilized,
      title={Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs}, 
      author={Ling Team},
      year={2025},
      eprint={2506.14731},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.14731}, 
}
```