---
library_name: transformers
tags:
- text-generation-inference
license: apache-2.0
language:
- ja
- en
base_model:
- Qwen/Qwen3-8B
pipeline_tag: text-generation
---
# Qwen3-EZO-8b-beta



We are releasing **Qwen3-EZO-8b-beta**, an 8B-parameter LLM based on Qwen3-8B.

Although its size puts it in the SLM (Small Language Model) class, it scores **9.08 on MT-Bench** and **8.87 on JMT-Bench**, a significant improvement over the original Qwen3-8B. In our internal evaluation, its performance on multi-turn tasks was comparable to Gemini 2.5 Flash and GPT-4o.

It supports parallel processing of deep-thinking prompts using our **Deep-Think** technique, and it can be served behind an OpenAI-compatible API with **vLLM**.

The model was originally planned as a closed model with API-only access. We have decided to release it as an open model under our new policy of monetizing only after further accuracy improvements.
## Benchmark



*Based on repeated evaluations of frequent outputs at temperatures 0.2 and 0.6, conducted on May 13, 2025, with GPT-4o and Gemini 2.5 Flash as judges.*

*All tests were performed internally on a single A40 GPU. Results may vary under external or official benchmark conditions.*

---


## How to use

The model runs on a single A40 GPU. Start an OpenAI-compatible server with vLLM:

```bash
vllm serve AXCXEPT/Qwen3-EZO-8b-beta --enable-reasoning --reasoning-parser deepseek_r1
```

Then query the server with the OpenAI Python client:

```python
from openai import OpenAI

# Point the client at the local vLLM server started above.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",  # placeholder value
)

# Raw string, so LaTeX commands such as \frac are not read as escape sequences.
prompt = r"""Every morning Aya goes for a $9$-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of $s$ kilometers per hour, the walk takes her 4 hours, including $t$ minutes spent in the coffee shop. When she walks $s+2$ kilometers per hour, the walk takes her 2 hours and 24 minutes, including $t$ minutes spent in the coffee shop. Suppose Aya walks at $s+\frac{1}{2}$ kilometers per hour. Find the number of minutes the walk takes her, including the $t$ minutes spent in the coffee shop."""

completion = client.chat.completions.create(
    model="AXCXEPT/Qwen3-EZO-8b-beta",
    messages=[
        {"role": "user", "content": prompt}
    ],
)

# The message holds the final answer and, with a reasoning parser, the reasoning.
print(completion.choices[0].message)
```
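When the server is started with a reasoning parser, as in the `vllm serve` command above, vLLM is expected to return the model's reasoning separately from the final answer, in a `reasoning_content` field on the message. The sketch below reads both parts. The helper name `split_reasoning` is hypothetical, the field name is an assumption that may differ between vLLM versions, and a stand-in object replaces `completion.choices[0].message` so the example runs without a server.

```python
from types import SimpleNamespace


def split_reasoning(message):
    """Return (reasoning, answer) from a chat completion message.

    `reasoning_content` is a server-side extension (assumed here), so it is
    read defensively and falls back to an empty string when absent.
    """
    reasoning = getattr(message, "reasoning_content", None) or ""
    answer = getattr(message, "content", None) or ""
    return reasoning.strip(), answer.strip()


# Stand-in for completion.choices[0].message (illustrative values only).
demo = SimpleNamespace(reasoning_content=" 9 / 3 = 3 hours ... ", content="204\n")
reasoning, answer = split_reasoning(demo)
print("reasoning:", reasoning)
print("answer:", answer)
```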
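To judge the model's reply to the sample prompt, it helps to know the expected result. Subtracting the two conditions eliminates the coffee-shop time and gives `9/s - 9/(s+2) = 1.6` hours, so `s(s+2) = 11.25`. The sketch below solves this with plain arithmetic; the function name `walk_minutes` and its parameters are illustrative and not part of the model's tooling.

```python
import math


def walk_minutes(distance_km=9.0, slow_hours=4.0, fast_hours=2.4,
                 speedup=2.0, extra=0.5):
    """Total minutes (walk plus coffee shop) when walking at s + extra km/h."""
    # distance/s - distance/(s + speedup) = slow_hours - fast_hours
    # => s^2 + speedup*s - distance*speedup/(slow_hours - fast_hours) = 0
    c = distance_km * speedup / (slow_hours - fast_hours)
    s = (-speedup + math.sqrt(speedup ** 2 + 4 * c)) / 2  # positive root
    t_minutes = (slow_hours - distance_km / s) * 60  # time in the coffee shop
    return distance_km / (s + extra) * 60 + t_minutes


print(round(walk_minutes()))  # 204
```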
## Special Thanks

We would like to express our sincere respect and appreciation to Alibaba Cloud and the Qwen development team for their work in creating the base model for this project.