---
datasets:
- nvidia/OpenCodeReasoning-2
- GetSoloTech/Code-Reasoning
base_model: GetSoloTech/GPT-OSS-Code-Reasoning-20B
library_name: mlx
tags:
- code-reasoning
- coding
- reasoning
- problem-solving
- algorithms
- python
- c++
- competitive-programming
- vllm
- mlx
pipeline_tag: text-generation
---

# GPT-OSS-Code-Reasoning-20B-qx86-hi-mlx

This is an experimental mixed-precision quant: selected layers are compressed to 6 bits while others keep higher precision, and all layers are quantized with group size 32.
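To give a rough feel for what bit width and group size mean here, the following is a generic group-wise affine quantization sketch in plain Python. It is not the qx86-hi formula and not the mlx-lm implementation; the function names `quantize_group`, `dequantize_group`, and `roundtrip_error` are hypothetical, and the group size of 64 appears only for comparison.

```python
import random


def quantize_group(values, bits):
    """Map one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels
    if scale == 0:
        scale = 1.0  # constant group: any scale works, avoid dividing by zero
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Recover approximate floats from the codes of one group."""
    return [c * scale + lo for c in codes]


def roundtrip_error(weights, bits, group_size):
    """Worst absolute error after quantizing each group independently."""
    worst = 0.0
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        codes, scale, lo = quantize_group(group, bits)
        restored = dequantize_group(codes, scale, lo)
        worst = max(worst, max(abs(a - b) for a, b in zip(group, restored)))
    return worst


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [rng.gauss(0.0, 1.0) for _ in range(256)]  # stand-in for real weights
    for group_size in (64, 32):
        err = roundtrip_error(weights, bits=6, group_size=group_size)
        print(f"bits=6 group_size={group_size} worst error={err:.5f}")
```

Each group stores its own scale and offset, so a smaller group never spans a wider value range than the larger group that contains it. The per-value error is bounded by half the group's step size, and the cost of smaller groups is more scales and offsets to store.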

Side effects of quanting with the qx86-hi formula
```bash
I needed Haskell code.

The q6 starts with Haskell, 10k tokens down the road writes Python, and finishes with React at 30k tokens
The q6-hi, encoded with group size 32, writes some Haskell, and stops around 24k tokens
The qx86-hi worked for 43k tokens, reasoning around the Haskell solution without skipping a beat

It didn't forget Python, it's just a bit more open-minded about other languages
```

From the original model card:

Overview
```bash
Base model: openai/gpt-oss-20b
Objective: Supervised fine-tuning for competitive programming and algorithmic reasoning
Dataset: nvidia/OpenCodeReasoning-2 (OCR-2), combining python and cpp splits. 
Each sample reconstructs the upstream question and uses the dataset's r1_generation as the assistant response
Context length: 4096 tokens
Training method: LoRA SFT via TRL SFTTrainer
```
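The sample construction described in the overview could look roughly like the sketch below. Only `r1_generation` is named in the card; the `question` key and the `build_sample` helper are hypothetical, the sketch assumes the upstream question text has already been recovered, and the actual training script may format samples differently.

```python
def build_sample(record):
    """Turn one record into a two-turn chat sample.

    Assumes `record` holds the reconstructed problem statement under the
    hypothetical key "question" and the reasoning trace under "r1_generation".
    """
    return {
        "messages": [
            {"role": "user", "content": record["question"]},
            {"role": "assistant", "content": record["r1_generation"]},
        ]
    }


example = build_sample(
    {"question": "Sum two integers.", "r1_generation": "print(sum(map(int, input().split())))"}
)
print(example["messages"][0]["role"], example["messages"][1]["role"])
```

A list of role-tagged messages like this is a common conversational layout for supervised fine-tuning, with the chat template applied later by the tokenizer.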

Intended Use
```bash
Intended: Generating Python/C++ solutions and reasoning for competitive programming tasks
Out of scope: Safety-critical applications. May hallucinate or produce incorrect/inefficient code
```

This model [GPT-OSS-Code-Reasoning-20B-qx86-hi-mlx](https://huggingface.co/GPT-OSS-Code-Reasoning-20B-qx86-hi-mlx) was
converted to MLX format from [GetSoloTech/GPT-OSS-Code-Reasoning-20B](https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B)
using mlx-lm version **0.26.4**.

## Use with mlx

```bash
pip install mlx-lm
```

```python
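from mlx_lm import load, generate

# Load the quantized weights and the matching tokenizer
model, tokenizer = load("GPT-OSS-Code-Reasoning-20B-qx86-hi-mlx")

prompt = "hello"

# Wrap the raw prompt in the model's chat format when a template is available
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# verbose=True prints the generated text as it is produced
response = generate(model, tokenizer, prompt=prompt, verbose=True)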
```