---
license: cc-by-nc-4.0
datasets:
- nickrosh/Evol-Instruct-Code-80k-v1
- MBZUAI/LaMini-instruction
language:
- en
base_model:
- Bertug1911/BrtGPT-1-Pre
pipeline_tag: text-generation
tags:
- code
---

# BrtGPT-1-Pre-Code

## Model Summary

We're introducing "BrtGPT-1-Pre-Code"! It was created by further training the already pre-trained "BrtGPT-1-Pre" model on code data.

Compared to the BrtGPT-1-Pre model, it writes much better code, even if its output still contains occasional typos.

No change was observed in general/daily chat or in simple knowledge-based question answering.

It may still produce some harmful output.
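
The training code is not published with this card, but the sketch below shows roughly how such a continued fine-tuning run could be set up with `transformers`, using one of the datasets listed in the card metadata. The column names (`instruction`, `output`), the padding choice, and all hyperparameters are illustrative assumptions, not the actual recipe.

```python
# Hypothetical continued fine-tuning sketch (not the actual training recipe).
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    PreTrainedTokenizerFast,
    Trainer,
    TrainingArguments,
)

base = "Bertug1911/BrtGPT-1-Pre"
tokenizer = PreTrainedTokenizerFast.from_pretrained(base)
model = GPT2LMHeadModel.from_pretrained(base)

# The tokenizer may not define a pad token; reuse <|im_end|> for padding (assumption).
if tokenizer.pad_token is None:
    tokenizer.pad_token = "<|im_end|>"

# One of the datasets listed in the card metadata; column names are assumed.
ds = load_dataset("nickrosh/Evol-Instruct-Code-80k-v1", split="train")

def to_chat_text(example):
    # Reuse the model's chat template, as in the inference snippet below.
    messages = [
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["output"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

tokenized = ds.map(to_chat_text).map(
    tokenize, remove_columns=ds.column_names + ["text"]
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="BrtGPT-1-Pre-Code",
        per_device_train_batch_size=8,  # illustrative hyperparameters
        num_train_epochs=1,
        learning_rate=5e-5,
        logging_steps=100,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```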

## Difference Between Models

Examples:

Prompts:

1- "Write me a code that prints "Hello World"."
2- "Write me a code that generates a random number."

BrtGPT-1-Pre's answers:

1- "Here's a code that prints "Hello World" in a list of words:"

```python
for i in range(1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
```

2- "Code:"

```python
def random(1):
    return random(1)
```

BrtGPT-1-Pre-Code's answers:

1- "Write me a code that prints "Hello World" Code:

```python
def print_hello_numbers(numbers):
    if num < num:
        return num
    elif num % num % num % num % num % num % num % num % num % num % num % num % num % num % num % num % num
```

2- "Write me a code that generates random number. Code:

```python
#Here is a code that generates random number in python 3:

def generate_random_number(num):
    # Create a new random number between 1 and 1
    random_number = random.randint(num)
    random_number = random.randint(num)
    random_number = random.randint(num)

    # Create a new
```


## How to use

You can run the following code to use the model:

```python
import torch
from transformers import PreTrainedTokenizerFast, GPT2LMHeadModel

def extract_response_between_tokens(text: str) -> str:
    # Return only the assistant's reply, delimited by the chat-template special tokens.
    start_token = "<|im_start|>assistant<|im_sep|>"
    end_token = "<|im_end|>"
    try:
        start_idx = text.index(start_token) + len(start_token)
        end_idx = text.index(end_token, start_idx)
        return text[start_idx:end_idx]
    except ValueError:
        # If the tokens are not found, return the original text
        return text

if __name__ == "__main__":
    model_name_or_path = "Bertug1911/BrtGPT-1-Pre"  # base model; swap in this card's repo id to load the code-tuned weights

    tokenizer = PreTrainedTokenizerFast.from_pretrained(model_name_or_path)
    model = GPT2LMHeadModel.from_pretrained(model_name_or_path)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()

    user_input = input("Enter something to ask model: ")


    messages = [{"role": "user", "content": user_input}]


    formatted_prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )


    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
    generated = inputs["input_ids"]

    # Generation settings
    max_new_tokens = 128
    do_sample = True
    top_k = 40
    temperature = 0.8

    im_end_token_id = tokenizer.convert_tokens_to_ids("<|im_end|>")

    with torch.no_grad():
        # Sample one token at a time with temperature and top-k filtering, stopping at <|im_end|>
        for i in range(max_new_tokens):
            outputs = model(generated)
            logits = outputs.logits[:, -1, :]
            logits = logits / temperature

            if top_k > 0:
                top_k_values, top_k_indices = torch.topk(logits, top_k)
                logits_filtered = torch.full_like(logits, float('-inf'))
                logits_filtered.scatter_(1, top_k_indices, top_k_values)
                logits = logits_filtered

            probs = torch.softmax(logits, dim=-1)

            if do_sample:
                next_token = torch.multinomial(probs, num_samples=1)
            else:
                next_token = torch.argmax(probs, dim=-1, keepdim=True)

            generated = torch.cat([generated, next_token], dim=1)

            if next_token.item() == im_end_token_id:
                break



    output = tokenizer.decode(generated[0], skip_special_tokens=False)

    # Undo byte-level BPE markers: drop literal spaces, then map "Ġ" to space and "Ċ" to newline
    no_spaces = output.replace(" ", "")
    step2 = no_spaces.replace("Ġ", " ")
    formatted_output = step2.replace("Ċ", "\n")

    if not formatted_output.strip().endswith("<|im_end|>"):
        formatted_output += "<|im_end|>"


    assistant_response = extract_response_between_tokens(formatted_output)
    print("\nModel output:\n", assistant_response)

```
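
The manual loop above makes the sampling explicit. If you only need the output, a roughly equivalent call to `model.generate` (reusing `model`, `tokenizer`, `inputs`, and `extract_response_between_tokens` from the script above, with the same top-k/temperature settings) might look like this sketch:

```python
# Sketch: same sampling settings as above, delegated to model.generate.
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")

output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    top_k=40,
    temperature=0.8,
    eos_token_id=im_end_id,  # stop at <|im_end|>
    pad_token_id=im_end_id,  # avoids a warning when no pad token is defined
)

raw = tokenizer.decode(output_ids[0], skip_special_tokens=False)
# The same post-processing applies: drop spaces, then map "Ġ" -> space and "Ċ" -> newline.
cleaned = raw.replace(" ", "").replace("Ġ", " ").replace("Ċ", "\n")
print(extract_response_between_tokens(cleaned))
```
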
## Evaluation

Evaluation results are coming soon!

## Risks and biases

The model may generate:
- Illegal outputs
- Harmful content

Use with caution!

## Contact

"[email protected]" or "[email protected]"