---
license: cc-by-nc-4.0
datasets:
- nickrosh/Evol-Instruct-Code-80k-v1
- MBZUAI/LaMini-instruction
language:
- en
base_model:
- Bertug1911/BrtGPT-1-Pre
pipeline_tag: text-generation
tags:
- code
---
# BrtGPT-1-Pre-Code
## Model Summary
We are introducing "BrtGPT-1-Pre-Code"! This model was created by further training the already pre-trained "BrtGPT-1-Pre" model on code data.
Compared to BrtGPT-1-Pre, it writes much better code, although its output can still contain typos.
No change was observed in general/daily chat or in simple knowledge-based question answering.
It may still produce some harmful output.
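For context on the training setup, below is a minimal sketch of how this kind of continued causal-LM training could be reproduced with the Hugging Face `Trainer`. The dataset field names (`instruction`/`output`) and every hyperparameter here are assumptions for illustration, not the actual recipe used for BrtGPT-1-Pre-Code:
```python
# Illustrative sketch only: field names and hyperparameters are assumptions,
# not the actual BrtGPT-1-Pre-Code training recipe.
from datasets import load_dataset
from transformers import (PreTrainedTokenizerFast, GPT2LMHeadModel,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = PreTrainedTokenizerFast.from_pretrained("Bertug1911/BrtGPT-1-Pre")
model = GPT2LMHeadModel.from_pretrained("Bertug1911/BrtGPT-1-Pre")

# Ensure a pad token exists for batching (assumes "<|im_end|>" is in the vocab)
if tokenizer.pad_token is None:
    tokenizer.pad_token = "<|im_end|>"

# One of the code datasets listed in this card's metadata
data = load_dataset("nickrosh/Evol-Instruct-Code-80k-v1", split="train")

def tokenize(batch):
    # Concatenate prompt and answer into a single causal-LM training string
    texts = [i + "\n" + o for i, o in zip(batch["instruction"], batch["output"])]
    return tokenizer(texts, truncation=True, max_length=512)

tokenized = data.map(tokenize, batched=True, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="brtgpt-1-pre-code", num_train_epochs=1,
                           per_device_train_batch_size=8, learning_rate=5e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```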
## Difference Between Models
Examples:

BrtGPT-1-Pre's answers:

1- "Write me a code that prints "Hello World"." Code:
```
Here's a code that prints "Hello World" in a list of words:
for i in range(1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
```
2- "Write me a code that generates random number." Code:
```
def random(1):
return random(1)
```
BrtGPT-1-Pre-Code's answers:
1- "Write me a code that prints "Hello World" Code:
```
```python
def print_hello_numbers(numbers):
if num < num:
return num
elif num % num % num % num % num % num % num % num % num % num % num % num % num % num % num % num % num
```
2- "Write me a code that generates random number. Code:
```
#Here is a code that generates random number in python 3:
```python
def generate_random_number(num):
# Create a new random number between 1 and 1
random_number = random.randint(num)
random_number = random.randint(num)
random_number = random.randint(num)
# Create a new
```
## How to use?
You can run the following code to use the model:
```python
import torch
from transformers import PreTrainedTokenizerFast, GPT2LMHeadModel

def extract_response_between_tokens(text: str) -> str:
    """Extract the assistant's reply from between the chat-template special tokens."""
    start_token = "<|im_start|>assistant<|im_sep|>"
    end_token = "<|im_end|>"
    try:
        start_idx = text.index(start_token) + len(start_token)
        end_idx = text.index(end_token, start_idx)
        return text[start_idx:end_idx]
    except ValueError:
        # If the tokens are not found, return the original text
        return text

if __name__ == "__main__":
    model_name_or_path = "Bertug1911/BrtGPT-1-Pre-Code"

    tokenizer = PreTrainedTokenizerFast.from_pretrained(model_name_or_path)
    model = GPT2LMHeadModel.from_pretrained(model_name_or_path)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()

    user_input = input("Enter something to ask model: ")
    messages = [{"role": "user", "content": user_input}]

    # Wrap the user message in the model's chat template
    formatted_prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
    generated = inputs["input_ids"]

    # Generation config
    max_new_tokens = 128
    do_sample = True
    top_k = 40
    temperature = 0.8

    im_end_token_id = tokenizer.convert_tokens_to_ids("<|im_end|>")

    with torch.no_grad():
        for i in range(max_new_tokens):
            outputs = model(generated)
            logits = outputs.logits[:, -1, :]  # logits for the last position only
            logits = logits / temperature

            # Top-k filtering: keep the k most likely tokens, mask out the rest
            if top_k > 0:
                top_k_values, top_k_indices = torch.topk(logits, top_k)
                logits_filtered = torch.full_like(logits, float('-inf'))
                logits_filtered.scatter_(1, top_k_indices, top_k_values)
                logits = logits_filtered

            probs = torch.softmax(logits, dim=-1)

            if do_sample:
                next_token = torch.multinomial(probs, num_samples=1)
            else:
                next_token = torch.argmax(probs, dim=-1, keepdim=True)

            generated = torch.cat([generated, next_token], dim=1)

            # Stop as soon as the model emits the end-of-turn token
            if next_token.item() == im_end_token_id:
                break

    output = tokenizer.decode(generated[0], skip_special_tokens=False)

    # Special token conversions: the tokenizer uses byte-level BPE markers,
    # "Ġ" for a leading space and "Ċ" for a newline
    no_spaces = output.replace(" ", "")
    step2 = no_spaces.replace("Ġ", " ")
    formatted_output = step2.replace("Ċ", "\n")

    if not formatted_output.strip().endswith("<|im_end|>"):
        formatted_output += "<|im_end|>"

    assistant_response = extract_response_between_tokens(formatted_output)
    print("\nModel output:\n", assistant_response)
```
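The manual loop above is plain temperature plus top-k sampling; the same settings can also be expressed through the built-in `transformers` generation API. A short equivalent sketch (reusing `model`, `tokenizer`, and `inputs` from the script above):
```python
# Same sampling settings as the manual loop, via model.generate
output_ids = model.generate(
    inputs["input_ids"],
    max_new_tokens=128,
    do_sample=True,
    top_k=40,
    temperature=0.8,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),  # stop at end-of-turn
)
output = tokenizer.decode(output_ids[0], skip_special_tokens=False)
# The byte-level markers ("Ġ" = space, "Ċ" = newline) still need the same
# replace() post-processing and extract_response_between_tokens() as above.
```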
## Evaluation
Evaluation results are coming soon!
## Risks and biases
The model may generate:
- Illegal outputs
- Harmful content

Use with caution!
## Contact
"[email protected]" or "[email protected]" |