|
--- |
|
license: cc |
|
datasets: |
|
- MBZUAI/LaMini-instruction |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
metrics: |
|
- accuracy |
|
new_version: Bertug1911/BrtGPT-1-Pre-Code |
|
library_name: adapter-transformers |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
--- |
|
|
|
# BrtGPT-1-Pre |
|
|
|
## 1. Introduction |
|
|
|
***NEW USAGE CODE!*** (Shorter and faster than the previous version!)
|
|
|
We're introducing our first question-and-answer language model, "BrtGPT-1-Pre" (BrtGPT-1-Preview). The model was trained on GPT-2-scale question-and-answer data (~150M tokens, 1 epoch), formatted with a chat template instead of plain text, as illustrated below.
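
Here, "chat template" means each question-answer pair was serialized with role markers rather than stored as raw text. A minimal illustration (a sketch only: it assumes the repo's tokenizer loads via `AutoTokenizer` with `trust_remote_code=True` and ships a chat template):

```python
from transformers import AutoTokenizer

# Sketch: assumes the tokenizer loads with trust_remote_code=True
# and defines a chat template.
tok = AutoTokenizer.from_pretrained("Bertug1911/BrtGPT-1-Pre", trust_remote_code=True)

messages = [{"role": "user", "content": "What is the capital of France?"}]

# Render the conversation into the model's training format (text only, no token IDs)
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```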
|
|
|
The model performed surprisingly well in simple question answering, creative writing, and knowledge-based chat.
|
|
|
It's quite good for general/everyday chat. |
|
|
|
But it has some shortcomings:

- Simple math
- Code
- High-school- and college-level science and engineering questions
|
|
|
However, if necessary, these deficiencies can be corrected by fine-tuning on the areas of concern, as in the sketch below.
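
A minimal fine-tuning sketch, assuming the checkpoint loads through the standard `AutoModelForCausalLM`/`AutoTokenizer` interfaces with `trust_remote_code=True` (the toy examples stand in for your own domain data):

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "Bertug1911/BrtGPT-1-Pre"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Toy domain data; replace with a real corpus for the area of concern.
texts = ["Q: What is 55 * 3? A: 165", "Q: What is 12 + 30? A: 42"]

class TextDataset(torch.utils.data.Dataset):
    def __init__(self, texts):
        # Tokenize up to the model's 1024-token context window
        self.enc = [tokenizer(t, truncation=True, max_length=1024,
                              return_tensors="pt") for t in texts]
    def __len__(self):
        return len(self.enc)
    def __getitem__(self, i):
        ids = self.enc[i]["input_ids"].squeeze(0)
        # Standard causal-LM objective: labels are the inputs themselves
        return {"input_ids": ids, "labels": ids.clone()}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="brtgpt-ft",
                           num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=TextDataset(texts),
)
trainer.train()
```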
|
Furthermore, while the model generally avoids harmful responses, caution should still be exercised: it may occasionally produce damaging output.
|
|
|
## 2. Technical Specifications |
|
|
|
Model specifications: |
|
|
|
- Context length: 1024 tokens (~768 words) |
|
- Maximum output length: 128 tokens (~96 words) |
|
- Parameter count: ~90 million
|
- Architecture type: Transformer (Decoder-only) |
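
To stay within these limits, you can budget tokens before generating. A small helper sketch (hypothetical: it assumes the tokenizer loads via `AutoTokenizer` with `trust_remote_code=True`):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Bertug1911/BrtGPT-1-Pre",
                                          trust_remote_code=True)

CONTEXT_LEN = 1024   # context window (tokens)
MAX_NEW = 128        # maximum output length (tokens)

def fits_in_context(prompt: str) -> bool:
    # The prompt and the generated tokens share the same 1024-token window
    n_prompt = len(tokenizer(prompt)["input_ids"])
    return n_prompt + MAX_NEW <= CONTEXT_LEN

print(fits_in_context("What is the capital of France?"))  # True
```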
|
|
|
|
|
## 3. Usage
|
|
|
You can use the model with the following code:
|
|
|
```python
from transformers import pipeline

# Build the text-generation pipeline
pipe = pipeline(
    "text-generation",
    model="Bertug1911/BrtGPT-1-Pre",
    trust_remote_code=True,
    top_k=40,            # Good for creativity
    temperature=0.8,     # Good for creativity
    max_new_tokens=128   # Default output length (model context allows up to 1024)
)

# Chat messages
messages = [
    {"role": "user", "content": "What is the capital of France?"},
]

# Generate
output = pipe(messages)

# Extract only the assistant's (model's) answer
assistant_response = output[0]["generated_text"][-1]["content"].strip()

# Convert special tokens back to readable text
formatted_out = assistant_response.replace(" ", "").replace("Ġ", " ").replace("Ċ", "\n")

print(formatted_out)
```
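
The final `replace` chain undoes the model's byte-level BPE markers: `Ġ` stands for a leading space and `Ċ` for a newline, so stripping the literal spaces and mapping these symbols back yields readable text.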
|
### 3.1 Direct Use |
|
|
|
You can use the model directly through a graphical user interface (GUI) on [**Hugging Face Spaces**](https://huggingface.co/spaces/Bertug1911/BrtGPT.1.Pre-Web-UI).
|
|
|
### 3.2 Parameters
|
|
|
| Use case | top_k | temperature | max_new_tokens |
| :------------: | :------------: | :------------: | :------------: |
| Creativity | 40-65 | 0.7-0.9 | 64-512 |
| Coding | 10-25 | 0.1-0.25 | 32-128 |
| Basic QA | 30-40 | 0.5-0.8 | 32-64 |
| Math | 1-15 | 0.05-0.15 | 16-64 |
| Knowledge-based QA | 20-30 | 0.4-0.6 | 32-64 |
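
One way to apply these settings is a small preset table passed per call. The values below are illustrative picks from the ranges above, and `pipe`/`messages` come from the code in Section 3:

```python
# Illustrative presets drawn from the ranges in the table above
PRESETS = {
    "creativity":   {"top_k": 50, "temperature": 0.8, "max_new_tokens": 256},
    "coding":       {"top_k": 15, "temperature": 0.2, "max_new_tokens": 64},
    "basic_qa":     {"top_k": 35, "temperature": 0.6, "max_new_tokens": 48},
    "math":         {"top_k": 5,  "temperature": 0.1, "max_new_tokens": 32},
    "knowledge_qa": {"top_k": 25, "temperature": 0.5, "max_new_tokens": 48},
}

# Generation kwargs can be overridden per call
output = pipe(messages, **PRESETS["basic_qa"])
```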
|
|
|
## 4. Usage examples

Some example prompts and outputs:
|
|
|
| Prompt | Top-k | Temperature | Output | |
|
| :------------: | :------------: | :------------: | :------------: | |
|
| "What is the capital of France?" | 1-40 | 0.1-0.8 | "Paris."/"Capital of the France is Paris." | |
|
| "Write me a story about penguins." | 40 | 0.1 | "Once upon a time, there was a young girl named Lily who loved to play fetch. She had always loved playing fetch, but she had never been to a local animal shelter. One day, she saw a group of children playing fetch, but she wasn't sure what to do." | |
|
| "What is 55 * 3" | 10 | 0.15| "55 * 3 is equal to 0." | |
|
| "Write me a code that prints "Hello World" | 10 | 0.15 | "Here's a code that prints "Hello World" in a list of words:```for i in range(1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5," | |
|
|
|
## 5. Evaluation
|
|
|
| Benchmark | BrtGPT-1-Pre | BrtGPT-1-0719 |
| :------------: | :------------: | :------------: |
| AIME 2025 | 0% | Coming soon |
| MMLU high-school-math | 1.45% | Coming soon |
| GPQA Diamond | 1.01% | Coming soon |
|
|
|
HLE (Humanity's Last Exam): |
|
|
|
| Model | HLE (Humanity's Last Exam) |
| :------------: | :------------: |
| [BrtGPT-124m-Base](https://huggingface.co/Bertug1911/BrtGPT-124m-Base) | <0.5% |
| [BrtGPT-1-0719](https://huggingface.co/Bertug1911/BrtGPT-1-0719) | 4% |
| [BrtGPT-1-Pre](https://huggingface.co/Bertug1911/BrtGPT-1-Pre) | <3.5% |
| GPT-4o (ChatGPT) | 4% |
| Claude-4-sonnet | 4% |
| GPT-5 minimal | 5% |
| GPT-4.1 | 4% |
| [Llama-4 Maverick](https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct) | 5% |
| [Phi-4](http://huggingface.co/microsoft/phi-4) | 5% |
|
|
|
|
|
## 6. Risks and biases
|
|
|
The model may generate:

- Illegal outputs
- Harmful content
|
|
|
Use with caution!! |
|
|
|
## Contact |
|
|
|
"[email protected]" or "[email protected]" |