|
--- |
|
license: cc |
|
datasets: |
|
- MBZUAI/LaMini-instruction |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# BrtGPT-1-Pre |
|
|
|
## 1. Introduction |
|
|
|
We are introducing our first question-and-answer language model, "BrtGPT-1-Pre" (BrtGPT-1-Preview). The model was trained on GPT-2-sized question-and-answer data (~150M tokens, 1 epoch) formatted with a chat template instead of plain text.
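
Below is a minimal usage sketch with the Hugging Face Transformers library. The repository ID is a placeholder (it is not stated in this card), and the sketch assumes the chat template used during training ships with the tokenizer; both should be verified against the published checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ORG/BrtGPT-1-Pre"  # placeholder repo ID, not stated in this card; replace as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The model was trained on chat-template-formatted data, so prompts should be
# formatted the same way rather than passed in as plain text.
messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```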
|
|
|
The model performed surprisingly well at simple question answering, creative prompts, and knowledge-based chat.
|
|
|
It's quite good for general/everyday chat. |
|
|
|
But it has some shortcomings: |
|
- Simple math

- Code

- High-school and college-level science and engineering questions
|
|
|
If necessary, these deficiencies can be addressed by fine-tuning on the areas of concern.
|
Furthermore, while the model generally avoids harmful responses, users should still exercise caution, as potentially damaging output remains possible.
|
|
|
## 2. Technical Specifications |
|
|
|
Model specifications (a short sketch illustrating the context/output limits follows the list):
|
|
|
- Context length: 1024 tokens (~768 words) |
|
- Maximum output length: 128 tokens (~96 words) |
|
- Parameter count: ~90 Million |
|
- Architecture type: Transformer (Decoder-only) |
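
As a hedged illustration of how these limits interact: the prompt and the generated tokens share the same 1024-token window, so a prompt must leave room for up to 128 output tokens. The repository ID below is again a placeholder.

```python
from transformers import AutoTokenizer

MAX_CONTEXT = 1024    # context window, in tokens
MAX_NEW_TOKENS = 128  # maximum output length, in tokens

model_id = "ORG/BrtGPT-1-Pre"  # placeholder repo ID, as in the sketch above
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Summarize the story of a lighthouse keeper in two sentences."}]
prompt_len = len(tokenizer.apply_chat_template(messages, add_generation_prompt=True))

# The prompt and the output share one 1024-token window, so the prompt must
# leave room for up to 128 generated tokens.
assert prompt_len + MAX_NEW_TOKENS <= MAX_CONTEXT, "Prompt too long for the context window"
print(f"Prompt uses {prompt_len} of {MAX_CONTEXT - MAX_NEW_TOKENS} available prompt tokens.")
```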
|
|
|
|