---
license: cc
datasets:
- MBZUAI/LaMini-instruction
language:
- en
pipeline_tag: text-generation
---
# BrtGPT-1-Pre

## 1. Introduction
We're introducing our first question-and-answer language model, "BrtGPT-1-Preview" (BrtGPT-1-Pre). The model is GPT-2-scale and was trained on question-and-answer data (~150M tokens, 1 epoch) formatted with a chat template rather than plain text.
The model performs surprisingly well at simple question answering, creative writing, and knowledge-based chat, and it is quite good for general, everyday conversation.
But it has some shortcomings:
- Simple math
- Code
- High school and college-level science and engineering questions
However, if necessary, these deficiencies can be corrected by fine-tuning in the areas of concern. Furthermore, while the model generally avoids harmful responses, caution should still be exercised, as it may occasionally produce harmful output.
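Because the model was trained on chat-formatted data rather than plain text, prompts should go through the tokenizer's chat template. Below is a minimal sketch using the Hugging Face `transformers` library; the repository id and the supported chat roles are assumptions, not confirmed details of this model.

```python
# Minimal sketch: chatting with the model via its chat template.
# The repo id below is a placeholder; replace it with the actual repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BrtGPT-1-Pre"  # placeholder repo id (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a chat-formatted prompt instead of plain text.
messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Keep generation within the stated 128-token output limit.
output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```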
## 2. Technical Specifications
Model specifications:
- Context length: 1024 tokens (~768 words)
- Maximum output length: 128 tokens (~96 words)
- Parameter count: ~90 million
- Architecture type: Transformer (decoder-only)
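These limits map onto the usual `transformers` settings. The sketch below shows how one might verify them from the loaded model; the attribute name `max_position_embeddings` assumes a standard GPT-2-style decoder-only config, and the repo id is again a placeholder.

```python
# Sketch: checking the loaded model against the stated specifications.
# Assumes a GPT-2-style config; the repo id is a placeholder (assumption).
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "BrtGPT-1-Pre"  # placeholder repo id
config = AutoConfig.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

print("Context length:", config.max_position_embeddings)  # expected: 1024
print("Parameters (M):", model.num_parameters() / 1e6)    # expected: ~90
```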