---
license: cc
datasets:
  - MBZUAI/LaMini-instruction
language:
  - en
pipeline_tag: text-generation
---

# BrtGPT-1-Pre

## 1. Introduction

We're introducing our first question-and-answer language model, "BrtGPT-1-Preview." The model, roughly GPT-2 in size, was trained on question-and-answer data (~150M tokens, 1 epoch) formatted with a chat template instead of plain text.

The model performed surprisingly well on simple question-answering, creative, and knowledge-based chat.

It's quite good for general/everyday chat.
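
Since the model was trained with a chat template rather than plain text, prompts should be formatted the same way at inference time. Below is a minimal sketch using the standard `transformers` chat-template API; the repository id `Bertug1911/BrtGPT-1-Pre` and the example prompt are assumptions, not taken from this card.

```python
# Minimal inference sketch. The repository id below is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Bertug1911/BrtGPT-1-Pre"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Format the conversation with the chat template instead of raw text.
messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# The model's maximum output length is 128 tokens (see specifications below).
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```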

But it has some shortcomings:

- Simple math
- Code
- High school and college-level science and engineering questions

However, if necessary, these deficiencies can be addressed by fine-tuning on the areas of concern. Furthermore, while the model generally avoids harmful responses, caution should still be exercised, as potentially damaging responses are possible.
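
For instance, a weak domain such as simple math could be patched with a short supervised fine-tuning run. The sketch below uses the standard `Trainer` API; the repository id, dataset file, and hyperparameters are illustrative assumptions only.

```python
# Hedged fine-tuning sketch for a weak domain; names and values are assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

repo_id = "Bertug1911/BrtGPT-1-Pre"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # ensure padding works in batches
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Hypothetical JSONL file with a "text" column of chat-formatted math examples.
dataset = load_dataset("json", data_files="math_chat.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="brtgpt-math-ft", num_train_epochs=1,
                           per_device_train_batch_size=8, learning_rate=5e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```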

## 2. Technical Specifications

Model specifications:

- Context length: 1024 tokens (~768 words)
- Maximum output length: 128 tokens (~96 words)
- Parameter count: ~90 million
- Architecture type: Transformer (decoder-only)
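
A generation call that stays inside these limits might look like the following sketch; the repository id and prompt are assumptions.

```python
# Sketch of keeping requests inside the published limits (values from the
# specifications above; the repository id is an assumption).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Bertug1911/BrtGPT-1-Pre"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

MAX_CONTEXT = 1024  # context length in tokens
MAX_OUTPUT = 128    # maximum output length in tokens

prompt = "Tell me a short story about a lighthouse keeper."
# Truncate the prompt so prompt + output stay within the context window.
inputs = tokenizer(prompt, return_tensors="pt",
                   truncation=True, max_length=MAX_CONTEXT - MAX_OUTPUT)
output_ids = model.generate(**inputs, max_new_tokens=MAX_OUTPUT)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

# Rough check of the ~90 million parameter figure quoted above.
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```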