---
license: cc
datasets:
  - MBZUAI/LaMini-instruction
language:
  - en
pipeline_tag: text-generation
---

# BrtGPT-1-Pre

## 1. Introduction

We're introducing our first question-and-answer language model, "BrtGPT-1-Preview." The model, roughly GPT-2 in size, was trained on question-and-answer data (~150M tokens, 1 epoch) formatted with a chat template instead of plain text.

The model performed surprisingly well on simple question-answering, creative, and knowledge-based chat.

It's quite good for general/everyday chat.
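
Since the model was trained with a chat template rather than plain text, prompts should be formatted the same way at inference time. Below is a minimal sketch using the standard `transformers` chat-template API; the repository id `Bertug1911/BrtGPT-1-Pre` and the example prompt are assumptions, not taken from this card.

```python
# Minimal inference sketch. The repository id below is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Bertug1911/BrtGPT-1-Pre"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Format the conversation with the chat template instead of raw text.
messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# The model's maximum output length is 128 tokens (see specifications below).
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```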

But it has some shortcomings:

- Simple math
- Code
- High school and college-level science and engineering questions

However, if necessary, these deficiencies can be addressed by fine-tuning on the areas of concern. Furthermore, while the model generally avoids harmful responses, caution should still be exercised, as potentially damaging responses are possible.
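
For instance, a weak domain such as simple math could be patched with a short supervised fine-tuning run. The sketch below uses the standard `Trainer` API; the repository id, dataset file, and hyperparameters are illustrative assumptions only.

```python
# Hedged fine-tuning sketch for a weak domain; names and values are assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

repo_id = "Bertug1911/BrtGPT-1-Pre"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # ensure padding works in batches
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Hypothetical JSONL file with a "text" column of chat-formatted math examples.
dataset = load_dataset("json", data_files="math_chat.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="brtgpt-math-ft", num_train_epochs=1,
                           per_device_train_batch_size=8, learning_rate=5e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```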

## 2. Technical Specifications

Model specifications:

- Context length: 1024 tokens (~768 words)
- Maximum output length: 128 tokens (~96 words)
- Parameter count: ~90 million
- Architecture type: Transformer (decoder-only)
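
A generation call that stays inside these limits might look like the following sketch; the repository id and prompt are assumptions.

```python
# Sketch of keeping requests inside the published limits (values from the
# specifications above; the repository id is an assumption).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Bertug1911/BrtGPT-1-Pre"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

MAX_CONTEXT = 1024  # context length in tokens
MAX_OUTPUT = 128    # maximum output length in tokens

prompt = "Tell me a short story about a lighthouse keeper."
# Truncate the prompt so prompt + output stay within the context window.
inputs = tokenizer(prompt, return_tensors="pt",
                   truncation=True, max_length=MAX_CONTEXT - MAX_OUTPUT)
output_ids = model.generate(**inputs, max_new_tokens=MAX_OUTPUT)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

# Rough check of the ~90 million parameter figure quoted above.
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```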