|
--- |
|
license: cc |
|
datasets: |
|
- MBZUAI/LaMini-instruction |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
metrics: |
|
- accuracy |
|
new_version: Bertug1911/BrtGPT-1-Pre-Code |
|
library_name: adapter-transformers |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
--- |
|
|
|
# BrtGPT-1-Pre |
|
|
|
## 1. Introduction |
|
|
|
***NEW USAGE CODE!*** (Shorter and faster than the previous version!)
|
|
|
We're introducing our first question-and-answer language model, "BrtGPT-1-Pre" (BrtGPT-1-Preview). The model was trained on GPT-2-scale question-and-answer data (~150M tokens, 1 epoch), formatted with a chat template instead of plain text, as illustrated below.
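
Here, "chat template" means each question-answer pair was serialized with role markers rather than stored as raw text. A minimal illustration (a sketch only: it assumes the repo's tokenizer loads via `AutoTokenizer` with `trust_remote_code=True` and ships a chat template):

```python
from transformers import AutoTokenizer

# Sketch: assumes the tokenizer loads with trust_remote_code=True
# and defines a chat template.
tok = AutoTokenizer.from_pretrained("Bertug1911/BrtGPT-1-Pre", trust_remote_code=True)

messages = [{"role": "user", "content": "What is the capital of France?"}]

# Render the conversation into the model's training format (text only, no token IDs)
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```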
|
|
|
The model performed surprisingly well in simple question answering, creative writing, and knowledge-based chat.
|
|
|
It's quite good for general/everyday chat. |
|
|
|
But it has some shortcomings:

- Simple math
- Code
- High-school- and college-level science and engineering questions
|
|
|
However, if necessary, these deficiencies can be corrected by fine-tuning on the areas of concern, as in the sketch below.
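
A minimal fine-tuning sketch, assuming the checkpoint loads through the standard `AutoModelForCausalLM`/`AutoTokenizer` interfaces with `trust_remote_code=True` (the toy examples stand in for your own domain data):

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "Bertug1911/BrtGPT-1-Pre"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Toy domain data; replace with a real corpus for the area of concern.
texts = ["Q: What is 55 * 3? A: 165", "Q: What is 12 + 30? A: 42"]

class TextDataset(torch.utils.data.Dataset):
    def __init__(self, texts):
        # Tokenize up to the model's 1024-token context window
        self.enc = [tokenizer(t, truncation=True, max_length=1024,
                              return_tensors="pt") for t in texts]
    def __len__(self):
        return len(self.enc)
    def __getitem__(self, i):
        ids = self.enc[i]["input_ids"].squeeze(0)
        # Standard causal-LM objective: labels are the inputs themselves
        return {"input_ids": ids, "labels": ids.clone()}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="brtgpt-ft",
                           num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=TextDataset(texts),
)
trainer.train()
```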
|
Furthermore, while the model generally avoids harmful responses, caution should still be exercised: it may occasionally produce damaging output.
|
|
|
## 2. Technical Specifications |
|
|
|
Model specifications: |
|
|
|
- Context length: 1024 tokens (~768 words) |
|
- Maximum output length: 128 tokens (~96 words) |
|
- Parameter count: ~90 million
|
- Architecture type: Transformer (Decoder-only) |
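
To stay within these limits, you can budget tokens before generating. A small helper sketch (hypothetical: it assumes the tokenizer loads via `AutoTokenizer` with `trust_remote_code=True`):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Bertug1911/BrtGPT-1-Pre",
                                          trust_remote_code=True)

CONTEXT_LEN = 1024   # context window (tokens)
MAX_NEW = 128        # maximum output length (tokens)

def fits_in_context(prompt: str) -> bool:
    # The prompt and the generated tokens share the same 1024-token window
    n_prompt = len(tokenizer(prompt)["input_ids"])
    return n_prompt + MAX_NEW <= CONTEXT_LEN

print(fits_in_context("What is the capital of France?"))  # True
```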
|
|
|
|
|
## 3. Usage
|
|
|
You can use the model with the following code:
|
|
|
```python
from transformers import pipeline

# Build the text-generation pipeline
pipe = pipeline(
    "text-generation",
    model="Bertug1911/BrtGPT-1-Pre",
    trust_remote_code=True,
    top_k=40,            # Good for creativity
    temperature=0.8,     # Good for creativity
    max_new_tokens=128   # Default output length (model context allows up to 1024)
)

# Chat messages
messages = [
    {"role": "user", "content": "What is the capital of France?"},
]

# Generate
output = pipe(messages)

# Extract only the assistant's (model's) answer
assistant_response = output[0]["generated_text"][-1]["content"].strip()

# Convert special tokens back to readable text
formatted_out = assistant_response.replace(" ", "").replace("Ġ", " ").replace("Ċ", "\n")

print(formatted_out)
```
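
The final `replace` chain undoes the model's byte-level BPE markers: `Ġ` stands for a leading space and `Ċ` for a newline, so stripping the literal spaces and mapping these symbols back yields readable text.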
|
### 3.1 Direct Use |
|
|
|
You can use the model directly through a graphical user interface (GUI) on [**Hugging Face Spaces**](https://huggingface.co/spaces/Bertug1911/BrtGPT.1.Pre-Web-UI).
|
|
|
### 3.2 Parameters
|
|
|
| Use case | top_k | temperature | max_new_tokens |
| :------------: | :------------: | :------------: | :------------: |
| Creativity | 40-65 | 0.7-0.9 | 64-512 |
| Coding | 10-25 | 0.1-0.25 | 32-128 |
| Basic QA | 30-40 | 0.5-0.8 | 32-64 |
| Math | 1-15 | 0.05-0.15 | 16-64 |
| Knowledge-based QA | 20-30 | 0.4-0.6 | 32-64 |
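
One way to apply these settings is a small preset table passed per call. The values below are illustrative picks from the ranges above, and `pipe`/`messages` come from the code in Section 3:

```python
# Illustrative presets drawn from the ranges in the table above
PRESETS = {
    "creativity":   {"top_k": 50, "temperature": 0.8, "max_new_tokens": 256},
    "coding":       {"top_k": 15, "temperature": 0.2, "max_new_tokens": 64},
    "basic_qa":     {"top_k": 35, "temperature": 0.6, "max_new_tokens": 48},
    "math":         {"top_k": 5,  "temperature": 0.1, "max_new_tokens": 32},
    "knowledge_qa": {"top_k": 25, "temperature": 0.5, "max_new_tokens": 48},
}

# Generation kwargs can be overridden per call
output = pipe(messages, **PRESETS["basic_qa"])
```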
|
|
|
## 4. Usage examples

Some example prompts and outputs:
|
|
|
| Prompt | Top-k | Temperature | Output | |
|
| :------------: | :------------: | :------------: | :------------: | |
|
| "What is the capital of France?" | 1-40 | 0.1-0.8 | "Paris."/"Capital of the France is Paris." | |
|
| "Write me a story about penguins." | 40 | 0.1 | "Once upon a time, there was a young girl named Lily who loved to play fetch. She had always loved playing fetch, but she had never been to a local animal shelter. One day, she saw a group of children playing fetch, but she wasn't sure what to do." | |
|
| "What is 55 * 3" | 10 | 0.15| "55 * 3 is equal to 0." | |
|
| "Write me a code that prints "Hello World" | 10 | 0.15 | "Here's a code that prints "Hello World" in a list of words:```for i in range(1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5," | |
|
|
|
## 5. Evaluation
|
|
|
| Benchmark | BrtGPT-1-Pre | BrtGPT-1-0719 |
| :------------: | :------------: | :------------: |
| AIME 2025 | 0% | Coming soon |
| MMLU high-school-math | 1.45% | Coming soon |
| GPQA Diamond | 1.01% | Coming soon |
|
|
|
HLE (Humanity's Last Exam): |
|
|
|
| Model | HLE (Humanity's Last Exam) |
| :------------: | :------------: |
| [BrtGPT-124m-Base](https://huggingface.co/Bertug1911/BrtGPT-124m-Base) | <0.5% |
| [BrtGPT-1-0719](https://huggingface.co/Bertug1911/BrtGPT-1-0719) | 4% |
| [BrtGPT-1-Pre](https://huggingface.co/Bertug1911/BrtGPT-1-Pre) | <3.5% |
| GPT-4o (ChatGPT) | 4% |
| Claude-4-sonnet | 4% |
| GPT-5 minimal | 5% |
| GPT-4.1 | 4% |
| [Llama-4 Maverick](https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct) | 5% |
| [Phi-4](http://huggingface.co/microsoft/phi-4) | 5% |
|
|
|
|
|
## 6. Risks and biases
|
|
|
The model may generate:

- Illegal outputs
- Harmful content
|
|
|
Use with caution!! |
|
|
|
## Contact |
|
|
|
"[email protected]" or "[email protected]" |