---
license: mit
language:
- vi
pipeline_tag: text-generation
---
|
|
|
# TheSyx-V1-7B-Base

## Introduction

TheSyx-V1-7B-Base is the first LLM released by [thehosy](https://huggingface.co/thehosy).
|
|
|
**Features**:
- Type: Causal Language Model
- Training Stage: Pretraining
- Architecture: Qwen3 MoE
- Number of Parameters: 7.52B
- Number of Layers: 28
- Number of Attention Heads (GQA): 24 for Q and 4 for KV
- Context Length: 16,384 tokens, with generation of up to 8,192 tokens
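These values can be read straight from the model config. A minimal sketch, assuming the repo id `thehosy/TheSyx-V1-7B-Base`:

```python
from transformers import AutoConfig

# Assumed repo id; adjust if the checkpoint is hosted under a different name
config = AutoConfig.from_pretrained("thehosy/TheSyx-V1-7B-Base")

print(config.num_hidden_layers)        # number of layers (28)
print(config.num_attention_heads)      # query heads (24)
print(config.num_key_value_heads)      # key/value heads for GQA (4)
print(config.max_position_embeddings)  # maximum context length
```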
|
|
|
**We do not recommend using base language models for conversations.** Instead, you can apply post-training, e.g., SFT, RLHF, or continued pretraining, to this model.
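As one illustration of such post-training, a minimal SFT sketch with the [TRL](https://github.com/huggingface/trl) library could look like the following; the repo id, dataset, and output directory are placeholders, not a recommended recipe:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset for illustration; substitute your own SFT data
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="thehosy/TheSyx-V1-7B-Base",  # assumed repo id
    train_dataset=dataset,
    args=SFTConfig(output_dir="TheSyx-V1-7B-SFT"),
)
trainer.train()
```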
|
|
|
## Requirements

The code of TheSyx-V1-7B-Base is supported in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.
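With a recent version installed, a minimal text-completion sketch looks like the following (the repo id `thehosy/TheSyx-V1-7B-Base` and the Vietnamese prompt are assumptions for illustration):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "thehosy/TheSyx-V1-7B-Base"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# This is a base model, so use plain text completion rather than a chat template
prompt = "Hà Nội là thủ đô của"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The card lists a 16,384-token context with generation of up to 8,192 tokens;
# a small max_new_tokens is enough for a quick check
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```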
|
|
|
## Citation

...