---
license: mit
language:
- vi
pipeline_tag: text-generation
---
# TheSyx-V1-7B-Base
## Introduction
TheSyx-V1-7B-Base is the first LLM released by [thehosy](https://huggingface.co/thehosy).
**Features**:
- Type: Causal Language Models
- Training Stage: Pretraining
- Architecture: Qwen3 MoE
- Number of Parameters: 7.52B
- Number of Layers: 28
- Number of Attention Heads (GQA): 24 for Q and 4 for KV
- Context Length: 16,384 tokens (full context), with generation of up to 8,192 tokens
**We do not recommend using base language models for conversations**. Instead, you can apply post-training (e.g., SFT, RLHF, or continued pretraining) to this model.
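As a base model, it is best suited for plain text completion rather than chat. Below is a minimal sketch using the standard `transformers` auto classes; the repo id `thehosy/TheSyx-V1-7B-Base` is assumed from the model name, so adjust it if the hosted checkpoint lives at a different path.

```python
# Minimal text-completion sketch for a base (non-chat) model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "thehosy/TheSyx-V1-7B-Base"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # place layers automatically across available devices
)

# Base models continue raw text; no chat template is applied.
prompt = "Việt Nam là một quốc gia"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```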
## Requirements
Support for TheSyx-V1-7B-Base is included in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.
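If you want to check your environment before loading the model, a quick sanity check like the following can help; the minimum version used here is illustrative (the Qwen3 MoE architecture requires a fairly recent `transformers` release), not an officially stated requirement.

```python
# Sanity check that a sufficiently recent transformers release is installed.
import transformers
from packaging import version  # packaging ships as a transformers dependency

MIN_VERSION = "4.51.0"  # illustrative threshold, not an official requirement

assert version.parse(transformers.__version__) >= version.parse(MIN_VERSION), (
    f"transformers {transformers.__version__} may be too old for this model; "
    "upgrade with: pip install -U transformers"
)
print(f"transformers {transformers.__version__} detected")
```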
## Citation
...