---
license: mit
language:
- vi
pipeline_tag: text-generation
---
|
|
|
# TheSyx-V1-7B-Base

## Introduction

TheSyx-V1-7B-Base is the first LLM released by [thehosy](https://huggingface.co/thehosy).
|
|
|
**Features**:
- Type: Causal Language Model
- Training Stage: Pretraining
- Architecture: Qwen3 MoE
- Number of Parameters: 7.52B
- Number of Layers: 28
- Number of Attention Heads (GQA): 24 for Q and 4 for KV
- Context Length: 16,384 tokens, with generation of up to 8,192 tokens
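These values can be read straight from the model config. A minimal sketch, assuming the repo id `thehosy/TheSyx-V1-7B-Base`:

```python
from transformers import AutoConfig

# Assumed repo id; adjust if the checkpoint is hosted under a different name
config = AutoConfig.from_pretrained("thehosy/TheSyx-V1-7B-Base")

print(config.num_hidden_layers)        # number of layers (28)
print(config.num_attention_heads)      # query heads (24)
print(config.num_key_value_heads)      # key/value heads for GQA (4)
print(config.max_position_embeddings)  # maximum context length
```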
|
|
|
**We do not recommend using base language models for conversations.** Instead, you can apply post-training, e.g., SFT, RLHF, or continued pretraining, to this model.
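As one illustration of such post-training, a minimal SFT sketch with the [TRL](https://github.com/huggingface/trl) library could look like the following; the repo id, dataset, and output directory are placeholders, not a recommended recipe:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset for illustration; substitute your own SFT data
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="thehosy/TheSyx-V1-7B-Base",  # assumed repo id
    train_dataset=dataset,
    args=SFTConfig(output_dir="TheSyx-V1-7B-SFT"),
)
trainer.train()
```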
|
|
|
## Requirements

The code of TheSyx-V1-7B-Base is supported in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.
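With a recent version installed, a minimal text-completion sketch looks like the following (the repo id `thehosy/TheSyx-V1-7B-Base` and the Vietnamese prompt are assumptions for illustration):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "thehosy/TheSyx-V1-7B-Base"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# This is a base model, so use plain text completion rather than a chat template
prompt = "Hà Nội là thủ đô của"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The card lists a 16,384-token context with generation of up to 8,192 tokens;
# a small max_new_tokens is enough for a quick check
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```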
|
|
|
## Citation

...