---
license: mit
language:
  - vi
pipeline_tag: text-generation
---

# TheSyx-V1-7B-Base

## Introduction

TheSyx-V1-7B-Base is the first LLM released by thehosy.

Features:

- **Type**: Causal Language Model
- **Training Stage**: Pretraining
- **Architecture**: Qwen3 MoE
- **Number of Parameters**: 7.52B
- **Number of Layers**: 28
- **Number of Attention Heads (GQA)**: 24 for Q and 4 for KV
- **Context Length**: 16,384 tokens (full), with generation up to 8,192 tokens
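
If you want to verify these figures against the published configuration, a minimal sketch using `AutoConfig` from transformers follows; the repo id `thehosy/TheSyx-V1-7B-Base` is an assumption inferred from this card, not confirmed by it:

```python
# Minimal sketch: inspect the model configuration.
# Assumption: the repo id is "thehosy/TheSyx-V1-7B-Base" (inferred from this card).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("thehosy/TheSyx-V1-7B-Base")
print(config.num_hidden_layers)        # expected: 28
print(config.num_attention_heads)      # expected: 24 (Q heads)
print(config.num_key_value_heads)      # expected: 4 (KV heads, GQA)
print(config.max_position_embeddings)  # expected: 16384
```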

We do not recommend using base language models for conversations. Instead, you can apply post-training, e.g., SFT, RLHF, or continued pretraining, on this model.
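
As one example of such post-training, here is a minimal SFT sketch using the `trl` library; this tooling, the dataset file `sft_data.jsonl`, and the repo id are all assumptions for illustration, not part of this card:

```python
# Minimal SFT sketch using trl (assumed tooling, not specified by this card).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical dataset: a JSONL file with a "text" column.
dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")

trainer = SFTTrainer(
    model="thehosy/TheSyx-V1-7B-Base",  # assumed repo id
    train_dataset=dataset,
    args=SFTConfig(output_dir="thesyx-v1-7b-sft"),
)
trainer.train()
```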

## Requirements

The code for TheSyx-V1-7B-Base is included in the latest Hugging Face transformers, and we advise you to use the latest version of transformers.
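
A minimal text-completion sketch with transformers follows; since this is a base model, it uses plain continuation rather than a chat template. The repo id and the Vietnamese prompt are illustrative assumptions:

```python
# Minimal sketch: plain text continuation with a base model (no chat template).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "thehosy/TheSyx-V1-7B-Base"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "Việt Nam là"  # illustrative Vietnamese prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```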

## Citation

...