# Arsh-LLM (14B)

> **Attention:** This model is still under development, and pretraining is ongoing!
## Model Description
Arsh-LLM is a 14 billion parameter causal language model based on the ARSH architecture. This model features an extended context length of 16k tokens and has been optimized for efficient training and inference.
- Model type: Transformer-based language model
- Language(s): Primarily designed for English (can be fine-tuned for other languages)
## Model Specifications

| Parameter | Value |
|---|---|
| Architecture | ArshForCausalLM |
| Parameters | 14B |
| Layers | 40 |
| Hidden Size | 5120 |
| Attention Heads | 40 |
| Key/Value Heads | 10 |
| Head Dimension | 128 |
| Intermediate Size | 17920 |
| Max Sequence Length | 16384 |
| Activation | SiLU |
| Norm | RMSNorm (ε = 1e-5) |
| RoPE Theta | 250000 |
| Vocabulary Size | 100352 |
| Precision | float16 |
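The figures in the table can be cross-checked with a quick back-of-the-envelope calculation. The sketch below assumes a Llama-style layout (gated SiLU MLP with gate/up/down projections, no attention biases, tied input/output embeddings); those layout details are our assumptions, not something the card states.

```python
# Back-of-the-envelope parameter count for the specs in the table above.
# Assumptions (NOT stated in the card): Llama-style gated MLP, no biases,
# tied embedding and LM head.

hidden = 5120          # Hidden Size
layers = 40            # Layers
heads = 40             # Attention Heads
kv_heads = 10          # Key/Value Heads (grouped-query attention)
head_dim = 128         # Head Dimension
intermediate = 17920   # Intermediate Size
vocab = 100352         # Vocabulary Size

# Query heads project to the full hidden size; K/V project to fewer heads.
assert heads * head_dim == hidden          # 40 * 128 == 5120
q_proj = o_proj = hidden * hidden
kv_proj = hidden * (kv_heads * head_dim)   # per K or V projection
attn = q_proj + o_proj + 2 * kv_proj

# Gated SiLU MLP: gate and up (hidden -> intermediate) plus down (back).
mlp = 3 * hidden * intermediate

embeddings = vocab * hidden                # assumed tied with the LM head
total = layers * (attn + mlp) + embeddings

print(f"~{total / 1e9:.2f}B parameters")   # lands near the advertised 14B
# Each of the 10 KV heads serves 40 / 10 = 4 query heads, shrinking the
# KV cache roughly 4x versus full multi-head attention.
```

With these assumptions the count lands close to the advertised 14B, which suggests the table is internally consistent.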
## Uses

### Direct Use
Arsh-LLM can be used for:
- Text generation
- Language understanding tasks
- As a foundation for further fine-tuning
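For text generation, loading the model with `transformers` might look like the sketch below. The Hub repository ID is a placeholder assumption, and since `ArshForCausalLM` appears to be a custom architecture, `trust_remote_code=True` is likely required; treat this as a sketch, not a confirmed recipe from the authors.

```python
# Hypothetical usage sketch -- the repo ID below is a placeholder, and
# trust_remote_code=True assumes the custom Arsh modeling code ships
# with the checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "arsh-llm/arsh-llm-14b"  # placeholder, not a confirmed repo ID

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.float16,   # matches the card's float16 precision
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("The three laws of thermodynamics are",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```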
### Downstream Use
Potential applications include:
- Chatbots and conversational AI
- Content generation
- Code generation and completion
- Question answering systems
### Out-of-Scope Use

The model should not be used for:
- Generating harmful or misleading content
## Training Details

### Training Data

This model was pretrained in two stages:

1. **Human-like language generation.** We used Phi to initialize the weights, then trained the model on high-quality datasets.
2. **Knowledge expansion.** Here we focused on broadening the model's knowledge, using datasets spanning many subjects (medicine, mathematics, physics, chemistry, literature, history, etc.).

Arsh-LLM is trained on many datasets, some of which are private; the most important public dataset is The Pile, by EleutherAI.
## Technical Specifications

### Compute Infrastructure

Since the model is based on the Arsh architecture, it can easily be trained and fine-tuned with Unsloth.
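A minimal Unsloth fine-tuning setup might look like the sketch below. The repository ID is a placeholder and the sequence length is taken from the spec table; neither is a confirmed command from the authors.

```python
# Hypothetical Unsloth loading sketch -- the repo ID is a placeholder.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="arsh-llm/arsh-llm-14b",  # placeholder repo ID
    max_seq_length=16384,                # matches the card's context length
    load_in_4bit=True,                   # QLoRA-style memory savings
)

# Attach LoRA adapters for parameter-efficient fine-tuning.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```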
## License

This model is licensed under MIT. We'd appreciate your help in developing it further! Parts of the training code are adapted from Phi (MIT) and GPT-NeoX (Apache-2.0).
## Special Thanks

Thanks to Meta (architecture), Microsoft (Phi), and EleutherAI (GPT-NeoX, The Pile).