Arsh-LLM (14B)

Attention:

This model is still under development, and pretraining is ongoing!

Model Description

Arsh-LLM is a 14 billion parameter causal language model based on the ARSH architecture. This model features an extended context length of 16k tokens and has been optimized for efficient training and inference.

  • Model type: Transformer-based language model
  • Language(s): Primarily designed for English (can be fine-tuned for other languages)
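The 16k context window relies on rotary position embeddings with a large base (θ = 250,000 and head dimension 128, per the specifications below). A minimal sketch of how standard RoPE inverse frequencies follow from those two numbers (this is the generic formula, not the model's actual code):

```python
import math

# Standard RoPE inverse frequencies -- a sketch using the generic formula,
# not the model's own implementation.
# Values taken from the specification table: head_dim=128, rope_theta=250000.
HEAD_DIM = 128
ROPE_THETA = 250_000.0

def rope_inv_freq(head_dim: int = HEAD_DIM, theta: float = ROPE_THETA) -> list[float]:
    """Inverse frequency for each dimension pair: theta^(-2i / head_dim)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

inv_freq = rope_inv_freq()
# The slowest-rotating pair sets the longest positional wavelength the
# embedding can distinguish; it should exceed the 16,384-token window.
longest_wavelength = 2 * math.pi / inv_freq[-1]
print(f"{longest_wavelength:,.0f} tokens")
```

With θ = 250,000 the longest wavelength comes out well above 16,384 tokens, which is why a large base is paired with the extended context length.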

Model Specifications

Parameter              Value
---------------------  ------------------
Architecture           ArshForCausalLM
Parameters             14B
Layers                 40
Hidden Size            5120
Attention Heads        40
Key/Value Heads        10
Head Dimension         128
Intermediate Size      17920
Max Sequence Length    16384
Activation             SiLU
Norm                   RMSNorm (ε = 1e-5)
RoPE Theta             250000
Vocabulary Size        100352
Precision              float16
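As a sanity check, the figures in the table roughly reproduce the stated 14B parameter count. The sketch below assumes a Llama-style block (grouped-query attention plus a gated SiLU MLP) and an untied LM head; that layout is an assumption, since the exact ARSH block structure isn't published here, and small terms such as norm weights are ignored:

```python
# Rough parameter count from the specification table above.
# Assumes a Llama-style block (GQA attention + gated SiLU MLP) and an
# untied LM head -- an assumption, not the published ARSH layout.
layers, hidden, heads, kv_heads = 40, 5120, 40, 10
head_dim, intermediate, vocab = 128, 17920, 100_352

attn = (hidden * heads * head_dim           # q_proj
        + 2 * hidden * kv_heads * head_dim  # k_proj and v_proj (GQA: 10 KV heads)
        + heads * head_dim * hidden)        # o_proj
mlp = 3 * hidden * intermediate             # gate, up, and down projections
per_layer = attn + mlp
total = layers * per_layer + 2 * vocab * hidden  # token embeddings + LM head

print(f"~{total / 1e9:.2f}B parameters")
```

Under these assumptions the estimate lands at roughly 14.7B, consistent with the "14B" headline figure.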

Uses

Direct Use

Arsh-LLM can be used for:

  • Text generation
  • Language understanding tasks
  • As a foundation for further fine-tuning

Downstream Use

Potential applications include:

  • Chatbots and conversational AI
  • Content generation
  • Code generation and completion
  • Question answering systems

Out-of-Scope Use

The model should not be used for:

  • Generating harmful or misleading content

Training Details

Training Data

This model was pretrained in two steps:

1- Human-like language generation

We used Phi to initialize the weights, then trained the model on curated datasets.

2- Knowledge expansion

Here we focused on the model's knowledge; using datasets spanning different subjects (medical, mathematics, physics, chemistry, literature, history, etc.) helped us complete this stage.

Arsh-LLM was trained on many datasets; some are private, and the most important public one is The Pile, by EleutherAI.

Technical Specifications

Compute Infrastructure

As the model is based on the Arsh architecture, it can easily be used with Unsloth.

License

This model is licensed under MIT. We'd appreciate your help in developing it further! Our training code borrows from Phi (MIT) and GPT-NeoX (Apache-2.0).

Special Thanks

Thanks to Meta (architecture), Microsoft (Phi), and EleutherAI (GPT-NeoX, The Pile).
