# Arsh-LLM (14B)

> **Attention:** This model is still under development, and pretraining is ongoing!
## Model Description
Arsh-LLM is a 14 billion parameter causal language model based on the ARSH architecture. This model features an extended context length of 16k tokens and has been optimized for efficient training and inference.
- Model type: Transformer-based language model
- Language(s): Primarily designed for English (can be fine-tuned for other languages)
## Model Specifications

| Parameter | Value |
|---|---|
| Architecture | ArshForCausalLM |
| Parameters | 14B |
| Layers | 40 |
| Hidden Size | 5120 |
| Attention Heads | 40 |
| Key/Value Heads | 10 |
| Head Dimension | 128 |
| Intermediate Size | 17920 |
| Max Sequence Length | 16384 |
| Activation | SiLU |
| Norm | RMSNorm (ε = 1e-5) |
| RoPE Theta | 250000 |
| Vocabulary Size | 100352 |
| Precision | float16 |
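The figures in the table can be cross-checked with a quick back-of-the-envelope calculation. The sketch below assumes a Llama-style layout (gated SiLU MLP with gate/up/down projections, no attention biases, tied input/output embeddings); those layout details are our assumptions, not something the card states.

```python
# Back-of-the-envelope parameter count for the specs in the table above.
# Assumptions (NOT stated in the card): Llama-style gated MLP, no biases,
# tied embedding and LM head.

hidden = 5120          # Hidden Size
layers = 40            # Layers
heads = 40             # Attention Heads
kv_heads = 10          # Key/Value Heads (grouped-query attention)
head_dim = 128         # Head Dimension
intermediate = 17920   # Intermediate Size
vocab = 100352         # Vocabulary Size

# Query heads project to the full hidden size; K/V project to fewer heads.
assert heads * head_dim == hidden          # 40 * 128 == 5120
q_proj = o_proj = hidden * hidden
kv_proj = hidden * (kv_heads * head_dim)   # per K or V projection
attn = q_proj + o_proj + 2 * kv_proj

# Gated SiLU MLP: gate and up (hidden -> intermediate) plus down (back).
mlp = 3 * hidden * intermediate

embeddings = vocab * hidden                # assumed tied with the LM head
total = layers * (attn + mlp) + embeddings

print(f"~{total / 1e9:.2f}B parameters")   # lands near the advertised 14B
# Each of the 10 KV heads serves 40 / 10 = 4 query heads, shrinking the
# KV cache roughly 4x versus full multi-head attention.
```

With these assumptions the count lands close to the advertised 14B, which suggests the table is internally consistent.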
## Uses

### Direct Use
Arsh-LLM can be used for:
- Text generation
- Language understanding tasks
- As a foundation for further fine-tuning
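For text generation, loading the model with `transformers` might look like the sketch below. The Hub repository ID is a placeholder assumption, and since `ArshForCausalLM` appears to be a custom architecture, `trust_remote_code=True` is likely required; treat this as a sketch, not a confirmed recipe from the authors.

```python
# Hypothetical usage sketch -- the repo ID below is a placeholder, and
# trust_remote_code=True assumes the custom Arsh modeling code ships
# with the checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "arsh-llm/arsh-llm-14b"  # placeholder, not a confirmed repo ID

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.float16,   # matches the card's float16 precision
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("The three laws of thermodynamics are",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```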
### Downstream Use
Potential applications include:
- Chatbots and conversational AI
- Content generation
- Code generation and completion
- Question answering systems
### Out-of-Scope Use

The model should not be used for:
- Generating harmful or misleading content
## Training Details

### Training Data

This model was pretrained in two stages:

1. **Human-like language generation.** We used Phi to initialize the weights, then trained the model on high-quality datasets.
2. **Knowledge expansion.** Here we focused on broadening the model's knowledge, using datasets spanning many subjects (medicine, mathematics, physics, chemistry, literature, history, etc.).

Arsh-LLM is trained on many datasets, some of which are private; the most important public dataset is The Pile, by EleutherAI.
## Technical Specifications

### Compute Infrastructure

Since the model is based on the Arsh architecture, it can easily be trained and fine-tuned with Unsloth.
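A minimal Unsloth fine-tuning setup might look like the sketch below. The repository ID is a placeholder and the sequence length is taken from the spec table; neither is a confirmed command from the authors.

```python
# Hypothetical Unsloth loading sketch -- the repo ID is a placeholder.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="arsh-llm/arsh-llm-14b",  # placeholder repo ID
    max_seq_length=16384,                # matches the card's context length
    load_in_4bit=True,                   # QLoRA-style memory savings
)

# Attach LoRA adapters for parameter-efficient fine-tuning.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```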
## License

This model is licensed under MIT. We'd appreciate your help in developing it further! Parts of the training code are adapted from Phi (MIT) and GPT-NeoX (Apache-2.0).
## Special Thanks

Thanks to Meta (architecture), Microsoft (Phi), and EleutherAI (GPT-NeoX, The Pile).