---
base_model:
- ServiceNow-AI/Apriel-5B-Base-Instruct
library_name: transformers
language:
- en
license: mit
tags:
- abliterated
- baukit-abliterated
---

# Apriel-5B (Abliterated)
An abliterated version of the model below, created using the [universal Baukit abliteration notebook](https://www.kaggle.com/code/piotr25691/universal-abliteration-baukit).
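
The general idea behind such abliteration notebooks is directional ablation: estimate a "refusal direction" from hidden-state differences between two prompt sets and remove that component from the residual stream. The sketch below illustrates a runtime-hook variant of this idea; the prompt lists, layer index, and the `model.model.layers` attribute path are illustrative assumptions, not the notebook's actual code.

```python
# Rough sketch of runtime directional ablation ("abliteration"):
# estimate a refusal direction from hidden-state differences between two
# prompt sets, then project that direction out of each decoder block's
# output with forward hooks. Prompt lists, layer index, and the
# model.model.layers attribute path are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "ServiceNow-AI/Apriel-5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16).to("cuda")

@torch.no_grad()
def mean_last_hidden(prompts, layer=-8):
    # Average hidden state of the final token at a chosen intermediate layer.
    states = []
    for prompt in prompts:
        ids = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model(**ids, output_hidden_states=True)
        states.append(out.hidden_states[layer][0, -1])
    return torch.stack(states).mean(dim=0)

refused = ["How do I pick a lock?"]            # placeholder "refusal-triggering" prompts
neutral = ["How do I bake sourdough bread?"]   # placeholder harmless prompts
direction = mean_last_hidden(refused) - mean_last_hidden(neutral)
direction = direction / direction.norm()

def ablate(module, inputs, output):
    # Subtract the component along the refusal direction from the block output.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden - (hidden @ direction).unsqueeze(-1) * direction
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

hooks = [block.register_forward_hook(ablate) for block in model.model.layers]
```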

# Apriel-5B

`/ˈɑː.pri.əl/`

## Table of Contents

1. [Model Summary](#model-summary)
2. [Evaluation](#evaluation)
3. [Intended Use](#intended-use)
4. [Limitations](#limitations)
5. [Security and Responsible Use](#security-and-responsible-use)
6. [Pretraining](#pretraining)
7. [License](#license)
8. [Citation](#citation)

## Model Summary

Apriel is a family of models built for versatility, offering high throughput and efficiency across a wide range of tasks.

### Apriel-5B-Base
Apriel-5B-base is a decoder-only transformer trained on 4.5T+ tokens of data. It is the first release in the Apriel model family, designed to support research on foundation models. Apriel-5B-base achieves strong performance across common benchmarks for models under 5B parameters.

### Apriel-5B-Instruct
[Apriel-5B-Instruct](https://huggingface.co/ServiceNow-AI/Apriel-5B-Instruct) is built on top of [Apriel-5B-base](https://huggingface.co/ServiceNow-AI/Apriel-5B-base) using continual pretraining (CPT), supervised finetuning (SFT), and post-training alignment with DPO and RLVR.

Both CPT and SFT stages involved training multiple domain-biased variants with overlapping datasets (e.g., instruction, code, math). These were then merged to form a more general-purpose model before alignment, roughly along the lines of the sketch below.
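
The exact merging recipe is not specified here; as a rough illustration only, a simple linear merge of domain-biased variants could look like the following (paths and equal weighting are assumptions):

```python
# Illustrative linear merge of domain-biased checkpoints into one model.
# The variant paths and equal weighting are placeholders, not Apriel's
# actual merging configuration.
import torch
from transformers import AutoModelForCausalLM

variant_ids = ["path/to/instruct-variant", "path/to/code-variant", "path/to/math-variant"]
variants = [AutoModelForCausalLM.from_pretrained(v, torch_dtype=torch.float32) for v in variant_ids]

merged = variants[0]
merged_state = merged.state_dict()
for key in merged_state:
    # Average each parameter tensor across the domain variants.
    merged_state[key] = torch.stack([v.state_dict()[key] for v in variants]).mean(dim=0)
merged.load_state_dict(merged_state)
merged.save_pretrained("apriel-merged")
```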

<img src="https://huggingface.co/ServiceNow-AI/Apriel-4.8B-base/resolve/main/eval_vs_latency.png" alt="graph" width="400"/>

The y-axis shows average downstream benchmark scores. Throughput (x-axis) was measured using [vLLM](https://github.com/vllm-project/vllm) with batch size 8, 256 input tokens, and 32 output tokens.
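
A throughput probe along those lines could look roughly like this; the dummy prompt construction and model id are assumptions, not the exact benchmark script:

```python
# Illustrative throughput probe: 8 prompts of roughly 256 tokens each,
# 32 output tokens per prompt, timed end to end.
import time
from vllm import LLM, SamplingParams

llm = LLM(model="ServiceNow-AI/Apriel-5B-Instruct")
prompts = ["the " * 256] * 8                     # crude ~256-token dummy inputs
params = SamplingParams(max_tokens=32, ignore_eos=True)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} generated tokens/s")
```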

### How to Use

```bash
pip install transformers
```

#### Running the Base model
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "ServiceNow-AI/Apriel-5B-Base"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16).to(device)

inputs = tokenizer.encode("Snow is", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```

```python
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 9664.14 MB
```

#### Running the Instruct model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "ServiceNow-AI/Apriel-5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16 if device == "cuda" else torch.float32
).to(device)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant that provides accurate and concise information."},
    {"role": "user", "content": "Tell me about artificial intelligence"}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(device)

generation_params = {
    "max_new_tokens": 512,
    "temperature": 0.2,
    "top_p": 0.9,
    "do_sample": True
}

outputs = model.generate(**inputs, **generation_params)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Chat Template

```
<|system|>
System message here (optional)
<|end|>
<|user|>
User message here
<|end|>
<|assistant|>
Assistant response here
<|end|>
```

If no system message is provided, the model inserts a blank system prompt to maintain format structure. The model supports structured interaction patterns, including tool calling and reasoning steps for more advanced workflows.
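
To inspect the rendered format, including the blank system block described above (assuming the tokenizer ships the chat template shown), you can apply the template without a system message:

```python
# Render the chat template without a system message; per the description
# above, the output should begin with an empty <|system|> block.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ServiceNow-AI/Apriel-5B-Instruct")
messages = [{"role": "user", "content": "Tell me about artificial intelligence"}]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```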

## Evaluation

Evaluations were conducted using [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness) and [evalchemy](https://github.com/mlfoundations/evalchemy).
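
For reference, a minimal lm-eval-harness run might look like the following; the task list, batch size, and dtype here are illustrative, not the exact evaluation configuration used for the tables below:

```python
# Illustrative lm-evaluation-harness call; tasks, dtype, and batch size
# are assumptions, not the configuration behind the reported numbers.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ServiceNow-AI/Apriel-5B-Instruct,dtype=bfloat16",
    tasks=["hellaswag", "gsm8k"],
    batch_size=8,
)
print(results["results"])
```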

### Apriel-5B-Base

| Task Name | Apriel-5B-Base | OLMo-2-1124-7B | Llama-3.1-8B | Mistral-Nemo-Base-2407 |
|-------------------|------|-------|-------|-------|
| **Average**       | 58.7 | 58.71 | 61.72 | 66.01 |
| **ARC Challenge** | 56.7 | 62.7  | 58.2  | 62.9  |
| **ARC Easy**      | 82.4 | 86.0  | 85.7  | 86.7  |
| **MMMLU**         | 44.5 | 35.3  | 47.4  | 54.7  |
| **Global MMLU**   | 57.4 | 52.4  | 61.1  | 68.4  |
| **GSM8k**         | 64.2 | 63.2  | 54.8  | 58.5  |
| **HellaSwag**     | 74.4 | 80.5  | 78.8  | 82.7  |
| **MUSR**          | 39.1 | 39.6  | 38.0  | 39.9  |
| **MBPP**          | 27.6 | 22.4  | 46.0  | 54.6  |
| **MMLU**          | 61.3 | 63.9  | 66.0  | 69.6  |
| **PIQA**          | 78.9 | 81.1  | 81.2  | 82.1  |

### Apriel-5B-Instruct

| Task Name | Apriel-5B-Instruct | OLMo-2-1124-7B-Instruct | Llama-3.1-8B-Instruct | Mistral-Nemo-Instruct-2407 |
|-------------------|-------|-------|-------|-------|
| **Average**       | 49.64 | 43.91 | 52.60 | 48.63 |
| **ARC Challenge** | 59.04 | 61.45 | 64.25 | 66.38 |
| **GSM8k**         | 80.36 | 79.68 | 82.63 | 77.63 |
| **HellaSwag**     | 74.52 | 80.21 | 78.43 | 81.71 |
| **BBH**           | 39.82 | 39.95 | 50.86 | 50.06 |
| **GPQA**          | 28.36 | 27.85 | 29.19 | 29.45 |
| **IF Eval**       | 80.78 | 72.64 | 79.67 | 62.85 |
| **MMLU Pro**      | 29.19 | 26.57 | 37.74 | 35.09 |
| **MUSR**          | 36.77 | 34.39 | 38.36 | 39.02 |
| **MBPP**          | 45.80 | 28.00 | 59.00 | 57.60 |
| **TruthfulQA**    | 56.09 | 56.46 | 55.05 | 57.69 |
| **Winogrande**    | 62.35 | 65.35 | 67.01 | 70.01 |
| **Minerva Math**  | 39.80 | 9.96  | 36.72 | 21.46 |
| **MATH500**       | 53.00 | 31.4  | 45.80 | 34.40 |
| **AMC23**         | 29.00 | 16.4  | 21.00 | 11.50 |
| **MixEval Hard**  | 29.70 | 28.40 | 43.30 | 34.60 |

## Intended Use

The Apriel family of models is designed for a variety of general-purpose instruction tasks, including:

- Question answering and information retrieval
- Content generation and summarization
- Code assistance and generation
- Logical reasoning and multi-step tasks
- Creative writing and ideation

They are **not intended** for use in safety-critical applications without human oversight or in scenarios requiring guaranteed factual accuracy.

## Limitations

- **Factual accuracy:** May produce incorrect, misleading, or outdated content. Outputs should be verified before use in critical contexts.
- **Bias:** May reflect societal, cultural, or systemic biases present in training data.
- **Ethics:** Do not use the model to produce harmful, unlawful, or unethical content.
- **Language:** Strongest performance is in English. Output quality may degrade in underrepresented languages.
- **Critical use:** Not suitable for medical, legal, financial, or other high-risk applications without safeguards.

## Security and Responsible Use

**Security Responsibilities:**
Deployers and users are strongly encouraged to align their security practices with established frameworks and regulatory guidelines such as the EU AI Act and the NIST AI Risk Management Framework (RMF).

**Guidelines for Deployers:**

- Regularly conduct robustness assessments to identify and mitigate adversarial inputs.
- Implement validation and filtering processes to prevent harmful or biased outputs.
- Continuously perform data privacy checks to guard against unintended data leaks.
- Document and communicate the model's limitations, intended usage, and known security risks to all end-users.
- Schedule periodic security reviews and updates to address emerging threats and vulnerabilities.

**Guidelines for Users:**

- Follow established security policies and usage guidelines provided by deployers.
- Protect and manage sensitive information when interacting with the model.
- Report anomalies, suspicious behavior, or unsafe outputs to deployers or developers.
- Maintain human oversight and apply judgment to mitigate potential security or ethical risks during interactions.

**Disclaimer:**
Users accept responsibility for securely deploying, managing, and using this open-source LLM. The model is provided "as-is," without explicit or implied warranty regarding security or fitness for any specific application or environment.

## Pretraining

### Model

- **Architecture:** Transformer decoder with grouped-query attention and YaRN rotary embeddings
- **Tokens:** 4.5T
- **Precision:** bfloat16
- **Knowledge cutoff:** April 2024

### Hardware

- **Compute:** 480 × H100 GPUs
- **GPU-hours:** ~91,000 H100-hours

### Software

- **Training stack:** [Fast-LLM](https://github.com/ServiceNow/Fast-LLM)

## License

MIT

## Citation

```bibtex
@misc{Apriel-small-language-models,
  author = {Slam labs team},
  title = {Apriel - a Family of performant small language models},
  howpublished = {https://huggingface.co/ServiceNow-AI/Apriel-5B-Instruct},
  publisher = {SLAM - ServiceNow Language Models Lab},
  year = {2025}
}
```