saching0071 committed
Commit 6089c94 · verified · 1 Parent(s): eb484a8

Update main README with loading instructions

Files changed (1)
  README.md +28 -40
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-version: final
+version: main
 family: smollm2-1.7b
 model_name: score0_only-600B
 license: mit
@@ -8,7 +8,7 @@ tags:
 - transformer
 - smollm2
 ---
-# SmolLM2 score0_only-600B (Version: final)
+# SmolLM2 score0_only-600B (Version: main)
 
 ## Model Details
 - **Architecture:** SmolLM2
@@ -16,43 +16,31 @@ tags:
 
 ## Training Configuration
 ```yaml
-attention_logit_softcapping: null
-attention_scores_scalar: null
-attn_bias: false
-bias: false
-block_size: 8192
-final_logit_softcapping: null
-gelu_approximate: none
-head_size: 64
-hf_config:
-  name: SmolLM2-1.7B
-  org: HuggingFaceTB
-intermediate_size: 8192
-lm_head_bias: false
-mlp_class_name: LLaMAMLP
-n_embd: 2048
-n_expert: 0
-n_expert_per_token: 0
-n_head: 32
-n_layer: 24
-n_query_groups: 32
-name: SmolLM2-1.7B
-norm_class_name: RMSNorm
-norm_eps: 1.0e-05
-norm_qk: false
-padded_vocab_size: 49152
-padding_multiple: 512
-parallel_residual: false
-post_attention_norm: false
-post_mlp_norm: false
-rope_adjustments: null
-rope_base: 130000
-rope_condense_ratio: 1
-rotary_percentage: 1.0
-scale_embeddings: false
-shared_attention_norm: false
-sliding_window_layer_placing: null
-sliding_window_size: null
-vocab_size: 49152
+optimizer:
+  class_path: torch.optim.AdamW
+  init_args:
+    lr: 0.0005
+    weight_decay: 0.01
+precision: bf16-mixed
+seed: 42
+train:
+  global_batch_size: 1024
+  max_seq_length: 2048
+  max_tokens: 600000000000
+  micro_batch_size: 8
 
 ```
+
+## Model Loading and Revision System
+
+This repository hosts multiple revisions of the model.
+To load a specific revision, use the `revision` parameter. For example:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained("locuslab/score0_only-600B", revision="final")
+tokenizer = AutoTokenizer.from_pretrained("locuslab/score0_only-600B", revision="final")
+```
+
+Replace `"final"` with the desired revision.
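
The loading section added above shows how to pin a revision but not how to discover which revisions exist. A minimal sketch of that discovery step, using `huggingface_hub.list_repo_refs`; the repo id is copied from the README, but which branches and tags are actually published under `locuslab/score0_only-600B` is an assumption:

```python
# Sketch: enumerate the revisions (branches and tags) of the model repo, then
# load one of them. Assumes `huggingface_hub` and `transformers` are installed;
# the revisions actually published under locuslab/score0_only-600B are an
# assumption, not verified here.
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM

refs = list_repo_refs("locuslab/score0_only-600B")
revisions = [ref.name for ref in refs.branches] + [ref.name for ref in refs.tags]
print("Available revisions:", revisions)

# Pass any discovered name as `revision=` when loading, e.g. "main" or "final".
model = AutoModelForCausalLM.from_pretrained(
    "locuslab/score0_only-600B", revision=revisions[0]
)
```

As a side note on the training configuration above: with `global_batch_size: 1024` and `micro_batch_size: 8`, a single-process run would take 1024 / 8 = 128 gradient-accumulation steps per optimizer update; with data-parallel workers the accumulation count shrinks proportionally.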