vishesh-t27 committed (verified)
Commit 925ac6f · Parent(s): fa706dd

Update README.md

Files changed (1):
  1. README.md (+4 −5)
README.md CHANGED

@@ -17,12 +17,12 @@ library_name: transformers
 
 # Nano-Llama
 
-A compact 42M parameter LLaMA-2-style language model pretrained on FineWeb dataset.
+A compact 67M parameter LLaMA-2-style language model pretrained on FineWeb dataset.
 
 ## Model Details
 
 - **Architecture**: LLaMA-2-style transformer
-- **Parameters**: 42.48M
+- **Parameters**: 678M
 - **Training Data**: FineWeb dataset (~100M tokens)
 - **Context Length**: 1024 tokens
 - **Layers**: 6
@@ -33,7 +33,7 @@ A compact 42M parameter LLaMA-2-style language model pretrained on FineWeb datas
 
 - **Dataset**: FineWeb (web-crawled high-quality text)
 - **Tokens Trained**: ~110M tokens
-- **Training Time**: ~8 hours on RTX 3090
+- **Training Time**: ~6 hours on RTX 3090
 - **Optimizer**: AdamW
 - **Learning Rate**: 1e-4
 
@@ -67,10 +67,9 @@ print(generated_text)
 
 ## Limitations
 
-- Small model size (42M parameters)
+- Small model size (67M parameters)
 - Limited training data compared to larger models
 - May generate repetitive or nonsensical text
-- Best suited for short text generation tasks
 
 ## License
 
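The hunk context lines (`library_name: transformers`, `print(generated_text)`) suggest the README's unchanged usage section generates text with the transformers library. A minimal sketch of that kind of usage, assuming a hypothetical repo id `vishesh-t27/Nano-Llama` and the standard `AutoTokenizer`/`AutoModelForCausalLM` API (the actual snippet is not part of this diff):

```python
# Minimal usage sketch; the repo id below is an assumption, not confirmed by the diff.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vishesh-t27/Nano-Llama"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Keep prompt + generation well under the 1024-token context length listed in Model Details.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```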