Update README.md
README.md (changed):
```diff
@@ -17,12 +17,12 @@ library_name: transformers
 
 # Nano-Llama
 
-A compact 42M parameter LLaMA-2-style language model pretrained on FineWeb dataset.
+A compact 67M parameter LLaMA-2-style language model pretrained on FineWeb dataset.
 
 ## Model Details
 
 - **Architecture**: LLaMA-2-style transformer
-- **Parameters**:
+- **Parameters**: 67M
 - **Training Data**: FineWeb dataset (~100M tokens)
 - **Context Length**: 1024 tokens
 - **Layers**: 6
@@ -33,7 +33,7 @@ A compact 42M parameter LLaMA-2-style language model pretrained on FineWeb dataset.
 
 - **Dataset**: FineWeb (web-crawled high-quality text)
 - **Tokens Trained**: ~110M tokens
-- **Training Time**: ~
+- **Training Time**: ~6 hours on RTX 3090
 - **Optimizer**: AdamW
 - **Learning Rate**: 1e-4
 
@@ -67,10 +67,9 @@ print(generated_text)
 
 ## Limitations
 
-- Small model size (
+- Small model size (67M parameters)
 - Limited training data compared to larger models
 - May generate repetitive or nonsensical text
-- Best suited for short text generation tasks
 
 ## License
 
```
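The updated Model Details pin down only part of the architecture (LLaMA-2-style, 6 layers, 1024-token context, ~67M parameters). As a rough illustration of what such a configuration could look like in `transformers`, here is a sketch in which the hidden size, head count, FFN width, and vocabulary size are assumptions chosen to land near that parameter count; they are not taken from the card:

```python
# Illustrative only: layers and context length come from the card; every other
# dimension below is an assumption picked to reach roughly the stated ~67M params.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=32000,              # assumed (LLaMA-2 tokenizer size)
    hidden_size=640,               # assumed
    intermediate_size=1536,        # assumed
    num_hidden_layers=6,           # from the card
    num_attention_heads=10,        # assumed (head_dim = 64)
    num_key_value_heads=10,        # assumed (no grouped-query attention)
    max_position_embeddings=1024,  # from the card
)
model = LlamaForCausalLM(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")  # ~68M with these assumed sizes
```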
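The third hunk's context line, `print(generated_text)`, refers to the card's usage snippet, which is not part of this diff. A minimal sketch of loading and sampling from the model with the `transformers` API, assuming a hypothetical repository id (`your-username/nano-llama`); the card's actual snippet and generation settings may differ:

```python
# Minimal usage sketch; the repo id is a placeholder, not the card's real id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/nano-llama"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")

# Keep generations short: the card warns the model can become repetitive.
output_ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
```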