---
language:
- en
license: mit
model-index:
- name: Nano-Llama
results: []
tags:
- pytorch
- causal-lm
- text-generation
- fineweb
datasets:
- HuggingFaceFW/fineweb
library_name: transformers
---
# Nano-Llama
A compact 67M-parameter LLaMA-2-style language model pretrained on the FineWeb dataset.
## Model Details
- **Architecture**: LLaMA-2-style transformer (see the config sketch after this list)
- **Parameters**: 67M
- **Training Data**: FineWeb dataset (~100M tokens)
- **Context Length**: 1024 tokens
- **Layers**: 6
- **Hidden Size**: 768
- **Attention Heads**: 12
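For reference, the hyperparameters listed above map directly onto a `transformers` `LlamaConfig`. The sketch below is illustrative only: `vocab_size` and `intermediate_size` are not stated on this card, so the values shown are plausible assumptions, not the released checkpoint's actual config.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Sketch only: hidden size, layers, heads, and context length come from
# the list above; vocab_size and intermediate_size are assumed values.
config = LlamaConfig(
    hidden_size=768,
    num_hidden_layers=6,
    num_attention_heads=12,
    max_position_embeddings=1024,
    intermediate_size=2048,  # assumption; not stated on this card
    vocab_size=32000,        # assumption; not stated on this card
)
model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```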
## Training
- **Dataset**: FineWeb (high-quality web-crawled text)
- **Tokens Trained**: ~110M tokens
- **Training Time**: ~6 hours on RTX 3090
- **Optimizer**: AdamW
- **Learning Rate**: 1e-4 (see the training-step sketch below)
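As a rough illustration, a single pretraining step with these settings could look like the sketch below. Only the AdamW optimizer and the 1e-4 learning rate come from this card; the batch layout and use of `labels` are assumptions, and `model` is reused from the config sketch above.

```python
from torch.optim import AdamW

# Sketch of one causal-LM training step; reuses `model` from above.
# Only the optimizer choice and learning rate come from this card.
optimizer = AdamW(model.parameters(), lr=1e-4)

def train_step(batch):
    # batch: dict with input_ids, attention_mask, and labels tensors;
    # the model computes the next-token cross-entropy loss internally.
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```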
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("vishesh-t27/Nano-Llama")
model = AutoModelForCausalLM.from_pretrained("vishesh-t27/Nano-Llama")
model.eval()
# Test prompt
text = "The future of artificial intelligence is"
inputs = tokenizer(text, return_tensors="pt")
# Generate text
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.8,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode and print
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
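The same generation can also be wrapped in the high-level `pipeline` API, which handles tokenization and decoding in one call (the sampling settings below mirror the snippet above):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="vishesh-t27/Nano-Llama")
result = generator(
    "The future of artificial intelligence is",
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
)
print(result[0]["generated_text"])
```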
## Limitations
- Small model size (67M parameters)
- Limited training data compared to larger models
- May generate repetitive or nonsensical text (see the decoding sketch below)
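If repetition shows up in practice, the standard `generate` decoding knobs can help. The values below are illustrative, not tuned for this model, and the snippet reuses `model` and `inputs` from the Usage section:

```python
# Illustrative mitigation, not a tuned recommendation for this model:
# penalize repeated tokens and restrict sampling to the top-p nucleus.
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    repetition_penalty=1.2,
    pad_token_id=tokenizer.eos_token_id,
)
```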
## License
MIT License