---
language:
- en
license: mit
model-index:
- name: Nano-Llama
  results: []
tags:
- pytorch
- causal-lm
- text-generation
- fineweb
datasets:
- HuggingFaceFW/fineweb
library_name: transformers
---

# Nano-Llama

A compact 67M-parameter LLaMA-2-style language model pretrained on the FineWeb dataset.

## Model Details

- **Architecture**: LLaMA-2-style transformer (see the config sketch below)
- **Parameters**: 67M
- **Training Data**: FineWeb dataset (~100M tokens)
- **Context Length**: 1024 tokens
- **Layers**: 6
- **Hidden Size**: 768
- **Attention Heads**: 12
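
For reference, the numbers above map onto a Hugging Face `LlamaConfig` roughly as sketched below. This is not the exact configuration shipped with the checkpoint: `vocab_size`, `intermediate_size`, and weight tying are assumptions, and the exact parameter count depends on them.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Sketch of the architecture listed above. vocab_size and intermediate_size
# are assumptions; they are not stated in this card.
config = LlamaConfig(
    vocab_size=32000,              # assumed LLaMA-2 tokenizer vocabulary
    hidden_size=768,
    intermediate_size=2048,        # assumed MLP width
    num_hidden_layers=6,
    num_attention_heads=12,
    max_position_embeddings=1024,  # context length
)

model = LlamaForCausalLM(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # exact count depends on the assumptions above
```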

## Training

- **Dataset**: FineWeb (web-crawled, high-quality text)
- **Tokens Trained**: ~110M tokens
- **Training Time**: ~6 hours on RTX 3090
- **Optimizer**: AdamW
- **Learning Rate**: 1e-4 (see the training sketch below)
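
The card reports the optimizer and learning rate but not the full loop, so the snippet below is only a minimal sketch of how such a run could look. Batching, scheduling, gradient accumulation, and the step count are assumptions, not details from this card.

```python
from datasets import load_dataset
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

# A real pretraining run would start from a freshly initialized model;
# the released checkpoint is loaded here only to keep the sketch short.
tokenizer = AutoTokenizer.from_pretrained("vishesh-t27/Nano-Llama")
model = AutoModelForCausalLM.from_pretrained("vishesh-t27/Nano-Llama")
optimizer = AdamW(model.parameters(), lr=1e-4)  # optimizer and LR reported above

# FineWeb is large, so stream it instead of downloading it in full.
stream = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)

model.train()
for step, sample in enumerate(stream):
    batch = tokenizer(
        sample["text"],
        truncation=True,
        max_length=1024,  # context length from Model Details
        return_tensors="pt",
    )
    outputs = model(**batch, labels=batch["input_ids"])  # causal-LM loss
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if step >= 10:  # illustrative cutoff; the real run covered ~110M tokens
        break
```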

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("vishesh-t27/Nano-Llama")
model = AutoModelForCausalLM.from_pretrained("vishesh-t27/Nano-Llama")

model.eval()

# Test prompt
text = "The future of artificial intelligence is"
inputs = tokenizer(text, return_tensors="pt")

# Generate text
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.8,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

# Decode and print
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
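
As a shorter alternative, the same checkpoint can be run through the `pipeline` helper; the sampling settings below simply mirror the example above.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="vishesh-t27/Nano-Llama")
out = generator(
    "The future of artificial intelligence is",
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
)
print(out[0]["generated_text"])
```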

## Limitations

- Small model size (67M parameters)
- Limited training data compared to larger models
- May generate repetitive or nonsensical text 

## License

MIT License