# 🆓 Free H200 Training: Nano-Coder on Hugging Face

This guide shows you how to train a nano-coder model using **Hugging Face's free H200 GPU access** (4 minutes per day).

## 🎯 What You Get

- **Free H200 GPU**: 4 minutes per day
- **No Credit Card Required**: Completely free
- **Easy Setup**: Just a few clicks
- **Model Sharing**: Automatic upload to the HF Hub

## 🚀 Quick Start

### Option 1: Hugging Face Space (Recommended)

1. **Create HF Space:**
   ```bash
   huggingface-cli repo create nano-coder-free --type space
   ```

2. **Upload Files:**
   - Upload all the Python files to your Space (see the sketch after these steps)
   - Make sure `app.py` is in the root directory

3. **Configure Space:**
   - Set **Hardware**: H200 (free tier)
   - Set **Python Version**: 3.9+
   - Set **Requirements**: `requirements.txt`

4. **Launch Training:**
   - Go to your Space URL
   - Click "🚀 Start Free H200 Training"
   - Wait for training to complete (about 3.5 minutes)

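If you would rather push the files from a terminal, here is a minimal upload sketch using `huggingface_hub`; the repo id `your-username/nano-coder-free` is a placeholder for your own Space:

```python
# Sketch: upload the project folder to a Space with huggingface_hub.
# Assumes you are logged in (HF_TOKEN or `huggingface-cli login`) and that
# "your-username/nano-coder-free" is replaced with your own Space id.
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path=".",                          # local project directory
    repo_id="your-username/nano-coder-free",  # your Space
    repo_type="space",                        # a Space, not a model repo
)
```
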
### Option 2: Local Setup with the HF Free Tier

1. **Install Dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

2. **Set HF Token** (you can verify it with the sketch below):
   ```bash
   export HF_TOKEN="your_token_here"
   ```

3. **Run Free Training:**
   ```bash
   python hf_free_training.py
   ```

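Before spending your daily minutes, it is worth confirming that the token works. A minimal check, assuming `HF_TOKEN` is exported as above:

```python
# Sketch: verify that HF_TOKEN is valid before launching training.
import os

from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])
print(api.whoami()["name"])  # prints your HF username if the token is valid
```
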
## 📊 Model Configuration (Free Tier)

| Parameter | Free Tier | Full Model |
|-----------|-----------|------------|
| **Layers** | 6 | 12 |
| **Heads** | 6 | 12 |
| **Embedding** | 384 | 768 |
| **Context** | 512 | 1024 |
| **Parameters** | ~15M | ~124M |
| **Training Time** | 3.5 min | 2-4 hours |

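In nanoGPT terms, the free-tier column corresponds roughly to the following `GPTConfig`. This is a sketch: the field names follow `model.py`, and the padded GPT-2 vocabulary size is an assumption taken from nanoGPT's defaults:

```python
# Sketch: the free-tier model from the table above as a nanoGPT GPTConfig.
from model import GPT, GPTConfig

config = GPTConfig(
    n_layer=6,         # transformer blocks
    n_head=6,          # attention heads per block
    n_embd=384,        # embedding width
    block_size=512,    # context length
    vocab_size=50304,  # GPT-2 vocab padded up (nanoGPT default, assumed here)
    dropout=0.0,
    bias=False,
)
model = GPT(config)    # roughly 15M parameters
```
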
## ⏰ Time Management

- **Daily Limit**: 4 minutes of H200 time
- **Training Time**: 3.5 minutes (leaves a safety buffer)
- **Automatic Stop**: The script stops before the time limit (see the sketch below)
- **Daily Reset**: A fresh 4 minutes every day at midnight UTC

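The automatic stop boils down to a wall-clock check inside the training loop. A minimal sketch of the idea; `MAX_TRAINING_TIME` mirrors the constant in `hf_free_training.py`, while `train_step` and `save_checkpoint` are illustrative stubs:

```python
# Sketch: stop training before the free-tier time budget runs out.
import time

MAX_TRAINING_TIME = 3.5 * 60  # 3.5 minutes, under the 4-minute daily cap
max_iters = 1000              # free-tier runs land around 500-1000 iterations

def train_step():
    """Stub for one optimizer step (forward, backward, update)."""
    time.sleep(0.01)

def save_checkpoint():
    """Stub for the torch.save(...) call in hf_free_training.py."""
    print("checkpoint saved")

start = time.time()
for it in range(max_iters):
    elapsed = time.time() - start
    if elapsed > MAX_TRAINING_TIME:
        print(f"Stopping at iteration {it}: {elapsed:.0f}s elapsed")
        save_checkpoint()
        break
    train_step()
```
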
## 🎨 Features

### Training Features
- ✅ **Automatic Time Tracking**: Stops before the limit
- ✅ **Frequent Checkpoints**: Every 200 iterations (see the sketch below)
- ✅ **HF Hub Upload**: Models saved automatically
- ✅ **Wandb Logging**: Real-time metrics
- ✅ **Progress Monitoring**: Displays the time remaining

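The checkpoint-and-upload pattern is a short function. In this sketch, `torch.save` and `HfApi.upload_file` are the real APIs; the model, optimizer, and repo id are placeholders:

```python
# Sketch: checkpoint every 200 iterations and push the file to the HF Hub.
import torch
from huggingface_hub import HfApi

api = HfApi()

def save_and_upload(model, optimizer, it):
    ckpt = {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "iter": it,
    }
    torch.save(ckpt, "out-nano-coder-free/ckpt.pt")
    api.upload_file(
        path_or_fileobj="out-nano-coder-free/ckpt.pt",
        path_in_repo="ckpt.pt",
        repo_id="your-username/nano-coder-free",  # placeholder repo id
    )

# Inside the training loop:
#   if it % 200 == 0:
#       save_and_upload(model, optimizer, it)
```
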
### Generation Features
- ✅ **Interactive UI**: Gradio interface (a minimal sketch follows)
- ✅ **Custom Prompts**: Start from any Python code
- ✅ **Adjustable Parameters**: Temperature, token count
- ✅ **Real-time Generation**: Instant results

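The Space UI amounts to a small Gradio app. A minimal sketch, where `generate` is a stand-in for the model's sampling function in `app.py`:

```python
# Sketch: a minimal Gradio UI for the code generator.
import gradio as gr

def generate(prompt, temperature, max_tokens):
    # Stand-in for the model's sampling function in app.py.
    return prompt + "\n    pass  # model output would continue here"

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label="Python prompt", value="def fibonacci(n):"),
        gr.Slider(0.1, 2.0, value=0.8, label="Temperature"),
        gr.Slider(16, 512, value=128, step=16, label="Max new tokens"),
    ],
    outputs=gr.Code(language="python", label="Generated code"),
)
demo.launch()
```
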
## 📁 File Structure

```
nano-coder-free/
├── app.py                   # HF Space app
├── hf_free_training.py      # Free H200 training script
├── prepare_code_dataset.py  # Dataset preparation
├── sample_nano_coder.py     # Code generation
├── requirements.txt         # Dependencies
├── model.py                 # nanoGPT model
├── configurator.py          # Configuration
└── README_free_H200.md      # This file
```

## 🔧 Customization

### Adjust Training Parameters

Edit `hf_free_training.py`:

```python
# Model size (smaller = faster training)
n_layer = 4   # even smaller
n_head = 4    # even smaller
n_embd = 256  # even smaller

# Training time (be conservative)
MAX_TRAINING_TIME = 3.0 * 60  # 3 minutes

# Batch size (larger = faster, if it fits in memory)
batch_size = 128
```

### Change Dataset

In `prepare_code_dataset.py`, point `load_dataset` at your own dataset (see the tokenization sketch below):

```python
# In prepare_code_dataset.py
from datasets import load_dataset

dataset = load_dataset("your-dataset")  # your own dataset
```

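After swapping the dataset, the text still has to be tokenized into the `train.bin`/`val.bin` files that nanoGPT reads. A sketch of that step following nanoGPT's usual GPT-2/tiktoken recipe; the `"text"` column and the 90/10 split are assumptions, and the actual logic lives in `prepare_code_dataset.py`:

```python
# Sketch: tokenize a text column into nanoGPT-style train.bin / val.bin.
import numpy as np
import tiktoken
from datasets import load_dataset

enc = tiktoken.get_encoding("gpt2")
dataset = load_dataset("your-dataset", split="train")  # placeholder name

ids = []
for example in dataset:
    ids.extend(enc.encode_ordinary(example["text"]))  # assumed column name
ids = np.array(ids, dtype=np.uint16)  # GPT-2 token ids fit in uint16

split = int(0.9 * len(ids))  # assumed 90/10 train/val split
ids[:split].tofile("train.bin")
ids[split:].tofile("val.bin")
```
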
## 📈 Expected Results

After 3.5 minutes of training on an H200:

- **Training Loss**: ~2.5-3.0
- **Validation Loss**: ~2.8-3.3
- **Model Size**: ~15MB
- **Code Quality**: Basic Python functions
- **Iterations**: ~500-1000

## 🎯 Use Cases

### Perfect For:
- ✅ **Learning**: Understand nanoGPT training
- ✅ **Prototyping**: Test ideas quickly
- ✅ **Experiments**: Try different configurations
- ✅ **Small Models**: Code generation demos

### Not Suitable For:
- ❌ **Production**: Too small for real use
- ❌ **Large Models**: Limited by the time and parameter budget
- ❌ **Long Training**: 4-minute daily limit

## 🔄 Daily Workflow

1. **Morning**: Check whether you can train today
2. **Prepare**: Have your dataset ready
3. **Train**: Run a 3.5-minute training session
4. **Test**: Generate some code samples
5. **Share**: Upload to the HF Hub if the results look good
6. **Wait**: Come back tomorrow for more training

## 🚨 Troubleshooting

### Common Issues

1. **"Daily limit reached"**
   - Wait until tomorrow
   - Check your timezone (the limit resets at midnight UTC)

2. **"No GPU available"**
   - The H200 might be busy
   - Try again in a few minutes

3. **"Training too slow"**
   - Reduce the model size
   - Increase the batch size
   - Use a smaller context

4. **"Out of memory"**
   - Reduce `batch_size`
   - Reduce `block_size`
   - Reduce the model size (see the snippet below)

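The memory fixes above are one-line edits in `hf_free_training.py`. For example (the values are illustrative starting points, not tuned):

```python
# Sketch: conservative settings for "Out of memory" errors.
# Halving batch_size and block_size roughly quarters activation memory.
batch_size = 32   # smaller batches
block_size = 256  # shorter context
n_layer = 4       # fewer transformer blocks
n_embd = 256      # narrower embeddings
```
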
### Performance Tips

- **Batch Size**: Use the largest that fits in memory
- **Context Length**: 512 is a good fit for the free tier
- **Model Size**: 6 layers works well within the time budget
- **Learning Rate**: 1e-3 for fast convergence

## 📊 Monitoring

### Wandb Dashboard
- Real-time loss curves
- Training metrics
- Model performance

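The logging follows the standard `wandb` pattern. A minimal sketch; the project name and metric keys are illustrative:

```python
# Sketch: real-time metric logging with wandb.
import wandb

wandb.init(project="nano-coder-free", config={"n_layer": 6, "n_embd": 384})

# Inside the training loop:
wandb.log({"train/loss": 2.9, "val/loss": 3.1, "iter": 200})
wandb.finish()
```
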
### HF Hub
- Model checkpoints
- Training logs
- Generated samples

### Local Files
- `out-nano-coder-free/ckpt.pt` - latest model checkpoint
- `daily_limit_YYYY-MM-DD.txt` - usage tracking (see the sketch below)

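The usage-tracking file suggests a simple marker-file scheme: if today's file exists, the budget is spent. A sketch of that idea; the exact format is defined in `hf_free_training.py`:

```python
# Sketch: a marker-file check for the once-per-day training budget.
# The filename matches the daily_limit_YYYY-MM-DD.txt pattern above.
import datetime
import os

marker = f"daily_limit_{datetime.date.today().isoformat()}.txt"

if os.path.exists(marker):
    print("Daily limit reached - come back tomorrow.")
else:
    # ... run the 3.5-minute training session, then record the usage
    with open(marker, "w") as f:
        f.write("trained today\n")
```
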
## 🎉 Success Stories

Users have achieved:
- ✅ Basic Python function generation
- ✅ Simple class definitions
- ✅ List comprehensions
- ✅ Error handling patterns
- ✅ Docstring generation

## 🔗 Resources

- [Hugging Face Spaces](https://huggingface.co/spaces)
- [Free GPU Access](https://huggingface.co/docs/hub/spaces-sdks-docker-gpu)
- [NanoGPT Original](https://github.com/karpathy/nanoGPT)
- [Python Code Dataset](https://huggingface.co/datasets/flytech/python-codes-25k)

## 🤝 Contributing

Want to improve the free H200 setup?

1. **Optimize the Model**: Make it train faster
2. **Better UI**: Improve the Gradio interface
3. **More Datasets**: Support other code datasets
4. **Documentation**: Help others get started

## 📝 License

This project follows the same license as the original nanoGPT repository.

---

**Happy Free H200 Training! 🚀**

Remember: 4 minutes a day keeps the AI doctor away! 😄