# 🆓 Free H200 Training: Nano-Coder on Hugging Face

This guide shows you how to train a nano-coder model using **Hugging Face's free H200 GPU access** (4 minutes daily).

## 🎯 What You Get

- **Free H200 GPU**: 4 minutes per day
- **No Credit Card Required**: Completely free
- **Easy Setup**: Just a few clicks
- **Model Sharing**: Automatic upload to HF Hub

## 🚀 Quick Start

### Option 1: Hugging Face Space (Recommended)

1. **Create HF Space:**
   ```bash
   huggingface-cli repo create nano-coder-free --type space
   ```

2. **Upload Files:**
   - Upload all the Python files to your space
   - Make sure `app.py` is in the root directory

3. **Configure Space:**
   - Set **Hardware**: H200 (free tier)
   - Set **Python Version**: 3.9+
   - Set **Requirements**: `requirements.txt`

4. **Launch Training:**
   - Go to your space URL
   - Click "🚀 Start Free H200 Training"
   - Wait for training to complete (3.5 minutes)

### Option 2: Local Setup with HF Free Tier

1. **Install Dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

2. **Set HF Token:**
   ```bash
   export HF_TOKEN="your_token_here"
   ```

3. **Run Free Training:**
   ```bash
   python hf_free_training.py
   ```

## 📊 Model Configuration (Free Tier)

| Parameter | Free Tier | Full Model |
|-----------|-----------|------------|
| **Layers** | 6 | 12 |
| **Heads** | 6 | 12 |
| **Embedding** | 384 | 768 |
| **Context** | 512 | 1024 |
| **Parameters** | ~15M | ~124M |
| **Training Time** | 3.5 min | 2-4 hours |

## ⏰ Time Management

- **Daily Limit**: 4 minutes of H200 time
- **Training Time**: 3.5 minutes (safe buffer)
- **Automatic Stop**: Script stops before time limit
- **Daily Reset**: New 4 minutes every day at midnight UTC

## 🎨 Features

### Training Features

- ✅ **Automatic Time Tracking**: Stops before limit
- ✅ **Frequent Checkpoints**: Every 200 iterations
- ✅ **HF Hub Upload**: Models saved automatically
- ✅ **Wandb Logging**: Real-time metrics
- ✅ **Progress Monitoring**: Time remaining display

### Generation Features

- ✅ **Interactive UI**: Gradio interface
- ✅ **Custom Prompts**: Any Python code start
- ✅ **Adjustable Parameters**: Temperature, tokens
- ✅ **Real-time Generation**: Instant results

## 📁 File Structure

```
nano-coder-free/
├── app.py                   # HF Space app
├── hf_free_training.py      # Free H200 training script
├── prepare_code_dataset.py  # Dataset preparation
├── sample_nano_coder.py     # Code generation
├── requirements.txt         # Dependencies
├── model.py                 # nanoGPT model
├── configurator.py          # Configuration
└── README_free_H200.md      # This file
```

## 🔧 Customization

### Adjust Training Parameters

Edit `hf_free_training.py`:

```python
# Model size (smaller = faster training)
n_layer = 4   # Even smaller
n_head = 4    # Even smaller
n_embd = 256  # Even smaller

# Training time (be conservative)
MAX_TRAINING_TIME = 3.0 * 60  # 3 minutes

# Batch size (larger = faster)
batch_size = 128  # If you have memory
```

### Change Dataset

```python
# In prepare_code_dataset.py
dataset = load_dataset("your-dataset")  # Your own dataset
```

## 📈 Expected Results

After 3.5 minutes of training on H200:

- **Training Loss**: ~2.5-3.0
- **Validation Loss**: ~2.8-3.3
- **Model Size**: ~15MB
- **Code Quality**: Basic Python functions
- **Iterations**: ~500-1000

## 🎯 Use Cases

### Perfect For:

- ✅ **Learning**: Understand nanoGPT training
- ✅ **Prototyping**: Test ideas quickly
- ✅ **Experiments**: Try different configurations
- ✅ **Small Models**: Code generation demos

### Not Suitable For:

- ❌ **Production**: Too small for real use
- ❌ **Large Models**: Limited by time/parameters
- ❌ **Long Training**: 4-minute daily limit
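The 4-minute ceiling is also why `hf_free_training.py` watches the wall clock and checkpoints before the cutoff (the "Automatic Stop" behaviour described under ⏰ Time Management). The snippet below is only a minimal sketch of that idea; the names `run_training_step` and `save_checkpoint` are illustrative stand-ins, not the script's actual API:

```python
# Sketch of a wall-clock budget guard for free-tier training.
# run_training_step / save_checkpoint are hypothetical callables, not the
# functions actually defined in hf_free_training.py.
import time

MAX_TRAINING_TIME = 3.5 * 60   # seconds -- stay under the 4-minute daily limit
CHECKPOINT_INTERVAL = 200      # iterations between checkpoints

def train_with_time_budget(run_training_step, save_checkpoint):
    """Run training steps until the wall-clock budget is nearly used up."""
    start = time.time()
    iter_num = 0
    while True:
        elapsed = time.time() - start
        if elapsed >= MAX_TRAINING_TIME:
            save_checkpoint(iter_num)   # final save before the cutoff
            print(f"Stopped at iter {iter_num} after {elapsed:.0f}s")
            return
        loss = run_training_step()      # one forward/backward/update step
        if iter_num % CHECKPOINT_INTERVAL == 0:
            save_checkpoint(iter_num)
            print(f"iter {iter_num}: loss {loss:.3f}, "
                  f"{MAX_TRAINING_TIME - elapsed:.0f}s left")
        iter_num += 1
```

Budgeting 3.5 of the 4 minutes leaves a buffer for the final checkpoint save and HF Hub upload.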
## 🔄 Daily Workflow

1. **Morning**: Check if you can train today
2. **Prepare**: Have your dataset ready
3. **Train**: Run a 3.5-minute training session
4. **Test**: Generate some code samples
5. **Share**: Upload to HF Hub if good
6. **Wait**: Come back tomorrow for more training

## 🚨 Troubleshooting

### Common Issues

1. **"Daily limit reached"**
   - Wait until tomorrow
   - Check your timezone

2. **"No GPU available"**
   - H200 might be busy
   - Try again in a few minutes

3. **"Training too slow"**
   - Reduce model size
   - Increase batch size
   - Use smaller context

4. **"Out of memory"**
   - Reduce `batch_size`
   - Reduce `block_size`
   - Reduce model size

### Performance Tips

- **Batch Size**: Use the largest that fits in memory
- **Context Length**: 512 is good for the free tier
- **Model Size**: 6 layers is optimal
- **Learning Rate**: 1e-3 for fast convergence

## 📊 Monitoring

### Wandb Dashboard

- Real-time loss curves
- Training metrics
- Model performance

### HF Hub

- Model checkpoints
- Training logs
- Generated samples

### Local Files

- `out-nano-coder-free/ckpt.pt` - Latest model
- `daily_limit_YYYY-MM-DD.txt` - Usage tracking

## 🎉 Success Stories

Users have achieved:

- ✅ Basic Python function generation
- ✅ Simple class definitions
- ✅ List comprehensions
- ✅ Error handling patterns
- ✅ Docstring generation

## 🔗 Resources

- [Hugging Face Spaces](https://huggingface.co/spaces)
- [Free GPU Access](https://huggingface.co/docs/hub/spaces-sdks-docker-gpu)
- [NanoGPT Original](https://github.com/karpathy/nanoGPT)
- [Python Code Dataset](https://huggingface.co/datasets/flytech/python-codes-25k)

## 🤝 Contributing

Want to improve the free H200 setup?

1. **Optimize Model**: Make it train faster
2. **Better UI**: Improve the Gradio interface
3. **More Datasets**: Support other code datasets
4. **Documentation**: Help others get started

## 📝 License

This project follows the same license as the original nanoGPT repository.

---

**Happy Free H200 Training! 🚀**

Remember: 4 minutes a day keeps the AI doctor away! 😄
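## 🧪 Appendix: Loading a Checkpoint Locally (sketch)

If you want to sanity-check the `out-nano-coder-free/ckpt.pt` file listed under Monitoring without the Gradio UI, the sketch below shows one way to load it and sample from it. It assumes stock nanoGPT conventions (a checkpoint dict with `model_args` and `model` entries, `GPT`/`GPTConfig` from `model.py`, and GPT-2 BPE tokenization via `tiktoken`); if `prepare_code_dataset.py` uses a different tokenizer, adapt the encode/decode steps. The actual logic lives in `sample_nano_coder.py`.

```python
# Minimal, illustrative checkpoint loader -- assumes stock nanoGPT conventions.
import torch
import tiktoken
from model import GPT, GPTConfig  # model.py from this repo

device = "cuda" if torch.cuda.is_available() else "cpu"

ckpt = torch.load("out-nano-coder-free/ckpt.pt", map_location=device)
model = GPT(GPTConfig(**ckpt["model_args"]))
# Checkpoints saved after torch.compile may carry an '_orig_mod.' prefix.
state_dict = {k.removeprefix("_orig_mod."): v for k, v in ckpt["model"].items()}
model.load_state_dict(state_dict)
model.to(device)
model.eval()

enc = tiktoken.get_encoding("gpt2")  # assumption: GPT-2 BPE, as in stock nanoGPT
prompt = "def fibonacci(n):"
idx = torch.tensor([enc.encode(prompt)], dtype=torch.long, device=device)

with torch.no_grad():
    out = model.generate(idx, max_new_tokens=100, temperature=0.8, top_k=200)
print(enc.decode(out[0].tolist()))
```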