# 🚀 ECG-FM API: Direct HF Loading Strategy

## **Overview**

This ECG-FM API uses a **Direct HF Loading Strategy** to work within the 1 GB limit of Hugging Face Spaces while keeping the full-precision model.

## **🎯 The Problem**

- **ECG-FM Model Size**: ~1.09 GB
- **HF Spaces Free Limit**: 1 GB
- **Traditional Approach**: Store weights in the Space ❌ (exceeds limit)

## **💡 The Solution**

**Load the model directly from the official repository at runtime:**

```python
# Instead of storing weights in the Space, fetch them at runtime
from huggingface_hub import hf_hub_download

# Download directly from the official repo; returns the local file path
checkpoint = hf_hub_download(
    repo_id="wanglab/ecg-fm",
    filename="mimic_iv_ecg_physionet_pretrained.pt"
)
```

## **✅ Benefits**

1. **No Weights in the Repo**: The Space stays within the 1 GB limit
2. **Always Updated**: Uses the latest official weights (the `main` revision unless a `revision` is pinned)
3. **Full Accuracy**: No quantization or compression
4. **Elegant Solution**: No model modification needed
5. **Scalable**: Clear upgrade path to the Pro tier

## **🔧 How It Works**

### **Phase 1: Cold Start (First Request)**

```
User Request → Download Model (2-5 min) → Cache → Inference
```

### **Phase 2: Cached (Subsequent Requests)**

```
User Request → Load from Cache → Fast Inference
```

### **Phase 3: Space Sleep (After 15 min idle)**

```
Space Sleeps → Model Cleared → Next Request = Cold Start
```

## **📊 Performance Characteristics**

| Scenario | Latency | Notes |
|----------|---------|-------|
| **Cold Start** | 2-5 minutes | First request after deployment |
| **Cached** | 15-30 seconds | Normal inference time |
| **After Sleep** | 2-5 minutes | Space wakes up from idle |

## **🚀 Scaling Path**

### **Phase 1: Free Tier (Current)**

- ✅ **Working API** within the 1 GB limit
- ⚠️ **Slow cold start** (2-5 min)
- ⚠️ **CPU only** (15-30 sec inference)
- ⚠️ **Sleeps after 15 min** idle

### **Phase 2: Pro Tier ($9/month)**

- ✅ **GPU acceleration** (2-5 sec inference)
- ✅ **Always-on** (no sleep, no cold start)
- ✅ **50 GB limit** (could store weights locally)

### **Phase 3: Production**

- ✅ **Dedicated endpoints** (always-on)
- ✅ **Custom infrastructure** (full control)
- ✅ **Load balancing** (multiple instances)

## **💾 Caching Strategy**

```python
# Cache directory passed to hf_hub_download(cache_dir=...)
CACHE_DIR = "/app/.cache/huggingface"

# - The checkpoint is downloaded here on the first request (cold start)
# - Later requests in the same container reuse the cached file
# - The directory is cleared when the container restarts or the Space
#   sleeps, unless it is placed on persistent storage
```

The cache therefore speeds up requests only while the container is running; as the performance table shows, the first request after sleep pays the full cold-start cost again.

## **🔍 Technical Implementation**

### **Model Loading**

```python
from huggingface_hub import hf_hub_download
# Import path as used in the ECG-FM examples (an assumption; check it
# against your installed fairseq-signals version)
from fairseq_signals.models import build_model_from_checkpoint

CACHE_DIR = "/app/.cache/huggingface"


def load_model():
    # Download from the official repo, or reuse the cached copy
    ckpt_path = hf_hub_download(
        repo_id="wanglab/ecg-fm",
        filename="mimic_iv_ecg_physionet_pretrained.pt",
        cache_dir=CACHE_DIR,
    )
    # Build the model with fairseq-signals
    model = build_model_from_checkpoint(ckpt_path)
    model.eval()  # inference mode (disables dropout)
    return model
```

### **Error Handling**

```python
model = None
try:
    model = load_model()
    model_loaded = True
except Exception as e:
    # Keep the API process alive so /health can report the failure
    print(f"Model loading failed: {e}")
    model_loaded = False  # API runs, but /predict cannot run inference
```

## **📋 API Endpoints**

- **`/`**: Root with strategy info
- **`/health`**: Health check with model status
- **`/info`**: Model information and strategy details
- **`/predict`**: ECG inference endpoint

## **🎯 Use Cases**

### **Perfect For:**

- ✅ **Testing & Development**
- ✅ **Demos & Prototyping**
- ✅ **Low-traffic APIs**
- ✅ **Research & Education**

### **Consider Pro Tier For:**

- ⚠️ **Production APIs**
- ⚠️ **High-traffic services**
- ⚠️ **Real-time applications**
- ⚠️ **Always-on requirements**

## **🚨 Limitations & Considerations**

1. **Cold Start Delay**: 2-5 minutes for the first request
2. **Sleep Behavior**: The free tier sleeps after 15 min idle
3. **CPU Performance**: Slower than GPU (15-30 sec vs 2-5 sec)
4. **Network Dependency**: Requires internet access to download the model

## **🔮 Future Improvements**

1. **Model Quantization**: Reduce size for local storage
2. **Progressive Loading**: Load essential parts first
3. **Smart Caching**: Pre-load during idle time
4. **Hybrid Approach**: Cache + direct loading

## **📚 References**

- [Official ECG-FM Repository](https://huggingface.co/wanglab/ecg-fm)
- [HF Spaces Documentation](https://huggingface.co/docs/hub/spaces)
- [fairseq-signals Repository](https://github.com/Jwoo5/fairseq-signals)

---

**This strategy gives us a working ECG-FM API within HF Spaces constraints while maintaining a clear path to production deployment!** 🎉
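The cold-start and cached phases under **How It Works** amount to lazy, load-once initialization, and the **Error Handling** snippet adds a `model_loaded` flag for `/health`. Below is a minimal, framework-agnostic sketch of that pattern, assuming the model is loaded on the first request rather than at startup. The `LazyModel` class, its `status()` field names, and how it would be wired into the endpoints are hypothetical, not part of the existing API.

```python
import threading
import time
from typing import Any, Callable, Optional


class LazyModel:
    """Load a model on first use, then reuse it for later requests.

    `loader` is any zero-argument callable, e.g. the load_model() function
    from Technical Implementation (hypothetical wiring).
    """

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._model: Optional[Any] = None
        self._error: Optional[str] = None
        self.load_seconds: Optional[float] = None

    def get(self) -> Any:
        # Fast path: model already in memory (Phase 2, cached)
        if self._model is not None:
            return self._model
        # Slow path: first request (Phase 1, cold start). The lock ensures
        # concurrent requests trigger only one download/build; a failed
        # load is retried on the next call.
        with self._lock:
            if self._model is None:
                start = time.monotonic()
                try:
                    self._model = self._loader()
                    self._error = None
                except Exception as exc:
                    self._error = str(exc)
                    raise
                finally:
                    self.load_seconds = time.monotonic() - start
        return self._model

    def status(self) -> dict:
        # Illustrative shape of a /health payload
        return {
            "model_loaded": self._model is not None,
            "last_error": self._error,
            "load_seconds": self.load_seconds,
        }
```

With `holder = LazyModel(load_model)`, `/predict` would call `holder.get()` and `/health` would return `holder.status()`. Requests that arrive during a cold start all wait for a single load instead of each starting its own download.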
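**Future Improvements** lists a hybrid approach (cache + direct loading). One possible reading, sketched here under assumptions, is to check local copies of the checkpoint first (for example a file on persistent storage) and download from the Hub only when none exists. `resolve_checkpoint` and the candidate paths are hypothetical; `download` would typically wrap the `hf_hub_download` call from **The Solution**.

```python
import os
from typing import Callable, Iterable, Optional


def resolve_checkpoint(
    local_candidates: Iterable[Optional[str]],
    download: Callable[[], str],
) -> str:
    """Return the first existing local checkpoint, else download one.

    local_candidates: paths to try first (hypothetical locations, e.g. a
        persistent volume); None entries are skipped.
    download: zero-argument callable that fetches the checkpoint and
        returns its path, e.g. a wrapper around hf_hub_download.
    """
    for path in local_candidates:
        # Skip unset candidates, such as a missing environment variable
        if path and os.path.isfile(path):
            return path
    # Nothing found locally: fall back to direct loading from the Hub
    return download()
```

For example, `resolve_checkpoint([os.environ.get("ECG_FM_CKPT")], fetch)` would prefer a path from a hypothetical `ECG_FM_CKPT` variable and call `fetch` only if that file is absent.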