# 🚀 ECG-FM API: Direct HF Loading Strategy

## **Overview**

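This ECG-FM API uses a **Direct HF Loading Strategy**: it downloads the model weights from the Hugging Face Hub at runtime instead of storing them in the Space. This keeps the Space within the 1 GB limit without quantizing or compressing the model.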

## **🎯 The Problem**

- **ECG-FM Model Size**: ~1.09 GB
- **HF Spaces Free Limit**: 1 GB
- **Traditional Approach**: Store weights locally ❌ (exceeds limit)

## **💡 The Solution**

**Load the model directly from the official repository at runtime:**

```python
# Instead of storing weights locally
from huggingface_hub import hf_hub_download

# Download directly from official repo
checkpoint = hf_hub_download(
    repo_id="wanglab/ecg-fm",
    filename="mimic_iv_ecg_physionet_pretrained.pt"
)
```

## **✅ Benefits**

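1. **No Local Storage**: Weights are not stored in the Space, so it stays within the 1 GB limit
2. **Always Updated**: Each fresh download fetches the weights currently published in the official repo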
3. **Full Performance**: No quantization or compression
4. **Elegant Solution**: No model modification needed
5. **Scalable**: Clear upgrade path to Pro tier

## **🔧 How It Works**

### **Phase 1: Cold Start (First Request)**
```
User Request → Download Model (2-5 min) → Cache → Inference
```

### **Phase 2: Cached (Subsequent Requests)**
```
User Request → Load from Cache → Fast Inference
```

### **Phase 3: Space Sleep (After 15 min idle)**
```
Space Sleeps → Model Cleared → Next Request = Cold Start
```
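
The three phases above amount to lazy loading with an in-process cache. The following is a minimal sketch, assuming a hypothetical `LazyModel` wrapper and an injectable `loader` callable that stands in for the download-and-build step. Neither name comes from the project's code.

```python
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Load a model on first use and keep it in memory afterwards.

    Hypothetical helper for illustration; `loader` stands in for the
    download-and-build step.
    """

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model: Optional[Any] = None
        self._lock = threading.Lock()

    @property
    def is_loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        # Fast path: the model is already in memory (Phase 2).
        if self._model is not None:
            return self._model
        # Slow path: first request after start or wake-up (Phase 1).
        # The lock keeps concurrent requests from loading the model twice.
        with self._lock:
            if self._model is None:
                self._model = self._loader()
            return self._model
```

When the Space sleeps, the process exits, so the in-memory copy is lost and the next `get()` pays the cold-start cost again (Phase 3).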

## **📊 Performance Characteristics**

| Scenario | Time | Notes |
|----------|------|-------|
| **Cold Start** | 2-5 minutes | First request after deployment |
| **Cached** | 15-30 seconds | Normal inference time |
| **After Sleep** | 2-5 minutes | Space wakes up from idle |

## **🚀 Scaling Path**

### **Phase 1: Free Tier (Current)**
- ✅ **Working API** within 1GB limit
- ⚠️ **Slow cold start** (2-5 min)
- ⚠️ **CPU only** (15-30 sec inference)
- ⚠️ **Sleeps after 15 min** idle

### **Phase 2: Pro Tier ($9/month)**
- ✅ **GPU acceleration** (2-5 sec inference)
- ✅ **Always-on** (no sleep, no cold start)
- ✅ **50GB limit** (could store weights locally)

### **Phase 3: Production**
- ✅ **Dedicated endpoints** (always-on)
- ✅ **Custom infrastructure** (full control)
- ✅ **Load balancing** (multiple instances)

## **💾 Caching Strategy**

```python
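# Cache directory passed to hf_hub_download
cache_dir = "/app/.cache/huggingface"

# The downloaded checkpoint is stored here and reused while the container runs.
# It survives a restart only if this path is on persistent storage;
# otherwise the next cold start downloads the weights again.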
```
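
Whether the cache survives a restart depends on where the directory lives. The following is a minimal sketch of choosing a cache directory, assuming a hypothetical `resolve_cache_dir` helper that prefers a persistent mount when one exists. The `/data` default reflects where Hugging Face mounts its optional persistent storage; treat it as an assumption for this deployment.

```python
import os
from pathlib import Path

DEFAULT_CACHE = "/app/.cache/huggingface"


def resolve_cache_dir(
    persistent_root: str = "/data",  # assumed mount point, see note above
    fallback: str = DEFAULT_CACHE,
) -> Path:
    """Pick a cache directory, preferring persistent storage if present."""
    root = Path(persistent_root)
    if root.is_dir() and os.access(root, os.W_OK):
        # Persistent mount exists and is writable: cache survives restarts.
        path = root / ".cache" / "huggingface"
    else:
        # Ephemeral disk: cache is rebuilt after every cold start.
        path = Path(fallback)
    path.mkdir(parents=True, exist_ok=True)
    return path
```

The returned path could then be passed as `cache_dir` to `hf_hub_download`.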

## **🔍 Technical Implementation**

### **Model Loading**
```python
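# Import path as used in the ECG-FM examples; adjust if your
# fairseq-signals version exposes it elsewhere.
from fairseq_signals.models import build_model_from_checkpoint
from huggingface_hub import hf_hub_download


def load_model():
    # Download the checkpoint from the official repo (or reuse the cached copy)
    ckpt_path = hf_hub_download(
        repo_id="wanglab/ecg-fm",
        filename="mimic_iv_ecg_physionet_pretrained.pt",
        cache_dir="/app/.cache/huggingface",
    )

    # Build the model with fairseq-signals
    model = build_model_from_checkpoint(ckpt_path)
    return model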
```

### **Error Handling**
```python
try:
    model = load_model()
    model_loaded = True
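except Exception as e:
    print(f"Model loading failed: {e}")
    model = None  # keep the name defined so later checks don't raise NameError
    model_loaded = False
    # The API still starts, but inference requests fail until the model loads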
```
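
Because the download depends on the network, a transient failure at startup would otherwise leave the API without a model until the next restart. One option is to retry a few times before giving up. The following is a minimal sketch, assuming a hypothetical `load_with_retry` helper with illustrative retry counts and delays. It is not part of the current implementation.

```python
import time
from typing import Any, Callable


def load_with_retry(
    loader: Callable[[], Any],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `loader` until it succeeds or `attempts` is exhausted.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts
    and re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return loader()
        except Exception as exc:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
```

In the `try` block above, `load_model()` could be replaced with `load_with_retry(load_model)`.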

## **📋 API Endpoints**

- **`/`**: Root with strategy info
- **`/health`**: Health check with model status
- **`/info`**: Model information and strategy details
- **`/predict`**: ECG inference endpoint
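
How `/health` and `/predict` respond when the model failed to load is not specified in this list. The following is a minimal, framework-agnostic sketch, assuming hypothetical `health_payload` and `predict_guard` helpers with illustrative field names and status codes. The actual response schema is not documented here.

```python
from typing import Any, Dict, Optional, Tuple


def health_payload(model_loaded: bool) -> Dict[str, Any]:
    """Body for GET /health: the API is up even if the model is not."""
    return {
        "status": "healthy" if model_loaded else "degraded",
        "model_loaded": model_loaded,
    }


def predict_guard(model: Optional[Any]) -> Tuple[int, Dict[str, Any]]:
    """Return (HTTP status, body) to check before attempting inference."""
    if model is None:
        # 503 signals a temporary failure that clients can retry.
        return 503, {"error": "Model not loaded; try again later"}
    return 200, {"ready": True}
```

Reporting a degraded state rather than failing the health check lets clients tell a cold start or failed download apart from a crashed API.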

## **🎯 Use Cases**

### **Perfect For:**
- ✅ **Testing & Development**
- ✅ **Demo & Prototyping**
- ✅ **Low-traffic APIs**
- ✅ **Research & Education**

### **Consider Pro Tier For:**
- ⚠️ **Production APIs**
- ⚠️ **High-traffic services**
- ⚠️ **Real-time applications**
- ⚠️ **Always-on requirements**

## **🚨 Limitations & Considerations**

1. **Cold Start Delay**: 2-5 minutes for first request
2. **Sleep Behavior**: Free tier sleeps after 15 min idle
3. **CPU Performance**: Slower than GPU (15-30 sec vs 2-5 sec)
4. **Network Dependency**: Requires internet for model download

## **🔮 Future Improvements**

1. **Model Quantization**: Reduce size for local storage
2. **Progressive Loading**: Load essential parts first
3. **Smart Caching**: Pre-load during idle time
4. **Hybrid Approach**: Cache + direct loading

## **📚 References**

- [Official ECG-FM Repository](https://huggingface.co/wanglab/ecg-fm)
- [HF Spaces Documentation](https://huggingface.co/docs/hub/spaces)
- [fairseq-signals Repository](https://github.com/Jwoo5/fairseq-signals)

---

**This strategy gives us a working ECG-FM API within HF Spaces constraints while maintaining a clear path to production deployment!** 🎉