Spaces:

Allex21
/

LT

Sleeping

File size: 7,618 Bytes

077989f

# 🛠️ Guia de Instalação - LoRA Trainer Funcional

## 📋 Pré-requisitos

### Sistema Operacional
- **Linux**: Ubuntu 20.04+ (recomendado)
- **Windows**: Windows 10/11 com WSL2
- **macOS**: macOS 12+ (limitado, sem GPU)

### Hardware
- **GPU**: NVIDIA com 6GB+ VRAM (obrigatório)
- **RAM**: 16GB+ (recomendado)
- **Armazenamento**: 20GB+ livres
- **CPU**: Qualquer CPU moderno

### Software
- **Python**: 3.8 a 3.11
- **CUDA**: 11.8 ou 12.1
- **Git**: Para clonar repositórios

## 🚀 Instalação Local

### Método 1: Instalação Completa

```bash
# 1. Clone o repositório
git clone <repository-url>
cd lora_trainer_hf

# 2. Crie ambiente virtual
python -m venv venv
source venv/bin/activate  # Linux/Mac
# ou
venv\Scripts\activate     # Windows

# 3. Instale dependências
pip install -r requirements.txt

# 4. Execute a aplicação
python app.py
```

### Método 2: Instalação com Conda

```bash
# 1. Crie ambiente conda
conda create -n lora_trainer python=3.10
conda activate lora_trainer

# 2. Instale PyTorch com CUDA
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# 3. Clone e instale
git clone <repository-url>
cd lora_trainer_hf
pip install -r requirements.txt

# 4. Execute
python app.py
```

## 🐳 Instalação com Docker

### Dockerfile Incluído

```bash
# 1. Build da imagem
docker build -t lora-trainer .

# 2. Execute o container
docker run -p 7860:7860 --gpus all lora-trainer
```

### Docker Compose

```yaml
version: '3.8'
services:
  lora-trainer:
    build: .
    ports:
      - "7860:7860"
    volumes:
      - ./data:/tmp/lora_training
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

## ☁️ Deploy no Hugging Face Spaces

### Configuração do Space

1. **Crie um novo Space**:
   - Vá para [huggingface.co/new-space](https://huggingface.co/new-space)
   - Escolha "Gradio" como SDK
   - Selecione hardware com GPU

2. **Configure o Space**:
   ```yaml
   # space_config.yml
   title: LoRA Trainer Funcional
   emoji: 🎨
   colorFrom: blue
   colorTo: purple
   sdk: gradio
   sdk_version: 4.0.0
   app_file: app.py
   pinned: false
   hardware: t4-medium  # ou a100-large
   ```

3. **Upload dos arquivos**:
   - `app.py`
   - `requirements.txt`
   - `README.md`
   - Pasta `sd-scripts/`

### Configuração de Hardware

| Hardware | VRAM | RAM | Recomendado Para |
|----------|------|-----|------------------|
| CPU Basic | 0GB | 16GB | Apenas teste |
| T4 Small | 16GB | 15GB | Projetos pequenos |
| T4 Medium | 16GB | 30GB | Projetos médios |
| A10G Small | 24GB | 30GB | Projetos grandes |
| A100 Large | 40GB | 80GB | Projetos profissionais |

## 🔧 Configuração de Dependências

### Dependências Principais

```txt
# Core ML
torch>=2.0.0
torchvision>=0.15.0
diffusers>=0.21.0
transformers>=4.25.0
accelerate>=0.20.0

# LoRA Training
safetensors>=0.3.0
huggingface-hub>=0.16.0
xformers>=0.0.20
bitsandbytes>=0.41.0

# Interface
gradio>=4.0.0

# Utilities
Pillow>=9.0.0
opencv-python>=4.7.0
numpy>=1.21.0
toml>=0.10.0
tqdm>=4.64.0
```

### Instalação Manual de Dependências

```bash
# PyTorch (ajuste para sua versão CUDA)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Diffusers e Transformers
pip install diffusers transformers accelerate

# Otimização
pip install xformers bitsandbytes

# Interface
pip install gradio

# Utilitários
pip install safetensors huggingface-hub Pillow opencv-python numpy toml tqdm
```

## 🐛 Solução de Problemas de Instalação

### Erro: CUDA não encontrado

```bash
# Verifique instalação CUDA
nvidia-smi
nvcc --version

# Reinstale PyTorch com CUDA
pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```

### Erro: xFormers não compatível

```bash
# Instale versão específica
pip install xformers==0.0.20

# Ou compile do código fonte
pip install -U xformers --index-url https://download.pytorch.org/whl/cu118
```

### Erro: Memória insuficiente

```bash
# Aumente swap (Linux)
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Configure variáveis de ambiente
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
```

### Erro: Dependências conflitantes

```bash
# Limpe cache pip
pip cache purge

# Crie ambiente limpo
python -m venv fresh_env
source fresh_env/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

## 🔒 Configuração de Segurança

### Variáveis de Ambiente

```bash
# .env
HUGGINGFACE_TOKEN=your_token_here
WANDB_API_KEY=your_wandb_key
CUDA_VISIBLE_DEVICES=0
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
```

### Limitações de Recursos

```python
# No app.py
import resource

# Limite de memória (8GB)
resource.setrlimit(resource.RLIMIT_AS, (8*1024*1024*1024, -1))

# Limite de processos
resource.setrlimit(resource.RLIMIT_NPROC, (100, -1))
```

## 📊 Verificação da Instalação

### Script de Teste

```python
# test_installation.py
import torch
import diffusers
import transformers
import gradio as gr
import safetensors

print("✅ Verificando instalação...")
print(f"Python: {sys.version}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA disponível: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'Não disponível'}")
print(f"Diffusers: {diffusers.__version__}")
print(f"Transformers: {transformers.__version__}")
print(f"Gradio: {gr.__version__}")
print("✅ Instalação verificada!")
```

### Teste de GPU

```python
# test_gpu.py
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
    x = torch.randn(1000, 1000).to(device)
    y = torch.randn(1000, 1000).to(device)
    z = torch.mm(x, y)
    print(f"✅ GPU funcionando: {torch.cuda.get_device_name(0)}")
    print(f"VRAM total: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
    print(f"VRAM livre: {torch.cuda.memory_reserved(0) / 1024**3:.1f} GB")
else:
    print("❌ GPU não disponível")
```

## 🚀 Otimização de Performance

### Configurações de Sistema

```bash
# Aumentar limites de arquivo
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 65536" >> /etc/security/limits.conf

# Otimizar scheduler
echo "performance" > /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```

### Configurações PyTorch

```python
# No início do app.py
import torch
torch.backends.cudnn.benchmark = True
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```

## 📝 Logs e Monitoramento

### Configuração de Logs

```python
import logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('lora_trainer.log'),
        logging.StreamHandler()
    ]
)
```

### Monitoramento de Recursos

```bash
# Instalar htop e nvidia-ml-py
pip install nvidia-ml-py3 psutil

# Monitorar em tempo real
watch -n 1 nvidia-smi
```

## 🔄 Atualizações

### Atualizar Dependências

```bash
# Atualizar requirements
pip install --upgrade -r requirements.txt

# Atualizar kohya-ss
cd sd-scripts
git pull origin main
```

### Backup de Configurações

```bash
# Backup de modelos e configurações
tar -czf backup_$(date +%Y%m%d).tar.gz /tmp/lora_training/
```

---

**Nota**: Para suporte adicional, consulte a documentação oficial do kohya-ss e a comunidade Hugging Face.