Medical LLM Base Model (10M Parameters)

Model Description

This is a 10-million-parameter GPT-2-style language model trained specifically for medical dialogue generation. It is designed as a base model for fine-tuning on specialized medical tasks, particularly sensor interpretation for ESP32 edge deployment.

Model Details

  • Model Type: Causal Language Model (GPT-2 architecture)
  • Parameters: 10,126,336
  • Architecture: 10 layers, 256 hidden dimensions, 8 attention heads
  • Vocabulary: 8,192 custom medical tokens (SentencePiece BPE)
  • Context Length: 512 tokens
  • Training Data: 6,788 medical dialogues from professional sources

Performance

  • Validation Perplexity: 4.40
  • Training Loss: Converged to ~1.48
  • Success Rate: 100% (the model produced a response for every evaluation prompt)
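
For reference, perplexity is the exponential of the cross-entropy loss (natural log, as used by PyTorch and the Hugging Face Trainer), so the reported loss and perplexity figures are consistent with each other:

import math

# Perplexity = exp(cross-entropy loss) with natural-log cross-entropy
loss = 1.48
print(round(math.exp(loss), 2))  # ~4.39, in line with the reported validation perplexity of 4.40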

Intended Use

Primary Use Case

  • Base model for medical dialogue fine-tuning
  • ESP32 sensor interpretation (temperature, heart rate, SpO2)
  • Edge deployment on resource-constrained devices

Fine-tuning Recommendations

  • Learning rate: 1e-4 (lower than the 5e-4 used for base training)
  • Epochs: 2-3 (fewer than the 5 used for base training)
  • Batch size: 8
  • Target applications: Sensor data interpretation, medical assessment (see the configuration sketch below)
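
A minimal sketch of these settings using the Hugging Face Trainer; train_dataset is a placeholder for your own tokenized sensor-interpretation data, and the output directory name is illustrative:

from transformers import GPT2LMHeadModel, Trainer, TrainingArguments

model = GPT2LMHeadModel.from_pretrained("OussamaEL/medical-llm-10m-base")

# Recommended fine-tuning hyperparameters from this card
args = TrainingArguments(
    output_dir="medical-llm-10m-sensor-ft",  # illustrative path
    learning_rate=1e-4,
    num_train_epochs=3,                      # 2-3 epochs recommended
    per_device_train_batch_size=8,
    save_strategy="epoch",
)

# `train_dataset` must be a tokenized dataset of formatted sensor dialogues
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()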

Model Architecture

from transformers import GPT2Config

# Configuration used for this model (10 layers, 256-dim hidden states, 8 heads)
config = GPT2Config(
    vocab_size=8192,   # custom medical SentencePiece BPE vocabulary
    n_positions=512,   # context length
    n_embd=256,        # hidden dimension
    n_layer=10,
    n_head=8,
    n_inner=1024,      # feed-forward inner dimension
)
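
Instantiating a model from this configuration reproduces the parameter count listed above (a quick sanity check; it assumes the standard GPT-2 weight tying between the token embedding and the output head):

from transformers import GPT2Config, GPT2LMHeadModel

check_config = GPT2Config(
    vocab_size=8192, n_positions=512, n_embd=256,
    n_layer=10, n_head=8, n_inner=1024,
)
check_model = GPT2LMHeadModel(check_config)

# Tied (shared) tensors are counted once by .parameters()
print(f"{sum(p.numel() for p in check_model.parameters()):,}")  # 10,126,336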

Usage

Loading the Model

from transformers import GPT2LMHeadModel, GPT2Config

# Load the configuration (optional; from_pretrained below loads it as well)
config = GPT2Config.from_pretrained("OussamaEL/medical-llm-10m-base")

# Load the pretrained weights from the Hugging Face Hub
model = GPT2LMHeadModel.from_pretrained("OussamaEL/medical-llm-10m-base")

# For ESP32 sensor-interpretation fine-tuning,
# use the provided fine-tuning scripts with sensor datasets.
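
A short generation example follows, assuming the repository ships a tokenizer loadable with AutoTokenizer (the prompt follows the sensor template shown in the next snippet):

from transformers import AutoTokenizer, GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("OussamaEL/medical-llm-10m-base")
tokenizer = AutoTokenizer.from_pretrained("OussamaEL/medical-llm-10m-base")

prompt = "Sensors: Temp 38.5°C, HR 95 bpm, SpO2 96% ||| Assessment:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))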

Example Fine-tuning for Sensor Data

# Input format for sensor interpretation:
# "<bos>Sensors: Temp 38.5°C, HR 95 bpm, SpO2 96% ||| Assessment: [response]<eos>"

# Expected output:
# "Elevated temperature with normal heart rate. Possible mild infection."

Training Details

  • Training Data: Medical dialogue dataset (iCliniq professional responses)
  • Training Epochs: 5
  • Learning Rate: 5e-4 with cosine scheduling
  • Batch Size: 16 (effective)
  • Hardware: CUDA-enabled GPU
  • Training Time: ~2 hours
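
The cosine schedule can be reproduced with the standard transformers helper; the warmup fraction and step count below are assumptions, since the card does not state them:

from torch.optim import AdamW
from transformers import GPT2LMHeadModel, get_cosine_schedule_with_warmup

model = GPT2LMHeadModel.from_pretrained("OussamaEL/medical-llm-10m-base")
optimizer = AdamW(model.parameters(), lr=5e-4)  # base-training learning rate

# Illustrative step count: 5 epochs over ~6,788 dialogues at an effective batch size of 16
num_training_steps = 5 * (6788 // 16)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),  # assumed 10% warmup
    num_training_steps=num_training_steps,
)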

Limitations

  • Specialized vocabulary: Optimized for medical terminology; general-domain text may tokenize and generate poorly
  • Context length: Limited to 512 tokens
  • Domain-specific: Performs best on medical dialogue tasks
  • Size constraints: Designed for edge deployment; may lack capacity for complex reasoning

Ethical Considerations

  • Medical advice: This model should NOT be used for direct medical diagnosis
  • Professional oversight: Always require medical professional validation
  • Edge deployment: Suitable for preliminary assessment only
  • Data privacy: Trained on anonymized medical dialogues

Technical Specifications

  • Model Size: ~38.6 MB (unquantized)
  • Deployment Size: ~10-15 MB (with quantization)
  • Memory Requirements: 50-100 MB RAM
  • Inference Speed: <1 second per assessment
  • Target Hardware: ESP32-S3, similar microcontrollers
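
The size figures follow from the parameter count: at 4 bytes per weight the FP32 model is roughly 38.6 MiB, and an 8-bit representation brings the raw weights to roughly 10 MiB, consistent with the quoted 10-15 MB deployment range once format and runtime overhead are added. A quick check:

# Size arithmetic only; not a deployment or quantization script
params = 10_126_336

fp32_mib = params * 4 / 2**20  # 4 bytes per FP32 weight -> ~38.6 MiB
int8_mib = params * 1 / 2**20  # 1 byte per INT8 weight  -> ~9.7 MiB
print(f"FP32: {fp32_mib:.1f} MiB, INT8: {int8_mib:.1f} MiB")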

Citation

If you use this model, please cite:

@misc{medical_llm_10m,
  title={Medical LLM Base Model for ESP32 Deployment},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/OussamaEL/medical-llm-10m-base}
}

License

MIT License - See LICENSE file for details.
