
Model Evaluation System

A comprehensive system for evaluating local language models using standardized prompts and generating detailed markdown reports.

Files

  • inference.py - Simple inference function: Text in, text out
  • eval_prompts.json - A set of prompts to run evaluation on models
  • run_eval.py - Runs the prompts in eval_prompts.json through every local model via inference.py and saves the responses to a markdown report
  • requirements.txt - Python dependencies
  • myenv/ - Python virtual environment

Quick Start

  1. Activate the virtual environment:

    source myenv/bin/activate
    
  2. Install dependencies (if not already installed):

    pip install -r requirements.txt
    
  3. Run evaluation on all models:

    python run_eval.py
    
  4. Test with a single model:

    python inference.py
    

Features

Inference System (inference.py)

  • ModelInference class: Load and run inference on local Hugging Face models
  • Memory management: Automatic model loading/unloading
  • GPU/CPU support: Automatically uses GPU if available, falls back to CPU
  • Model discovery: Automatically finds all local model directories (see the sketch after this list)
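
A minimal sketch of how the discovery and device-selection pieces might look; the helper names and directory layout here are assumptions for illustration, not the actual contents of inference.py:

# Hypothetical sketch -- the real logic lives in inference.py.
import os
import torch

def find_local_models(root="."):
    """Treat any directory containing a config.json as a local HF model."""
    models = []
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path) and os.path.isfile(os.path.join(path, "config.json")):
            models.append(path)
    return models

def pick_device():
    """Use the GPU when available, otherwise fall back to the CPU."""
    return "cuda" if torch.cuda.is_available() else "cpu"

print(find_local_models(), pick_device())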

Evaluation Prompts (eval_prompts.json)

  • 12 diverse prompts (a loading sketch follows this list) covering:
    • Reasoning & logic
    • Mathematics & algebra
    • Coding & technical explanations
    • General knowledge & facts
    • Creative writing
    • Instruction following
    • Common sense reasoning
    • Text summarization
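
The prompt file can be read with the standard json module. The field names below (category, prompt) are an assumption about the JSON schema, not a guarantee from eval_prompts.json:

# Schema is assumed: a list of objects with "category" and "prompt" keys.
import json

with open("eval_prompts.json") as f:
    prompts = json.load(f)

for item in prompts:
    print(item.get("category", "unknown"), "->", item.get("prompt", "")[:60])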

Evaluation Runner (run_eval.py)

  • Batch processing: Evaluates all local models automatically (see the loop sketch after this list)
  • Progress tracking: Shows real-time progress with emojis and timing
  • Error handling: Gracefully handles model loading failures
  • Markdown reports: Generates comprehensive evaluation reports
  • Memory efficient: Unloads models between evaluations
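
A rough sketch of the loop run_eval.py performs, assuming the ModelInference API shown in the Example Usage section below and the prompt schema assumed above; error handling and report formatting in the real script are more complete:

# Rough sketch of the evaluation loop, not the actual run_eval.py.
import json
import time
from inference import ModelInference, get_local_models

with open("eval_prompts.json") as f:
    prompts = json.load(f)

results = {}
for model_path in get_local_models():
    inference = ModelInference(model_path)
    if not inference.load_model():        # gracefully skip models that fail to load
        results[model_path] = "ERROR: failed to load"
        continue
    start = time.time()
    responses = []
    for item in prompts:
        responses.append(inference.generate_text(item["prompt"], max_length=256))
        time.sleep(1)                     # small delay between prompts
    inference.unload_model()              # free memory before the next model
    results[model_path] = {"responses": responses, "seconds": time.time() - start}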

Model Requirements

Models should be in Hugging Face format with these files (a simple presence check is sketched after the list):

  • config.json
  • model.safetensors
  • tokenizer.json
  • vocab.json
  • Other standard HF files
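
A quick presence check for the core files, offered as a standalone helper rather than something shipped in this repo:

# Hypothetical helper -- not part of inference.py or run_eval.py.
import os

REQUIRED_FILES = ["config.json", "model.safetensors", "tokenizer.json"]

def looks_like_hf_model(path):
    """Return True if the directory contains the core Hugging Face files."""
    return all(os.path.isfile(os.path.join(path, name)) for name in REQUIRED_FILES)

print(looks_like_hf_model("./my-model"))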

Example Usage

from inference import ModelInference, get_local_models

# Find all models
models = get_local_models()
print(f"Found {len(models)} models")

# Quick inference
from inference import simple_inference
result = simple_inference(models[0], "What is AI?", max_length=256)
print(result)

# Advanced usage
inference = ModelInference(models[0])
if inference.load_model():
    response = inference.generate_text("Explain Python", max_length=512, temperature=0.7)
    print(response)
    inference.unload_model()

Output

The evaluation generates a markdown report (evaluation_results.md) with the following (a small table-writer sketch follows the list):

  • Summary table: Model performance overview
  • Detailed results: Full responses organized by category
  • Timing information: Evaluation duration per model
  • Error reporting: Any issues encountered
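
A minimal sketch of how the summary table could be written; the shape of the results dict is assumed here rather than taken from run_eval.py:

# Hypothetical report writer -- the real formatting lives in run_eval.py.
results = {
    "models/example-model": {"prompts": 12, "seconds": 84.2, "errors": 0},
}

lines = ["| Model | Prompts | Time (s) | Errors |", "|---|---|---|---|"]
for name, stats in results.items():
    lines.append(f"| {name} | {stats['prompts']} | {stats['seconds']:.1f} | {stats['errors']} |")

with open("evaluation_results.md", "w") as f:
    f.write("\n".join(lines) + "\n")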

System Requirements

  • Python 3.8+
  • PyTorch 2.0+
  • Transformers 4.30+
  • 4GB+ RAM (varies by model size)
  • Optional: CUDA-compatible GPU for faster inference

Notes

  • Evaluation can take significant time with many models (136 models detected)
  • Models are evaluated sequentially to manage memory usage
  • Small delays between prompts prevent overheating
  • Progress is shown with real-time updates