
Model Evaluation System

A comprehensive system for evaluating local language models using standardized prompts and generating detailed markdown reports.

Files

  • inference.py - Simple inference function: Text in, text out
  • eval_prompts.json - A set of prompts to run evaluation on models
  • run_eval.py - Runs the prompts in eval_prompts.json through every local model via inference.py and saves the responses to a markdown report
  • requirements.txt - Python dependencies
  • myenv/ - Python virtual environment

Quick Start

  1. Activate the virtual environment:

    source myenv/bin/activate
    
  2. Install dependencies (if not already installed):

    pip install -r requirements.txt
    
  3. Run evaluation on all models:

    python run_eval.py
    
  4. Test with a single model:

    python inference.py
    

Features

Inference System (inference.py)

  • ModelInference class: Load and run inference on local Hugging Face models
  • Memory management: Automatic model loading/unloading
  • GPU/CPU support: Automatically uses GPU if available, falls back to CPU
  • Model discovery: Automatically finds all local model directories (see the sketch after this list)
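
A minimal sketch of how the discovery and device-selection pieces might look; the helper names and directory layout here are assumptions for illustration, not the actual contents of inference.py:

# Hypothetical sketch -- the real logic lives in inference.py.
import os
import torch

def find_local_models(root="."):
    """Treat any directory containing a config.json as a local HF model."""
    models = []
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path) and os.path.isfile(os.path.join(path, "config.json")):
            models.append(path)
    return models

def pick_device():
    """Use the GPU when available, otherwise fall back to the CPU."""
    return "cuda" if torch.cuda.is_available() else "cpu"

print(find_local_models(), pick_device())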

Evaluation Prompts (eval_prompts.json)

  • 12 diverse prompts (a loading sketch follows this list) covering:
    • Reasoning & logic
    • Mathematics & algebra
    • Coding & technical explanations
    • General knowledge & facts
    • Creative writing
    • Instruction following
    • Common sense reasoning
    • Text summarization
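
The prompt file can be read with the standard json module. The field names below (category, prompt) are an assumption about the JSON schema, not a guarantee from eval_prompts.json:

# Schema is assumed: a list of objects with "category" and "prompt" keys.
import json

with open("eval_prompts.json") as f:
    prompts = json.load(f)

for item in prompts:
    print(item.get("category", "unknown"), "->", item.get("prompt", "")[:60])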

Evaluation Runner (run_eval.py)

  • Batch processing: Evaluates all local models automatically (see the loop sketch after this list)
  • Progress tracking: Shows real-time progress with emojis and timing
  • Error handling: Gracefully handles model loading failures
  • Markdown reports: Generates comprehensive evaluation reports
  • Memory efficient: Unloads models between evaluations
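
A rough sketch of the loop run_eval.py performs, assuming the ModelInference API shown in the Example Usage section below and the prompt schema assumed above; error handling and report formatting in the real script are more complete:

# Rough sketch of the evaluation loop, not the actual run_eval.py.
import json
import time
from inference import ModelInference, get_local_models

with open("eval_prompts.json") as f:
    prompts = json.load(f)

results = {}
for model_path in get_local_models():
    inference = ModelInference(model_path)
    if not inference.load_model():        # gracefully skip models that fail to load
        results[model_path] = "ERROR: failed to load"
        continue
    start = time.time()
    responses = []
    for item in prompts:
        responses.append(inference.generate_text(item["prompt"], max_length=256))
        time.sleep(1)                     # small delay between prompts
    inference.unload_model()              # free memory before the next model
    results[model_path] = {"responses": responses, "seconds": time.time() - start}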

Model Requirements

Models should be in Hugging Face format with these files (a simple presence check is sketched after the list):

  • config.json
  • model.safetensors
  • tokenizer.json
  • vocab.json
  • Other standard HF files
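
A quick presence check for the core files, offered as a standalone helper rather than something shipped in this repo:

# Hypothetical helper -- not part of inference.py or run_eval.py.
import os

REQUIRED_FILES = ["config.json", "model.safetensors", "tokenizer.json"]

def looks_like_hf_model(path):
    """Return True if the directory contains the core Hugging Face files."""
    return all(os.path.isfile(os.path.join(path, name)) for name in REQUIRED_FILES)

print(looks_like_hf_model("./my-model"))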

Example Usage

from inference import ModelInference, get_local_models

# Find all models
models = get_local_models()
print(f"Found {len(models)} models")

# Quick inference
from inference import simple_inference
result = simple_inference(models[0], "What is AI?", max_length=256)
print(result)

# Advanced usage
inference = ModelInference(models[0])
if inference.load_model():
    response = inference.generate_text("Explain Python", max_length=512, temperature=0.7)
    print(response)
    inference.unload_model()

Output

The evaluation generates a markdown report (evaluation_results.md) with the following (a small table-writer sketch follows the list):

  • Summary table: Model performance overview
  • Detailed results: Full responses organized by category
  • Timing information: Evaluation duration per model
  • Error reporting: Any issues encountered
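
A minimal sketch of how the summary table could be written; the shape of the results dict is assumed here rather than taken from run_eval.py:

# Hypothetical report writer -- the real formatting lives in run_eval.py.
results = {
    "models/example-model": {"prompts": 12, "seconds": 84.2, "errors": 0},
}

lines = ["| Model | Prompts | Time (s) | Errors |", "|---|---|---|---|"]
for name, stats in results.items():
    lines.append(f"| {name} | {stats['prompts']} | {stats['seconds']:.1f} | {stats['errors']} |")

with open("evaluation_results.md", "w") as f:
    f.write("\n".join(lines) + "\n")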

System Requirements

  • Python 3.8+
  • PyTorch 2.0+
  • Transformers 4.30+
  • 4GB+ RAM (varies by model size)
  • Optional: CUDA-compatible GPU for faster inference

Notes

  • Evaluation can take significant time with many models (136 models detected)
  • Models are evaluated sequentially to manage memory usage
  • Small delays between prompts prevent overheating
  • Progress is shown with real-time updates