Madverse Music: AI Audio Classifier - Usage Guide

Quick Start

Option 1: Hugging Face Space (Recommended)

Use our deployed model on Hugging Face Spaces:

Web Interface:

Go to the Hugging Face Space URL
Upload your audio file
Click "Analyze Audio"
Get instant results

API Access:

# Health check
curl https://your-space-name.hf.space/health

# Analyze audio file
curl -X POST "https://your-space-name.hf.space/analyze" \
     -F "file=@your_song.mp3"

Option 2: Local Setup

# Install dependencies
pip install -r requirements.txt

# Start the API server
python api.py

# Or start web interface
streamlit run app.py

Supported Audio Formats

WAV (.wav)
MP3 (.mp3)
FLAC (.flac)
M4A (.m4a)
OGG (.ogg)
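
Before uploading, it can help to check a file's extension locally. The snippet below is a minimal sketch built only from the list above; the helper name `is_supported_audio` is hypothetical, and the server may apply its own, different validation.

```python
from pathlib import Path

# Extensions copied from the supported formats list above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is in the supported list (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_audio("my_song.WAV"))
print(is_supported_audio("notes.txt"))
```

This only inspects the file name, not the file contents, so a mislabeled file would still pass the check.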

API Usage

Hugging Face Space API

Health Check

GET /health

Analyze Audio

POST /analyze

Request: Upload the audio file as multipart/form-data, using the field name "file".

Response Format:

{
  "classification": "Real",
  "confidence": 0.85,
  "probability": 0.15,
  "raw_score": -1.73,
  "duration": 30.5,
  "message": "Detected as real music"
}
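
As a rough illustration of consuming this response, the sketch below checks that the documented fields are present and builds a one-line summary. The helper name `summarize_result` is hypothetical, and reading `duration` as seconds of audio is an assumption based on the sample values.

```python
# Field names taken from the sample response above.
REQUIRED_FIELDS = (
    "classification", "confidence", "probability",
    "raw_score", "duration", "message",
)


def summarize_result(result: dict) -> str:
    """Validate the documented fields and return a short human-readable summary."""
    missing = [key for key in REQUIRED_FIELDS if key not in result]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    # Assumption: duration is the audio length in seconds.
    return (
        f"{result['classification']} "
        f"(confidence {result['confidence']:.0%}, "
        f"duration {result['duration']:.1f} s)"
    )


sample = {
    "classification": "Real",
    "confidence": 0.85,
    "probability": 0.15,
    "raw_score": -1.73,
    "duration": 30.5,
    "message": "Detected as real music",
}
print(summarize_result(sample))
```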

Usage Examples

Python

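import requests

# Upload a file to the HF Space (replace the placeholder URL with your Space)
with open('your_song.mp3', 'rb') as f:
    response = requests.post('https://your-space-name.hf.space/analyze',
                             files={'file': f},
                             timeout=120)
response.raise_for_status()  # Fail loudly on a non-2xx response
result = response.json()
print(result)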

JavaScript

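// Assumes fileInput is an <input type="file"> element and that this code
// runs inside an async function or an ES module (required for await).
const formData = new FormData();
formData.append('file', fileInput.files[0]);

const response = await fetch('https://your-space-name.hf.space/analyze', {
    method: 'POST',
    body: formData
});
if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
}
const result = await response.json();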

Understanding Results

The classifier will output:

"Real" = Human-created music
"Fake" = AI-generated music (from Suno, Udio, etc.)

Command Line Output:

Analyzing: my_song.wav
Result: Fake (AI-generated music)
Confidence: 0.96 | Raw output: 3.786

Model Specifications

Model: SpecTTTra-α (120 seconds)
Sample Rate: 16 kHz
Performance: 97% F1 score, 96% sensitivity, 99% specificity
Max Duration: 120 seconds (2 minutes)
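
To make these numbers concrete, the arithmetic below works out how many samples fit within the limits listed above. It is a sketch only: the constant and function names are hypothetical, and the guide does not say whether longer files are truncated or rejected, so the clamping here is an assumption.

```python
SAMPLE_RATE_HZ = 16_000   # from "Sample Rate" above
MAX_DURATION_S = 120      # from "Max Duration" above
MAX_SAMPLES = SAMPLE_RATE_HZ * MAX_DURATION_S  # 1,920,000 samples


def samples_within_limit(duration_s: float) -> int:
    """Number of 16 kHz samples of a clip that fall inside the 120-second limit.

    Assumption: audio beyond the limit is simply not counted.
    """
    return min(int(duration_s * SAMPLE_RATE_HZ), MAX_SAMPLES)


print(MAX_SAMPLES)
print(samples_within_limit(30.5))
print(samples_within_limit(300))
```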

Technical Details

How It Works:

Audio is loaded and resampled to 16 kHz
Converted to mel-spectrograms
Processed by the SpecTTTra transformer model
Output logit is converted to probability using sigmoid
Classification: prob < 0.5 = Real, prob ≥ 0.5 = Fake
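
The last two steps can be sketched in a few lines of plain Python. This is an illustration of the sigmoid and threshold rule described above, not the project's actual code; the function name `classify_logit` is hypothetical.

```python
import math


def classify_logit(raw_score: float, threshold: float = 0.5):
    """Convert a raw logit to a probability with a sigmoid, then apply the threshold."""
    # Numerically stable sigmoid: avoid overflow for large negative inputs.
    if raw_score >= 0:
        probability = 1.0 / (1.0 + math.exp(-raw_score))
    else:
        exp_score = math.exp(raw_score)
        probability = exp_score / (1.0 + exp_score)
    # Rule from the list above: prob < 0.5 is Real, prob >= 0.5 is Fake.
    label = "Fake" if probability >= threshold else "Real"
    return label, probability


label, probability = classify_logit(-1.73)
print(label, round(probability, 2))
```

With the raw score from the sample response (-1.73), the sigmoid gives a probability of about 0.15, which falls below the threshold and matches the "Real" classification shown there.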

Testing Your Music

Get AI-generated samples: Download from Suno, Udio, or other AI music platforms
Get real music samples: Use traditional human-created songs
Run the classifier: Compare results to see how well it detects AI vs human music
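
Once you have predictions for samples whose origin you know, you can score them yourself. The sketch below computes sensitivity, specificity, and F1 from (true label, predicted label) pairs. The function name and the sample data are hypothetical, and treating "Fake" as the positive class is an assumption of this example.

```python
def detection_metrics(pairs):
    """Compute sensitivity, specificity, and F1 from (true_label, predicted_label) pairs.

    "Fake" is treated as the positive class (an assumption of this sketch).
    """
    tp = sum(1 for true, pred in pairs if true == "Fake" and pred == "Fake")
    fn = sum(1 for true, pred in pairs if true == "Fake" and pred == "Real")
    tn = sum(1 for true, pred in pairs if true == "Real" and pred == "Real")
    fp = sum(1 for true, pred in pairs if true == "Real" and pred == "Fake")

    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


# Made-up example data: (true label, label returned by the classifier).
results = [("Fake", "Fake"), ("Fake", "Real"), ("Real", "Real"), ("Real", "Real")]
print(detection_metrics(results))
```

A handful of files gives only a rough impression; the metrics become meaningful with a larger and more varied test set.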

Expected Performance

High accuracy in detecting modern AI-generated music
Works best with full songs (up to 120 seconds)
Optimized for music from platforms like Suno and Udio

Note: This model was trained specifically for detecting AI-generated songs, not just AI vocals over real instrumentals. It analyzes the entire musical composition.