---
language: en
license: apache-2.0
tags:
- audio-classification
- military-audio
- ast
- tiny-ast
- pytorch
- transformers
- surveillance
- edge-deployment
metrics:
- accuracy
- f1
model-index:
- name: tiny-ast-mad-military-audio-classifier
  results:
  - task:
      type: audio-classification
      name: Military Audio Classification
    dataset:
      name: MAD Dataset
      type: military-audio
    metrics:
    - type: accuracy
      value: 0.9673
      name: Accuracy
    - type: f1
      value: 0.9674
      name: F1-weighted
---
# Tiny-AST Military Audio Classifier
🎖️ **State-of-the-art military audio classification model** achieving **96.73% accuracy** on the Military Audio Dataset (MAD).
## Model Description
This model is a fine-tuned version of [MIT/ast-finetuned-audioset-10-10-0.4593](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593) on the Military Audio Dataset (MAD). It's designed for **edge deployment** on devices like Raspberry Pi 5 for military surveillance applications.
### Key Features
- 🎯 **96.73% accuracy** on MAD dataset (7 military audio classes)
- 🚀 **Edge-optimized** for Raspberry Pi deployment
- ⚡ **Fast inference** (<200ms per sample)
- 🧠 **Efficient** (16.5% of parameters fine-tuned)
- 🔊 **Robust** to real-world military environments
## Training Results
### Progressive Training Performance:
- **Phase 1** (Classifier only): 94.32% accuracy
- **Phase 2** (Top 2 layers): 96.73% accuracy ← **Best Model**
- **Phase 3** (Top 4 layers): 96.35% accuracy
- **Phase 4** (Top 6 layers): 96.73% accuracy
### Training Configuration:
- **Method**: Progressive unfreezing strategy (sketched after this list)
- **Learning Rates**: Conservative (1e-4 → 2e-5)
- **Normalization**: MAD-specific statistics (mean: -2.16, std: 2.85)
- **Class Weighting**: Balanced to offset class imbalance (see the sketch after the class table below)
- **Training Time**: 40 minutes on RTX 3060
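A rough sketch of how this schedule can be set up with the `transformers` API is shown below. The phase/layer counts and the normalization statistics come from the lists above; the helper function, the keyword arguments, and the loop are illustrative, not the actual training code.

```python
import torch
from transformers import ASTForAudioClassification, ASTFeatureExtractor

# MAD-specific normalization: the AST feature extractor accepts custom
# mean/std for spectrogram normalization (values from the list above).
feature_extractor = ASTFeatureExtractor(mean=-2.16, std=2.85)

model = ASTForAudioClassification.from_pretrained(
    "MIT/ast-finetuned-audioset-10-10-0.4593",
    num_labels=7,
    ignore_mismatched_sizes=True,  # swap the 527-class AudioSet head for a 7-class head
)

def set_trainable(model: torch.nn.Module, num_top_layers: int) -> None:
    """Freeze everything, then unfreeze the classifier and the top-N encoder layers."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.classifier.parameters():
        p.requires_grad = True
    if num_top_layers > 0:
        layers = model.audio_spectrogram_transformer.encoder.layer[-num_top_layers:]
        for p in (q for layer in layers for q in layer.parameters()):
            p.requires_grad = True

# Phase 1 trains the classifier only; phases 2-4 unfreeze the top 2/4/6 layers.
for phase, n_layers in enumerate([0, 2, 4, 6], start=1):
    set_trainable(model, n_layers)
    # ... run one fine-tuning phase here, decaying the LR from 1e-4 toward 2e-5
```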
## Model Classes
The model classifies 7 military audio categories:
| Class ID | Class Name | Training Samples | Test Samples |
|----------|------------|------------------|--------------|
| 0 | Communication | 774 | 207 |
| 1 | Footsteps | 1,293 | 280 |
| 2 | Gunshot | 773 | 104 |
| 3 | Shelling | 883 | 104 |
| 4 | Vehicle | 910 | 122 |
| 5 | Helicopter | 934 | 91 |
| 6 | Fighter | 862 | 129 |
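The imbalance visible above (Footsteps has roughly 1.7× the training samples of Gunshot) is what the balanced class weighting compensates for. The exact weighting scheme is not documented; a common choice consistent with the description is inverse-frequency weights fed to the loss:

```python
import torch

# Training-sample counts from the table above, in class-ID order
counts = torch.tensor([774, 1293, 773, 883, 910, 934, 862], dtype=torch.float)

# "Balanced" inverse-frequency weights: n_samples / (n_classes * count_c)
weights = counts.sum() / (len(counts) * counts)
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)
```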
## Usage
### Quick Start
```python
from transformers import ASTForAudioClassification, ASTFeatureExtractor
import librosa
import torch

# Load the fine-tuned model and its feature extractor
model = ASTForAudioClassification.from_pretrained("Akashpaul123/tiny-ast-mad-military-audio-classifier")
feature_extractor = ASTFeatureExtractor.from_pretrained("Akashpaul123/tiny-ast-mad-military-audio-classifier")

# Load an audio file (16 kHz mono, matching the training sample rate)
audio, sr = librosa.load("military_audio.wav", sr=16000)

# Convert the waveform into a log-Mel spectrogram tensor
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predicted_class = torch.argmax(outputs.logits, dim=-1).item()

# Map the class index to its label
classes = ['Communication', 'Footsteps', 'Gunshot', 'Shelling', 'Vehicle', 'Helicopter', 'Fighter']
print(f"Predicted class: {classes[predicted_class]}")
```
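When filtering detections downstream it often helps to look at per-class confidences rather than only the argmax; a small extension of the snippet above:

```python
# Per-class confidence scores from the same logits
probs = torch.softmax(outputs.logits, dim=-1)[0]
for name, p in zip(classes, probs.tolist()):
    print(f"{name}: {p:.3f}")
```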
### Edge Deployment (Raspberry Pi 5)
```python
import numpy as np
import onnxruntime as ort

# Load the exported ONNX model for edge inference
session = ort.InferenceSession("tiny_ast_mad_optimized.onnx")

# `features`: a (1, 1024, 128) log-Mel array, e.g. inputs["input_values"].numpy()
input_name = session.get_inputs()[0].name
logits = session.run(None, {input_name: features.astype(np.float32)})[0]
predicted_class = int(np.argmax(logits, axis=-1))
```
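The card does not document how `tiny_ast_mad_optimized.onnx` was produced. One standard way to export the fine-tuned model yourself (an illustrative sketch with a hypothetical output filename, not the actual export pipeline) is `torch.onnx.export` with a thin wrapper so the graph returns a plain logits tensor:

```python
import torch
from transformers import ASTForAudioClassification

class Wrapper(torch.nn.Module):
    """Expose raw logits so the exported graph has a plain tensor output."""
    def __init__(self, m):
        super().__init__()
        self.m = m

    def forward(self, x):
        return self.m(input_values=x).logits

model = ASTForAudioClassification.from_pretrained("Akashpaul123/tiny-ast-mad-military-audio-classifier")
model.eval()

dummy = torch.randn(1, 1024, 128)  # (batch, time frames, mel bins)
torch.onnx.export(
    Wrapper(model), (dummy,), "tiny_ast_mad_exported.onnx",
    input_names=["input_values"], output_names=["logits"],
    dynamic_axes={"input_values": {0: "batch"}},
    opset_version=17,
)
```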
## Training Details
### Dataset
- **Source**: Military Audio Dataset (MAD)
- **Total Samples**: 7,466 audio files
- **Duration**: 2-8 seconds per sample
- **Sample Rate**: 16kHz
- **Augmentation**: Military-specific (time stretch, pitch shift, noise injection; sketched below)
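A sketch of what such an augmentation pipeline can look like with `librosa` is below; the exact parameter ranges used in training are not documented, so the values here are placeholders.

```python
import numpy as np
import librosa

def augment(y: np.ndarray, sr: int = 16000, rng=None) -> np.ndarray:
    """Apply the three augmentations listed above with placeholder ranges."""
    rng = rng or np.random.default_rng()
    y = librosa.effects.time_stretch(y, rate=rng.uniform(0.9, 1.1))        # time stretch
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=rng.uniform(-2, 2))  # pitch shift
    y = y + rng.normal(0.0, 0.005, size=y.shape)                           # noise injection
    return y.astype(np.float32)
```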
### Architecture
- **Base Model**: Audio Spectrogram Transformer (AST)
- **Parameters**: 86.2M total, 14.2M trainable (16.5%)
- **Input**: Log-Mel spectrograms (1024 x 128)
- **Output**: 7 military audio classes
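These figures can be sanity-checked once the model is loaded (reusing `model`, `feature_extractor`, and `audio` from the Quick Start; note the 16.5% trainable fraction only applies after the phase-2 freezing schedule is in place):

```python
# Total parameter count: should print roughly 86.2M
total = sum(p.numel() for p in model.parameters())
print(f"total: {total / 1e6:.1f}M parameters")

# Input shape: (batch, 1024 time frames, 128 mel bins)
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")
print(inputs["input_values"].shape)  # torch.Size([1, 1024, 128])
```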
### Performance Metrics
- **Accuracy**: 96.73%
- **F1-Macro**: 96.84%
- **F1-Weighted**: 96.74%
- **Per-class precision/recall**: Not reported individually; the close macro and weighted F1 scores indicate balanced performance across all seven classes
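For reference, the reported numbers correspond to the standard scikit-learn metrics; in the sketch below `y_true`/`y_pred` are placeholders for the 1,037 test-set labels and the model's predictions.

```python
from sklearn.metrics import accuracy_score, f1_score

y_true, y_pred = [0, 1, 2, 2], [0, 1, 2, 1]  # placeholders for the real test arrays

print(accuracy_score(y_true, y_pred))               # reported: 0.9673
print(f1_score(y_true, y_pred, average="macro"))    # reported: 0.9684
print(f1_score(y_true, y_pred, average="weighted")) # reported: 0.9674
```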
## Hardware Requirements
### Training
- **GPU**: RTX 3060 (12GB VRAM) or similar
- **RAM**: 16GB+ recommended
- **Storage**: 50GB for dataset and models
### Inference (Edge)
- **Device**: Raspberry Pi 5 or similar ARM device
- **RAM**: 2GB minimum
- **Inference Time**: <200ms per sample
- **Power**: <5W continuous operation
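A quick way to verify the per-sample latency claim on target hardware, reusing `session`, `input_name`, and `features` from the ONNX snippet above (a hypothetical benchmark, not a calibrated measurement):

```python
import time

session.run(None, {input_name: features})  # warm-up run

n = 20
start = time.perf_counter()
for _ in range(n):
    session.run(None, {input_name: features})
print(f"{(time.perf_counter() - start) / n * 1000:.0f} ms per sample")
```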
## Limitations and Considerations
- **Domain-specific**: Optimized for military audio contexts
- **Language**: Primarily English communication samples
- **Environment**: Trained on MAD dataset conditions
- **Real-time**: Designed for batch processing, not streaming
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{tiny-ast-mad-2024,
title={Tiny-AST Military Audio Classifier: Progressive Fine-tuning for Edge Deployment},
author={Paul, Akash},
year={2024},
howpublished={Hugging Face Model Hub},
url={https://huggingface.co/Akashpaul123/tiny-ast-mad-military-audio-classifier}
}
```
## License
This model is licensed under the Apache 2.0 License.
## Contact
- **Author**: Akash Paul
- **GitHub**: [@akashpaul123](https://github.com/akashpaul123)
- **Hugging Face**: [@akashpaul123](https://huggingface.co/akashpaul123)
---
*Model trained as part of military audio surveillance research with focus on edge deployment and real-world robustness.*