|
---
language: en
license: apache-2.0
tags:
- audio-classification
- military-audio
- ast
- tiny-ast
- pytorch
- transformers
- surveillance
- edge-deployment
metrics:
- accuracy
- f1
model-index:
- name: tiny-ast-mad-military-audio-classifier
  results:
  - task:
      type: audio-classification
      name: Military Audio Classification
    dataset:
      name: MAD Dataset
      type: military-audio
    metrics:
    - type: accuracy
      value: 0.9673
      name: Accuracy
    - type: f1
      value: 0.9674
      name: F1-weighted
---
|
|
|
# Tiny-AST Military Audio Classifier |
|
|
|
🎖️ A fine-tuned Audio Spectrogram Transformer for **military audio classification**, achieving **96.73% accuracy** on the Military Audio Dataset (MAD).
|
|
|
## Model Description |
|
|
|
This model is a fine-tuned version of [MIT/ast-finetuned-audioset-10-10-0.4593](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593) on the Military Audio Dataset (MAD). It is designed for **edge deployment** on devices such as the Raspberry Pi 5 in military surveillance applications.
|
|
|
### Key Features |
|
- 🎯 **96.73% accuracy** on MAD dataset (7 military audio classes) |
|
- 🚀 **Edge-optimized** for Raspberry Pi deployment |
|
- ⚡ **Fast inference** (<200ms per sample) |
|
- 🧠 **Efficient** (16.5% of parameters fine-tuned) |
|
- 🔊 **Robust** to real-world military environments |
|
|
|
## Training Results |
|
|
|
### Progressive Training Performance
|
- **Phase 1** (Classifier only): 94.32% accuracy |
|
- **Phase 2** (Top 2 layers): 96.73% accuracy ← **Best Model** |
|
- **Phase 3** (Top 4 layers): 96.35% accuracy |
|
- **Phase 4** (Top 6 layers): 96.73% accuracy |
|
|
|
### Training Configuration

- **Method**: Progressive unfreezing (see the sketch after this list)
|
- **Learning Rates**: Conservative (1e-4 → 2e-5) |
|
- **Normalization**: MAD-specific statistics (mean: -2.16, std: 2.85) |
|
- **Class Weighting**: Balanced for imbalanced dataset |
|
- **Training Time**: 40 minutes on RTX 3060 |
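
The phase schedule above can be outlined in code. The following is a minimal sketch of the progressive unfreezing loop, assuming the standard `transformers` AST module layout; the intermediate learning rates and the `set_trainable` helper are illustrative, not the exact training script:

```python
import torch
from transformers import ASTFeatureExtractor, ASTForAudioClassification

# MAD-specific normalization statistics from the configuration above
feature_extractor = ASTFeatureExtractor.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593")
feature_extractor.mean = -2.16
feature_extractor.std = 2.85

model = ASTForAudioClassification.from_pretrained(
    "MIT/ast-finetuned-audioset-10-10-0.4593",
    num_labels=7,
    ignore_mismatched_sizes=True,  # swap the 527-class AudioSet head for a 7-class head
)

def set_trainable(model, num_top_layers):
    """Freeze the backbone, then unfreeze the classifier and the top N encoder layers."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.classifier.parameters():
        p.requires_grad = True
    layers = model.audio_spectrogram_transformer.encoder.layer
    if num_top_layers > 0:
        for layer in layers[-num_top_layers:]:
            for p in layer.parameters():
                p.requires_grad = True

# Phases 1-4: classifier only, then top 2 / 4 / 6 layers (intermediate LRs are assumptions)
for num_layers, lr in [(0, 1e-4), (2, 5e-5), (4, 3e-5), (6, 2e-5)]:
    set_trainable(model, num_layers)
    optimizer = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=lr)
    # ... train this phase with class-weighted loss, evaluate, keep the best checkpoint
```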
|
|
|
## Model Classes |
|
|
|
The model classifies 7 military audio categories: |
|
|
|
| Class ID | Class Name | Training Samples | Test Samples |
|----------|------------|------------------|--------------|
| 0 | Communication | 774 | 207 |
| 1 | Footsteps | 1,293 | 280 |
| 2 | Gunshot | 773 | 104 |
| 3 | Shelling | 883 | 104 |
| 4 | Vehicle | 910 | 122 |
| 5 | Helicopter | 934 | 91 |
| 6 | Fighter | 862 | 129 |
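
For convenience, the same mapping as Python dictionaries (this simply restates the table; nothing new is assumed):

```python
id2label = {
    0: "Communication", 1: "Footsteps", 2: "Gunshot", 3: "Shelling",
    4: "Vehicle", 5: "Helicopter", 6: "Fighter",
}
label2id = {name: idx for idx, name in id2label.items()}
```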
|
|
|
## Usage |
|
|
|
### Quick Start |
|
```python
from transformers import ASTForAudioClassification, ASTFeatureExtractor
import librosa
import torch

# Load model and feature extractor
model = ASTForAudioClassification.from_pretrained("Akashpaul123/tiny-ast-mad-military-audio-classifier")
feature_extractor = ASTFeatureExtractor.from_pretrained("Akashpaul123/tiny-ast-mad-military-audio-classifier")
model.eval()

# Load audio file (16 kHz recommended)
audio, sr = librosa.load("military_audio.wav", sr=16000)

# Extract features (log-Mel spectrogram)
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")

# Predict
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = torch.argmax(outputs.logits, dim=-1).item()

# Class mapping
classes = ['Communication', 'Footsteps', 'Gunshot', 'Shelling', 'Vehicle', 'Helicopter', 'Fighter']
print(f"Predicted class: {classes[predicted_class]}")
```
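
To get per-class confidence scores rather than a single label, apply a softmax over the logits (continuing from the snippet above):

```python
# Softmax turns logits into probabilities over the 7 classes
probs = torch.softmax(outputs.logits, dim=-1)[0]
for name, p in sorted(zip(classes, probs.tolist()), key=lambda x: -x[1]):
    print(f"{name}: {p:.3f}")
```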
|
|
|
### Edge Deployment (Raspberry Pi 5) |
|
```python
import onnxruntime as ort

# Load ONNX model for edge inference
session = ort.InferenceSession("tiny_ast_mad_optimized.onnx")
# ... inference code (see the sketch below)
```
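
A fuller inference path might look like the following. This is a minimal sketch, assuming the ONNX export keeps the AST `input_values` tensor as its single input and emits logits as its first output; the snippet queries the session for tensor names rather than hard-coding them:

```python
import numpy as np
import librosa
import onnxruntime as ort
from transformers import ASTFeatureExtractor

# Feature extraction is identical to the PyTorch path
feature_extractor = ASTFeatureExtractor.from_pretrained("Akashpaul123/tiny-ast-mad-military-audio-classifier")
audio, _ = librosa.load("military_audio.wav", sr=16000)
features = feature_extractor(audio, sampling_rate=16000, return_tensors="np")

session = ort.InferenceSession("tiny_ast_mad_optimized.onnx")
input_name = session.get_inputs()[0].name
input_values = np.asarray(features["input_values"], dtype=np.float32)
logits = session.run(None, {input_name: input_values})[0]
predicted_class = int(np.argmax(logits, axis=-1)[0])
```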
|
|
|
## Training Details |
|
|
|
### Dataset |
|
- **Source**: Military Audio Dataset (MAD) |
|
- **Total Samples**: 7,466 audio files |
|
- **Duration**: 2-8 seconds per sample |
|
- **Sample Rate**: 16kHz |
|
- **Augmentation**: Military-specific (time stretch, pitch shift, noise injection) |
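
The listed augmentations map onto standard signal-level operations. A minimal sketch using `librosa`; the parameter ranges here are assumptions, not the exact training values:

```python
import numpy as np
import librosa

def augment(audio, sr=16000, rng=None):
    """Time stretch, pitch shift, and noise injection; ranges are illustrative."""
    rng = rng or np.random.default_rng()
    audio = librosa.effects.time_stretch(audio, rate=rng.uniform(0.9, 1.1))
    audio = librosa.effects.pitch_shift(audio, sr=sr, n_steps=rng.uniform(-2.0, 2.0))
    audio = audio + rng.normal(0.0, 0.005, size=audio.shape)  # light Gaussian noise
    return audio.astype(np.float32)
```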
|
|
|
### Architecture |
|
- **Base Model**: Audio Spectrogram Transformer (AST) |
|
- **Parameters**: 86.2M total, 14.2M trainable (16.5%) |
|
- **Input**: Log-Mel spectrogram (1024 time frames × 128 Mel bins)
|
- **Output**: 7 military audio classes |
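
The input shape can be checked directly against the feature extractor output (reusing `feature_extractor`, `audio`, and `model` from the Quick Start):

```python
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")
print(inputs["input_values"].shape)  # torch.Size([1, 1024, 128])
print(model.config.num_labels)       # 7
```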
|
|
|
### Performance Metrics |
|
- **Accuracy**: 96.73% |
|
- **F1-Macro**: 96.84% |
|
- **F1-Weighted**: 96.74% |
|
- **Precision**: High across all classes |
|
- **Recall**: Balanced performance |
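
These metrics correspond to standard `scikit-learn` calls over the test split, assuming `y_true`/`y_pred` are the collected integer class IDs (the placeholder values below only keep the snippet self-contained):

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 2, 3, 4, 5, 6]  # placeholder: real values come from the test split
y_pred = [0, 1, 2, 3, 4, 5, 6]

print("accuracy   :", accuracy_score(y_true, y_pred))
print("f1_macro   :", f1_score(y_true, y_pred, average="macro"))
print("f1_weighted:", f1_score(y_true, y_pred, average="weighted"))
```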
|
|
|
## Hardware Requirements |
|
|
|
### Training |
|
- **GPU**: RTX 3060 (12GB VRAM) or similar |
|
- **RAM**: 16GB+ recommended |
|
- **Storage**: 50GB for dataset and models |
|
|
|
### Inference (Edge) |
|
- **Device**: Raspberry Pi 5 or similar ARM device |
|
- **RAM**: 2GB minimum |
|
- **Inference Time**: <200ms per sample |
|
- **Power**: <5W continuous operation |
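
A rough way to verify the <200ms figure on-device, reusing `model` and `inputs` from the Quick Start (a proper benchmark would average many runs after warm-up):

```python
import time
import torch

with torch.no_grad():
    model(**inputs)  # warm-up run

start = time.perf_counter()
with torch.no_grad():
    model(**inputs)
print(f"latency: {(time.perf_counter() - start) * 1000:.1f} ms")
```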
|
|
|
## Limitations and Considerations |
|
|
|
- **Domain-specific**: Optimized for military audio contexts |
|
- **Language**: Primarily English communication samples |
|
- **Environment**: Trained on MAD dataset conditions |
|
- **Real-time use**: Designed for clip-level batch processing, not continuous streaming
|
|
|
## Citation |
|
|
|
If you use this model in your research, please cite: |
|
|
|
```bibtex
@misc{tiny-ast-mad-2024,
  title={Tiny-AST Military Audio Classifier: Progressive Fine-tuning for Edge Deployment},
  author={Paul, Akash},
  year={2024},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/Akashpaul123/tiny-ast-mad-military-audio-classifier}
}
```
|
|
|
## License |
|
|
|
This model is licensed under the Apache 2.0 License. |
|
|
|
## Contact |
|
|
|
- **Author**: Akash Paul |
|
- **GitHub**: [@akashpaul123](https://github.com/akashpaul123) |
|
- **Hugging Face**: [@akashpaul123](https://huggingface.co/akashpaul123) |
|
|
|
--- |
|
|
|
*Model trained as part of military audio surveillance research with focus on edge deployment and real-world robustness.* |
|
|