---
language: en
license: apache-2.0
tags:
- audio-classification
- military-audio
- ast
- tiny-ast
- pytorch
- transformers
- surveillance
- edge-deployment
metrics:
- accuracy
- f1
model-index:
- name: tiny-ast-mad-military-audio-classifier
  results:
  - task:
      type: audio-classification
      name: Military Audio Classification
    dataset:
      name: MAD Dataset
      type: military-audio
    metrics:
    - type: accuracy
      value: 0.9673
      name: Accuracy
    - type: f1
      value: 0.9674
      name: F1-weighted
---

# Tiny-AST Military Audio Classifier 🎖️

**State-of-the-art military audio classification model** achieving **96.73% accuracy** on the Military Audio Dataset (MAD).

## Model Description

This model is a fine-tuned version of [MIT/ast-finetuned-audioset-10-10-0.4593](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593) on the Military Audio Dataset (MAD). It's designed for **edge deployment** on devices like the Raspberry Pi 5 for military surveillance applications.

### Key Features

- 🎯 **96.73% accuracy** on the MAD dataset (7 military audio classes)
- 🚀 **Edge-optimized** for Raspberry Pi deployment
- ⚡ **Fast inference** (<200 ms per sample)
- 🧠 **Efficient** (only 16.5% of parameters fine-tuned)
- 🔊 **Robust** to real-world military environments

## Training Results

### Progressive Training Performance

- **Phase 1** (classifier only): 94.32% accuracy
- **Phase 2** (top 2 layers): 96.73% accuracy ← **best model**
- **Phase 3** (top 4 layers): 96.35% accuracy
- **Phase 4** (top 6 layers): 96.73% accuracy

### Training Configuration

- **Method**: Progressive unfreezing strategy (sketched below)
- **Learning Rates**: Conservative schedule (1e-4 → 2e-5)
- **Normalization**: MAD-specific statistics (mean: -2.16, std: 2.85; see the feature-extractor sketch below)
- **Class Weighting**: Balanced weights to compensate for class imbalance (sketched below)
- **Training Time**: 40 minutes on an RTX 3060
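The progressive unfreezing schedule maps to a few lines of PyTorch. Below is a minimal sketch, assuming the `transformers` AST module layout (`audio_spectrogram_transformer.encoder.layer` for the encoder blocks, `classifier` for the head); the phase boundaries mirror the results above, and the training loop itself is omitted:

```python
from transformers import ASTForAudioClassification

# Start from the AudioSet checkpoint and swap the 527-class head for 7 classes
model = ASTForAudioClassification.from_pretrained(
    "MIT/ast-finetuned-audioset-10-10-0.4593",
    num_labels=7,
    ignore_mismatched_sizes=True,
)

def unfreeze_top_layers(model, num_layers):
    """Freeze the backbone, then unfreeze the classifier head and the top
    `num_layers` encoder blocks (illustrative helper, not the exact script)."""
    for param in model.parameters():
        param.requires_grad = False
    for param in model.classifier.parameters():
        param.requires_grad = True
    if num_layers > 0:
        for block in model.audio_spectrogram_transformer.encoder.layer[-num_layers:]:
            for param in block.parameters():
                param.requires_grad = True

unfreeze_top_layers(model, 0)  # Phase 1: classifier only
# ... train, lower the learning rate (1e-4 -> 2e-5), then continue:
unfreeze_top_layers(model, 2)  # Phase 2: top 2 layers (best model)
```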
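The MAD-specific statistics replace the AudioSet defaults baked into the feature extractor, which normalizes log-Mel spectrograms as `(x - mean) / (2 * std)`. A minimal sketch of the override, assuming the statistics were computed over the MAD training set:

```python
from transformers import ASTFeatureExtractor

# Override the AudioSet normalization statistics with the MAD-specific ones
feature_extractor = ASTFeatureExtractor.from_pretrained(
    "MIT/ast-finetuned-audioset-10-10-0.4593"
)
feature_extractor.mean = -2.16  # AudioSet default: -4.2677393
feature_extractor.std = 2.85    # AudioSet default: 4.5689974
```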
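Class weighting compensates for the imbalance visible in the per-class counts (see the table in the next section). A minimal sketch using inverse-frequency ("balanced") weights in a standard PyTorch loss; this is one common realization, not necessarily the exact training code:

```python
import torch

# Training-sample counts per class, taken from the Model Classes table below
train_counts = torch.tensor([774., 1293., 773., 883., 910., 934., 862.])

# "Balanced" weights: total / (num_classes * count), so rare classes
# (Gunshot, Communication) weigh more than frequent ones (Footsteps)
class_weights = train_counts.sum() / (len(train_counts) * train_counts)

loss_fn = torch.nn.CrossEntropyLoss(weight=class_weights)
```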
## Model Classes

The model classifies 7 military audio categories:

| Class ID | Class Name | Training Samples | Test Samples |
|----------|------------|------------------|--------------|
| 0 | Communication | 774 | 207 |
| 1 | Footsteps | 1,293 | 280 |
| 2 | Gunshot | 773 | 104 |
| 3 | Shelling | 883 | 104 |
| 4 | Vehicle | 910 | 122 |
| 5 | Helicopter | 934 | 91 |
| 6 | Fighter | 862 | 129 |

## Usage

### Quick Start

```python
from transformers import ASTForAudioClassification, ASTFeatureExtractor
import librosa
import torch

# Load model and feature extractor
model = ASTForAudioClassification.from_pretrained("Akashpaul123/tiny-ast-mad-military-audio-classifier")
feature_extractor = ASTFeatureExtractor.from_pretrained("Akashpaul123/tiny-ast-mad-military-audio-classifier")

# Load audio file (16 kHz recommended)
audio, sr = librosa.load("military_audio.wav", sr=16000)

# Extract features
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predicted_class = torch.argmax(outputs.logits, dim=-1).item()

# Class mapping
classes = ['Communication', 'Footsteps', 'Gunshot', 'Shelling', 'Vehicle', 'Helicopter', 'Fighter']
print(f"Predicted class: {classes[predicted_class]}")
```

### Edge Deployment (Raspberry Pi 5)

```python
import numpy as np
import onnxruntime as ort

# Load the ONNX model for edge inference
session = ort.InferenceSession("tiny_ast_mad_optimized.onnx")

# Feed the log-Mel features produced by the feature extractor (see Quick
# Start); this assumes the graph was exported with a single input tensor
features = inputs["input_values"].numpy().astype(np.float32)
input_name = session.get_inputs()[0].name
logits = session.run(None, {input_name: features})[0]
predicted_class = int(np.argmax(logits))
```

## Training Details

### Dataset

- **Source**: Military Audio Dataset (MAD)
- **Total Samples**: 7,466 audio files
- **Duration**: 2-8 seconds per sample
- **Sample Rate**: 16 kHz
- **Augmentation**: Military-specific (time stretch, pitch shift, noise injection)

### Architecture

- **Base Model**: Audio Spectrogram Transformer (AST)
- **Parameters**: 86.2M total, 14.2M trainable (16.5%)
- **Input**: Log-Mel spectrograms (1024 × 128)
- **Output**: 7 military audio classes

### Performance Metrics

- **Accuracy**: 96.73%
- **F1-Macro**: 96.84%
- **F1-Weighted**: 96.74%
- **Precision / Recall**: High and balanced across all classes

## Hardware Requirements

### Training

- **GPU**: RTX 3060 (12 GB VRAM) or similar
- **RAM**: 16 GB+ recommended
- **Storage**: 50 GB for dataset and models

### Inference (Edge)

- **Device**: Raspberry Pi 5 or similar ARM device
- **RAM**: 2 GB minimum
- **Inference Time**: <200 ms per sample
- **Power**: <5 W continuous operation

## Limitations and Considerations

- **Domain-specific**: Optimized for military audio contexts
- **Language**: Primarily English communication samples
- **Environment**: Trained on MAD dataset recording conditions
- **Real-time**: Designed for batch processing, not streaming

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{tiny-ast-mad-2024,
  title={Tiny-AST Military Audio Classifier: Progressive Fine-tuning for Edge Deployment},
  author={Paul, Akash},
  year={2024},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/Akashpaul123/tiny-ast-mad-military-audio-classifier}
}
```

## License

This model is licensed under the Apache 2.0 License.

## Contact

- **Author**: Akash Paul
- **GitHub**: [@akashpaul123](https://github.com/akashpaul123)
- **Hugging Face**: [@akashpaul123](https://huggingface.co/akashpaul123)

---

*Model trained as part of military audio surveillance research with a focus on edge deployment and real-world robustness.*