Commit f41b790 by Akashpaul123 · verified · 1 parent: 0ca2c47

Upload Tiny-AST MAD classifier with 96.73% accuracy - 2025-08-20 11:01

README.md ADDED
@@ -0,0 +1,183 @@
+ ---
+ language: en
+ license: apache-2.0
+ tags:
+ - audio-classification
+ - military-audio
+ - ast
+ - tiny-ast
+ - pytorch
+ - transformers
+ - surveillance
+ - edge-deployment
+ metrics:
+ - accuracy
+ - f1
+ model-index:
+ - name: tiny-ast-mad-military-audio-classifier
+   results:
+   - task:
+       type: audio-classification
+       name: Military Audio Classification
+     dataset:
+       name: MAD Dataset
+       type: military-audio
+     metrics:
+     - type: accuracy
+       value: 0.9673
+       name: Accuracy
+     - type: f1
+       value: 0.9674
+       name: F1-weighted
+ ---
+
+ # Tiny-AST Military Audio Classifier
+
+ 🎖️ **State-of-the-art military audio classification model** achieving **96.73% accuracy** on the Military Audio Dataset (MAD).
+
+ ## Model Description
+
+ This model is a fine-tuned version of [MIT/ast-finetuned-audioset-10-10-0.4593](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593) on the Military Audio Dataset (MAD). It's designed for **edge deployment** on devices like Raspberry Pi 5 for military surveillance applications.
+
+ ### Key Features
+ - 🎯 **96.73% accuracy** on MAD dataset (7 military audio classes)
+ - 🚀 **Edge-optimized** for Raspberry Pi deployment
+ - ⚡ **Fast inference** (<200ms per sample)
+ - 🧠 **Efficient** (16.5% of parameters fine-tuned)
+ - 🔊 **Robust** to real-world military environments
+
+ ## Training Results
+
+ ### Progressive Training Performance:
+ - **Phase 1** (Classifier only): 94.32% accuracy
+ - **Phase 2** (Top 2 layers): 96.73% accuracy ← **Best Model**
+ - **Phase 3** (Top 4 layers): 96.35% accuracy
+ - **Phase 4** (Top 6 layers): 96.73% accuracy
+
+ ### Training Configuration:
+ - **Method**: Progressive unfreezing strategy
+ - **Learning Rates**: Conservative (1e-4 → 2e-5)
+ - **Normalization**: MAD-specific statistics (mean: -2.16, std: 2.85)
+ - **Class Weighting**: Balanced for imbalanced dataset
+ - **Training Time**: 40 minutes on RTX 3060
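The progressive unfreezing schedule listed above can be sketched as a rule that decides, for each phase, which parameters receive gradients. This is a minimal sketch, not the author's training code (the commit does not include a training script). The parameter-name prefixes `classifier.` and `audio_spectrogram_transformer.encoder.layer.<i>.` are assumptions about how the Hugging Face AST implementation names its parameters and should be checked against `model.named_parameters()`; `PHASE_TOP_LAYERS` and `is_trainable` are hypothetical names.

```python
import re

# Number of top encoder layers unfrozen in each phase, per the phase list above.
PHASE_TOP_LAYERS = {1: 0, 2: 2, 3: 4, 4: 6}
NUM_LAYERS = 12  # num_hidden_layers in config.json

# Assumed parameter naming; verify against model.named_parameters().
LAYER_RE = re.compile(r"^audio_spectrogram_transformer\.encoder\.layer\.(\d+)\.")


def is_trainable(param_name: str, phase: int) -> bool:
    """Return True if the parameter should receive gradients in this phase."""
    if param_name.startswith("classifier."):
        return True  # the classification head is trained in every phase
    match = LAYER_RE.match(param_name)
    if match is None:
        return False  # embeddings and other non-layer parameters stay frozen
    first_unfrozen = NUM_LAYERS - PHASE_TOP_LAYERS[phase]
    return int(match.group(1)) >= first_unfrozen
```

With a loaded PyTorch model, the rule would typically be applied as `param.requires_grad = is_trainable(name, phase)` for each `(name, param)` pair before building the optimizer for that phase.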
+
+ ## Model Classes
+
+ The model classifies 7 military audio categories:
+
+ | Class ID | Class Name | Training Samples | Test Samples |
+ |----------|------------|------------------|--------------|
+ | 0 | Communication | 774 | 207 |
+ | 1 | Footsteps | 1,293 | 280 |
+ | 2 | Gunshot | 773 | 104 |
+ | 3 | Shelling | 883 | 104 |
+ | 4 | Vehicle | 910 | 122 |
+ | 5 | Helicopter | 934 | 91 |
+ | 6 | Fighter | 862 | 129 |
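The README mentions "balanced" class weighting but does not give the formula. One common heuristic, assumed here for illustration, weights class c by N / (K · n_c), where N is the number of training samples, K the number of classes, and n_c the class count. The sketch below applies it to the training counts from the table; `balanced_class_weights` is a hypothetical helper, and the weights actually used in training may differ.

```python
# Training-set counts copied from the table above.
TRAIN_COUNTS = {
    "Communication": 774,
    "Footsteps": 1293,
    "Gunshot": 773,
    "Shelling": 883,
    "Vehicle": 910,
    "Helicopter": 934,
    "Fighter": 862,
}


def balanced_class_weights(counts):
    """Weight each class by total / (num_classes * class_count)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


weights = balanced_class_weights(TRAIN_COUNTS)
for name, weight in weights.items():
    print(f"{name:<14} {weight:.3f}")
```

Under this heuristic the largest class (Footsteps) gets a weight below 1 and the smallest (Gunshot) a weight above 1, so errors on rarer classes cost more in a weighted loss.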
+
+ ## Usage
+
+ ### Quick Start
+ ```python
+ from transformers import ASTForAudioClassification, ASTFeatureExtractor
+ import librosa
+ import torch
+
+ # Load model and feature extractor
+ model = ASTForAudioClassification.from_pretrained("Akashpaul123/tiny-ast-mad-military-audio-classifier")
+ feature_extractor = ASTFeatureExtractor.from_pretrained("Akashpaul123/tiny-ast-mad-military-audio-classifier")
+
+ # Load audio file (16kHz recommended)
+ audio, sr = librosa.load("military_audio.wav", sr=16000)
+
+ # Extract features
+ inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")
+
+ # Predict
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predicted_class = torch.argmax(outputs.logits, dim=-1).item()
+
+ # Class mapping
+ classes = ['Communication', 'Footsteps', 'Gunshot', 'Shelling', 'Vehicle', 'Helicopter', 'Fighter']
+ print(f"Predicted class: {classes[predicted_class]}")
+ ```
+
+ ### Edge Deployment (Raspberry Pi 5)
+ ```python
+ import onnxruntime as ort
+
+ # Load ONNX model for edge inference
+ session = ort.InferenceSession("tiny_ast_mad_optimized.onnx")
+ # ... inference code
+ ```
+
+ ## Training Details
+
+ ### Dataset
+ - **Source**: Military Audio Dataset (MAD)
+ - **Total Samples**: 7,466 audio files
+ - **Duration**: 2-8 seconds per sample
+ - **Sample Rate**: 16kHz
+ - **Augmentation**: Military-specific (time stretch, pitch shift, noise injection)
+
+ ### Architecture
+ - **Base Model**: Audio Spectrogram Transformer (AST)
+ - **Parameters**: 86.2M total, 14.2M trainable (16.5%)
+ - **Input**: Log-Mel spectrograms (1024 x 128)
+ - **Output**: 7 military audio classes
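The "14.2M trainable (16.5%)" figure above can be cross-checked with a little arithmetic. The sketch below assumes a standard Transformer encoder layer (Q, K, V and output projections with biases, a two-layer MLP, two LayerNorms) and a classification head made of a LayerNorm plus one Linear layer. Both layouts are assumptions about the architecture, not something the commit states; the sizes come from `config.json`.

```python
HIDDEN = 768               # hidden_size in config.json
INTERMEDIATE = 3072        # intermediate_size in config.json
NUM_CLASSES = 7
TOTAL_PARAMS = 86_200_000  # "86.2M total" as stated above


def encoder_layer_params(hidden: int, intermediate: int) -> int:
    """Count weights and biases in one standard Transformer encoder layer."""
    attention = 4 * (hidden * hidden + hidden)  # Q, K, V, output projection
    mlp = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    layer_norms = 2 * (2 * hidden)              # two LayerNorms, weight + bias
    return attention + mlp + layer_norms


def head_params(hidden: int, num_classes: int) -> int:
    """Assumed head: LayerNorm followed by a single Linear layer."""
    return 2 * hidden + (hidden * num_classes + num_classes)


# Phase 2 unfreezes the top 2 encoder layers plus the classifier.
trainable = 2 * encoder_layer_params(HIDDEN, INTERMEDIATE) + head_params(HIDDEN, NUM_CLASSES)
print(f"{trainable:,} trainable ({trainable / TOTAL_PARAMS:.1%} of total)")
```

Under these assumptions, two encoder layers plus the head come to about 14.2M parameters, which is consistent with the trainable count reported for the Phase 2 configuration (top 2 layers).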
+
+ ### Performance Metrics
+ - **Accuracy**: 96.73%
+ - **F1-Macro**: 96.84%
+ - **F1-Weighted**: 96.74%
+ - **Precision**: High across all classes
+ - **Recall**: Balanced performance
+
+ ## Hardware Requirements
+
+ ### Training
+ - **GPU**: RTX 3060 (12GB VRAM) or similar
+ - **RAM**: 16GB+ recommended
+ - **Storage**: 50GB for dataset and models
+
+ ### Inference (Edge)
+ - **Device**: Raspberry Pi 5 or similar ARM device
+ - **RAM**: 2GB minimum
+ - **Inference Time**: <200ms per sample
+ - **Power**: <5W continuous operation
+
+ ## Limitations and Considerations
+
+ - **Domain-specific**: Optimized for military audio contexts
+ - **Language**: Primarily English communication samples
+ - **Environment**: Trained on MAD dataset conditions
+ - **Real-time**: Designed for batch processing, not streaming
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @misc{tiny-ast-mad-2024,
+   title={Tiny-AST Military Audio Classifier: Progressive Fine-tuning for Edge Deployment},
+   author={Paul, Akash},
+   year={2024},
+   howpublished={Hugging Face Model Hub},
+   url={https://huggingface.co/Akashpaul123/tiny-ast-mad-military-audio-classifier}
+ }
+ ```
+
+ ## License
+
+ This model is licensed under the Apache 2.0 License.
+
+ ## Contact
+
+ - **Author**: Akash Paul
+ - **GitHub**: [@akashpaul123](https://github.com/akashpaul123)
+ - **Hugging Face**: [@akashpaul123](https://huggingface.co/akashpaul123)
+
+ ---
+
+ *Model trained as part of military audio surveillance research with focus on edge deployment and real-world robustness.*
config.json ADDED
@@ -0,0 +1,42 @@
+ {
+   "architectures": [
+     "ASTForAudioClassification"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "frequency_stride": 10,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.0,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0",
+     "1": "LABEL_1",
+     "2": "LABEL_2",
+     "3": "LABEL_3",
+     "4": "LABEL_4",
+     "5": "LABEL_5",
+     "6": "LABEL_6"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0,
+     "LABEL_1": 1,
+     "LABEL_2": 2,
+     "LABEL_3": 3,
+     "LABEL_4": 4,
+     "LABEL_5": 5,
+     "LABEL_6": 6
+   },
+   "layer_norm_eps": 1e-12,
+   "max_length": 1024,
+   "model_type": "audio-spectrogram-transformer",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "num_mel_bins": 128,
+   "patch_size": 16,
+   "problem_type": "single_label_classification",
+   "qkv_bias": true,
+   "time_stride": 10,
+   "torch_dtype": "float32",
+   "transformers_version": "4.55.2"
+ }
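Note that `id2label` and `label2id` in this config carry only the generic `LABEL_0` … `LABEL_6` names, so `model.config.id2label` will not return the class names used in the README. A minimal sketch of how readable mappings could be derived from the class list in `training_info.json` follows; `with_class_names` is a hypothetical helper and is not part of this commit.

```python
import json

# Class order as listed in training_info.json ("classes").
CLASSES = ["Communication", "Footsteps", "Gunshot", "Shelling",
           "Vehicle", "Helicopter", "Fighter"]


def with_class_names(config: dict, classes: list) -> dict:
    """Return a copy of a config dict with readable label mappings."""
    patched = dict(config)
    # JSON object keys are strings, so id2label uses "0".."6" as keys.
    patched["id2label"] = {str(i): name for i, name in enumerate(classes)}
    patched["label2id"] = {name: i for i, name in enumerate(classes)}
    return patched


example = {"id2label": {"0": "LABEL_0"}, "label2id": {"LABEL_0": 0}}
print(json.dumps(with_class_names(example, CLASSES)["id2label"], indent=2))
```

Writing the patched mappings back into `config.json` would let downstream code read class names from the config instead of hard-coding the list, provided the order matches the one used during training.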
inference_example.py ADDED
@@ -0,0 +1,59 @@
+ """
+ Example inference script for Tiny-AST MAD Military Audio Classifier
+ """
+
+ from transformers import ASTForAudioClassification, ASTFeatureExtractor
+ import librosa
+ import torch
+ import numpy as np
+
+ def classify_military_audio(audio_path, model_name="akashpaul123/tiny-ast-mad-military-audio-classifier"):
+     """
+     Classify military audio using the fine-tuned Tiny-AST model
+
+     Args:
+         audio_path (str): Path to audio file
+         model_name (str): Hugging Face model name
+
+     Returns:
+         dict: Classification results
+     """
+
+     # Load model and feature extractor
+     model = ASTForAudioClassification.from_pretrained(model_name)
+     feature_extractor = ASTFeatureExtractor.from_pretrained(model_name)
+
+     # Load and preprocess audio
+     audio, sr = librosa.load(audio_path, sr=16000, duration=10.0)
+
+     # Extract features
+     inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")
+
+     # Predict
+     with torch.no_grad():
+         outputs = model(**inputs)
+         probabilities = torch.softmax(outputs.logits, dim=-1)
+         predicted_class = torch.argmax(probabilities, dim=-1).item()
+         confidence = probabilities[0][predicted_class].item()
+
+     # Class mapping
+     classes = ['Communication', 'Footsteps', 'Gunshot', 'Shelling',
+                'Vehicle', 'Helicopter', 'Fighter']
+
+     return {
+         'predicted_class': classes[predicted_class],
+         'class_id': predicted_class,
+         'confidence': confidence,
+         'all_probabilities': {cls: prob.item() for cls, prob in zip(classes, probabilities[0])}
+     }
+
+ # Example usage
+ if __name__ == "__main__":
+     # Replace with your audio file path
+     result = classify_military_audio("path/to/your/military_audio.wav")
+
+     print(f"Predicted class: {result['predicted_class']}")
+     print(f"Confidence: {result['confidence']:.4f}")
+     print("\nAll class probabilities:")
+     for class_name, prob in result['all_probabilities'].items():
+         print(f" {class_name}: {prob:.4f}")
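The script loads at most 10 seconds of audio (`duration=10.0`), so anything after that point in a longer recording is ignored. One possible way to cover a long file, sketched here under assumptions, is to split it into fixed windows and classify each window separately. `window_bounds` is a hypothetical helper, and the non-overlapping 10-second window is an illustrative choice, not something the commit prescribes.

```python
SAMPLE_RATE = 16_000    # matches sampling_rate in preprocessor_config.json
WINDOW_SECONDS = 10.0   # same cap as duration=10.0 in the script


def window_bounds(num_samples, window_s=WINDOW_SECONDS, hop_s=WINDOW_SECONDS,
                  sr=SAMPLE_RATE):
    """Yield (start, end) sample indices that cover the whole recording."""
    window = int(window_s * sr)
    hop = int(hop_s * sr)
    start = 0
    while start < num_samples:
        # The final window is clipped to the end of the recording.
        yield start, min(start + window, num_samples)
        start += hop


# A 25 s recording yields three windows; the last one is shorter.
print(list(window_bounds(25 * SAMPLE_RATE)))
```

Each `(start, end)` pair can be used to slice the loaded waveform before passing the slice to the feature extractor. How per-window predictions are combined (majority vote, averaged probabilities) is left open here.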
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:938ba3a3d129bf148bfab506bd8284a1d79b819008e17d1cb3c862836fb109b1
+ size 344805420
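The LFS pointer's `size` field gives a quick sanity check on the parameter count. Assuming every tensor is stored as float32 (as `torch_dtype` in `config.json` suggests) and ignoring the small safetensors header, dividing the file size by 4 bytes gives an upper bound on the number of parameters:

```python
FILE_SIZE_BYTES = 344_805_420  # "size" in the LFS pointer above
BYTES_PER_PARAM = 4            # float32, per torch_dtype in config.json

# Upper bound: the file also contains a small metadata header.
approx_params = FILE_SIZE_BYTES // BYTES_PER_PARAM
print(f"~{approx_params / 1e6:.1f}M parameters")
```

The result, about 86.2M, agrees with the total parameter count stated in the README.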
preprocessor_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "do_normalize": true,
+   "feature_extractor_type": "ASTFeatureExtractor",
+   "feature_size": 1,
+   "max_length": 1024,
+   "mean": -2.164904,
+   "num_mel_bins": 128,
+   "padding_side": "right",
+   "padding_value": 0.0,
+   "return_attention_mask": false,
+   "sampling_rate": 16000,
+   "std": 2.854887
+ }
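The `mean` and `std` values here are the MAD-specific normalization statistics mentioned in the README. As far as I know, AST-style feature extraction subtracts the mean and divides by twice the standard deviation; that convention is an assumption worth checking against the installed `transformers` version. A minimal sketch of the arithmetic on a single log-mel value:

```python
MEAN = -2.164904  # "mean" in preprocessor_config.json
STD = 2.854887    # "std" in preprocessor_config.json


def normalize(value, mean=MEAN, std=STD):
    """Normalize one log-mel value as (value - mean) / (2 * std)."""
    return (value - mean) / (2 * std)


# A value equal to the dataset mean maps to 0.
print(normalize(MEAN))
```

Using the base model's AudioSet statistics instead of these values would shift and rescale the inputs, so the feature extractor should be loaded from this repository rather than from the base checkpoint.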
training_info.json ADDED
@@ -0,0 +1,52 @@
+ {
+   "model_name": "tiny-ast-mad-military-audio-classifier",
+   "base_model": "MIT/ast-finetuned-audioset-10-10-0.4593",
+   "dataset": "Military Audio Dataset (MAD)",
+   "training_method": "Progressive Unfreezing",
+   "best_phase": 2,
+   "final_accuracy": 0.9673,
+   "final_f1_weighted": 0.9674,
+   "training_time_minutes": 40.1,
+   "classes": [
+     "Communication",
+     "Footsteps",
+     "Gunshot",
+     "Shelling",
+     "Vehicle",
+     "Helicopter",
+     "Fighter"
+   ],
+   "class_mapping": {
+     "0": "Communication",
+     "1": "Footsteps",
+     "2": "Gunshot",
+     "3": "Shelling",
+     "4": "Vehicle",
+     "5": "Helicopter",
+     "6": "Fighter"
+   },
+   "normalization_stats": {
+     "mean": -2.164904,
+     "std": 2.854887
+   },
+   "phase_results": {
+     "phase_1": {
+       "accuracy": 0.9432,
+       "f1_weighted": 0.9432
+     },
+     "phase_2": {
+       "accuracy": 0.9673,
+       "f1_weighted": 0.9674
+     },
+     "phase_3": {
+       "accuracy": 0.9635,
+       "f1_weighted": 0.9635
+     },
+     "phase_4": {
+       "accuracy": 0.9673,
+       "f1_weighted": 0.9674
+     }
+   }, 
+   "upload_date": "2025-08-20T11:01:50.672302",
+   "hardware_used": "RTX 3060 (12GB VRAM)"
+ }
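Phases 2 and 4 report identical accuracy and weighted F1, yet `best_phase` is 2. One selection rule that reproduces this, assumed here for illustration since the commit does not say how ties were broken, is to take the highest accuracy and prefer the earlier phase (fewer unfrozen layers) on a tie. `best_phase` below is a hypothetical helper.

```python
# phase_results copied from training_info.json above.
PHASE_RESULTS = {
    "phase_1": {"accuracy": 0.9432, "f1_weighted": 0.9432},
    "phase_2": {"accuracy": 0.9673, "f1_weighted": 0.9674},
    "phase_3": {"accuracy": 0.9635, "f1_weighted": 0.9635},
    "phase_4": {"accuracy": 0.9673, "f1_weighted": 0.9674},
}


def best_phase(results):
    """Pick the phase with the highest accuracy; ties go to the earlier phase."""
    def sort_key(name):
        phase_number = int(name.split("_")[1])
        return (-results[name]["accuracy"], phase_number)

    return int(min(results, key=sort_key).split("_")[1])


print(best_phase(PHASE_RESULTS))
```

Preferring the earlier phase on a tie keeps the checkpoint with the fewest fine-tuned layers, which fits the README's emphasis on fine-tuning only 16.5% of the parameters.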