mharvill23/hubert-xlarge-bowel-sound-detector
This model detects bowel sounds in audio recordings using HuBERT (Hidden-Unit BERT). It classifies each audio frame as either containing a bowel sound (class 1) or not (class 0).
Model Details
- Model Type: HuBERT for Audio Frame Classification
- Task: Binary classification of audio frames for bowel sound detection
- Input: Audio waveforms (2-second segments at 16kHz)
- Output: Frame-level predictions (49.5 Hz frame rate)
- Classes: 0 (no bowel sound), 1 (bowel sound present)
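The 49.5 Hz frame rate follows from how HuBERT's convolutional front end downsamples the waveform. The sketch below reproduces that arithmetic. It assumes the default Hugging Face HuBERT kernel sizes and strides, which this card does not state explicitly, and `num_output_frames` is a hypothetical helper name.

```python
# Assumption: the default HuBERT convolutional feature extractor
# (kernel sizes and strides from the standard Hugging Face HubertConfig).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples: int) -> int:
    """Number of frames produced for a waveform of num_samples samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Output length of an unpadded 1-D convolution
        length = (length - kernel) // stride + 1
    return length


frames = num_output_frames(32000)  # 2 seconds at 16 kHz
frame_rate = frames / 2.0          # frames per second
print(frames, frame_rate)          # 99 49.5
```

Under these assumptions a 2-second segment yields 99 frames, which matches the output shape given in the usage example.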
Training Details
- Base Model: Fine-tuned from a pre-trained HuBERT model
- Dataset: Bowel Sounds Dataset from Kaggle
- Training Data: 2-second audio segments with frame-level annotations
- Evaluation: Frame-level accuracy on test set
Usage
# HubertForAudioFrameClassification is defined in this model's codebase (see Code below)
from src.train.hubert_for_audio_frame_classification import HubertForAudioFrameClassification
import torch
import librosa
import numpy as np
# Load model
model = HubertForAudioFrameClassification.from_pretrained("mharvill23/hubert-xlarge-bowel-sound-detector")
# Load and preprocess audio
audio, sr = librosa.load("audio_file.wav", sr=16000)
# Ensure audio is 2 seconds (32000 samples at 16kHz)
if len(audio) < 32000:
    # Pad with zeros if shorter
    audio = np.pad(audio, (0, 32000 - len(audio)), 'constant')
else:
    # Truncate if longer
    audio = audio[:32000]
# Convert to tensor and add batch dimension
audio_tensor = torch.FloatTensor(audio).unsqueeze(0) # Shape: (1, 32000)
# Get predictions
with torch.no_grad():
    outputs = model(audio_tensor)
    predictions = torch.argmax(outputs.logits, dim=-1)
# predictions contains frame-level classifications
# 0 = no bowel sound, 1 = bowel sound present
# Shape: (1, 99) - 99 frames for 2 seconds at 49.5 Hz
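Frame-level labels are often easier to work with as time intervals. The following is a minimal sketch of that conversion and is not part of the model's codebase. `frames_to_segments` is a hypothetical helper, and it assumes frame `i` covers the interval from `i / 49.5` to `(i + 1) / 49.5` seconds.

```python
FRAME_RATE_HZ = 49.5  # frame rate stated in this model card


def frames_to_segments(frame_labels, frame_rate=FRAME_RATE_HZ):
    """Merge consecutive class-1 frames into (start_sec, end_sec) tuples."""
    segments = []
    start = None  # index of the first frame of the current run, if any
    for i, label in enumerate(frame_labels):
        if label == 1 and start is None:
            start = i
        elif label != 1 and start is not None:
            segments.append((start / frame_rate, i / frame_rate))
            start = None
    if start is not None:
        # The run extends to the end of the segment
        segments.append((start / frame_rate, len(frame_labels) / frame_rate))
    return segments


example = [0, 0, 1, 1, 1, 0, 1, 0]
print(frames_to_segments(example))
```

With the usage code above, the input would be `predictions[0].tolist()`.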
Training and Evaluation
This repository includes training and evaluation scripts:
- src/train/train.py: Training script with the BowelSoundDataset class for loading and preprocessing bowel sound data
- src/train/evaluate.py: Evaluation script for testing model performance on the test set
To train and evaluate your own model:
# Train a new model
uv run src/train/train.py --data_dir /path/to/data --output_dir ./my_model
# Evaluate a trained model
uv run src/train/evaluate.py --model_path ./my_model --data_dir /path/to/data
The BowelSoundDataset class handles:
- Loading audio files and CSV annotations
- Preprocessing 2-second audio segments
- Converting time annotations to frame-level labels
- Caching processed data for faster training
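As an illustration of the time-to-frame conversion step, here is a minimal sketch. It does not reproduce the actual BowelSoundDataset code. `events_to_frame_labels` is a hypothetical helper, and the rule it uses (a frame is positive when its center falls inside an annotated event) is an assumption, since the real implementation may assign boundary frames differently.

```python
def events_to_frame_labels(events, num_frames=99, frame_rate=49.5):
    """Convert (start_sec, end_sec) events, relative to the segment start,
    into a list of 0/1 labels with one entry per output frame."""
    labels = [0] * num_frames
    for start_sec, end_sec in events:
        for i in range(num_frames):
            # Assumed rule: label by the time at the center of frame i
            center = (i + 0.5) / frame_rate
            if start_sec <= center < end_sec:
                labels[i] = 1
    return labels


# One annotated event from 0.5 s to 1.0 s inside a 2-second segment
labels = events_to_frame_labels([(0.5, 1.0)])
print(sum(labels), "positive frames out of", len(labels))
```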
Dataset
This model was trained on the Bowel Sounds Dataset, which contains audio recordings with manual annotations of bowel sound events.
I converted the dataset to a format that can be used with the BowelSoundDataset class (any type of bowel sound = class 1, no bowel sound = class 0).
Limitations
- Trained on 2-second audio segments
- May not generalize to significantly different recording conditions
- Requires 16kHz audio input
- Frame-level predictions at 49.5 Hz rate
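Because the model expects 2-second inputs, a longer recording has to be split into windows before inference. The sketch below computes non-overlapping window boundaries and the zero-padding needed for the final window. `window_bounds` is a hypothetical helper, and non-overlapping windows are an assumption, since this card does not prescribe a windowing scheme.

```python
WINDOW_SAMPLES = 32000  # 2 seconds at 16 kHz


def window_bounds(num_samples, window=WINDOW_SAMPLES):
    """Return (start, end, pad) for each non-overlapping window.

    pad is the number of zeros to append so the slice reaches `window` samples.
    """
    bounds = []
    for start in range(0, num_samples, window):
        end = min(start + window, num_samples)
        bounds.append((start, end, window - (end - start)))
    return bounds


# A 5-second recording: two full windows and one half-filled window
for start, end, pad in window_bounds(80000):
    print(start, end, pad)
```

Each window can then be sliced as `audio[start:end]`, padded with `pad` zeros, and passed to the model as in the usage example.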
Code
Full codebase can be found here
Citation
If you use this model in your research, please cite:
@misc{mharvill23/hubert_xlarge_bowel_sound_detector,
author = {Matthew Harvill},
title = {Bowel Sound Detection Model},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/mharvill23/hubert-xlarge-bowel-sound-detector}
}
## Model Performance
- **Test Accuracy**: 0.9670 (96.70%)
- **Precision**: 0.8404 (84.04%)
- **Recall**: 0.7674 (76.74%)
- **F1-Score**: 0.8023 (80.23%)
- **Specificity**: 0.9861 (98.61%)
### Confusion Matrix
- True Positives: 2,128
- False Positives: 404
- True Negatives: 28,602
- False Negatives: 645
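
The reported metrics can be recomputed from the confusion matrix using the standard definitions. This is a short check and not the repository's evaluation script.

```python
# Confusion matrix counts reported above
tp, fp, tn, fn = 2128, 404, 28602, 645

accuracy = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)

for name, value in [
    ("Accuracy", accuracy),
    ("Precision", precision),
    ("Recall", recall),
    ("F1-Score", f1),
    ("Specificity", specificity),
]:
    print(f"{name}: {value:.4f}")
```

Only about 9% of the test frames are positive (2,773 of 31,779), so precision, recall, and F1 say more about detection quality than accuracy does.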