mharvill23/hubert-xlarge-bowel-sound-detector

This model is trained for bowel sound detection in audio recordings using HuBERT (Hidden-Unit BERT). It classifies each audio frame as containing a bowel sound (class 1) or not (class 0).

Model Details

  • Model Type: HuBERT for Audio Frame Classification
  • Task: Binary classification of audio frames for bowel sound detection
  • Input: Audio waveforms (2-second segments at 16kHz)
  • Output: Frame-level predictions (49.5 Hz frame rate)
  • Classes: 0 (no bowel sound), 1 (bowel sound present)
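
For reference, here is a minimal sketch of the frame/time bookkeeping these numbers imply; the constants simply restate the values listed above:

# Frame/time arithmetic for a 2-second clip at 16 kHz (values restated from the list above)
SAMPLE_RATE = 16000
CLIP_SECONDS = 2.0
FRAME_RATE_HZ = 49.5

num_samples = int(CLIP_SECONDS * SAMPLE_RATE)      # 32000 samples per clip
num_frames = round(CLIP_SECONDS * FRAME_RATE_HZ)   # 99 frame-level predictions per clip

def frame_to_seconds(frame_index):
    # Approximate start time (in seconds) of a given output frame
    return frame_index / FRAME_RATE_HZ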

Training Details

  • Base Model: Fine-tuned from a pre-trained HuBERT XLarge checkpoint
  • Dataset: Bowel Sounds Dataset from Kaggle
  • Training Data: 2-second audio segments with frame-level annotations
  • Evaluation: Frame-level accuracy on test set

Usage

from src.train.hubert_for_audio_frame_classification import HubertForAudioFrameClassification
import torch
import librosa
import numpy as np

# Load model
model = HubertForAudioFrameClassification.from_pretrained("mharvill23/hubert-xlarge-bowel-sound-detector")

# Load and preprocess audio
audio, sr = librosa.load("audio_file.wav", sr=16000)

# Ensure audio is 2 seconds (32000 samples at 16kHz)
if len(audio) < 32000:
    # Pad with zeros if shorter
    audio = np.pad(audio, (0, 32000 - len(audio)), 'constant')
else:
    # Truncate if longer
    audio = audio[:32000]

# Convert to tensor and add batch dimension
audio_tensor = torch.FloatTensor(audio).unsqueeze(0)  # Shape: (1, 32000)

# Get predictions
with torch.no_grad():
    outputs = model(audio_tensor)
    predictions = torch.argmax(outputs.logits, dim=-1)

# predictions contains frame-level classifications
# 0 = no bowel sound, 1 = bowel sound present
# Shape: (1, 99) - 99 frames for 2 seconds at 49.5 Hz
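
To turn the per-frame labels into human-readable events, one option is to merge runs of consecutive positive frames into (start, end) time intervals. A minimal sketch, assuming the predictions tensor from the snippet above and the 49.5 Hz frame rate (this helper is not part of the repository):

# Merge consecutive positive frames into (start_s, end_s) intervals
FRAME_RATE_HZ = 49.5
frame_labels = predictions[0].tolist()  # list of 99 frame labels (0 or 1)

events = []
start = None
for i, label in enumerate(frame_labels):
    if label == 1 and start is None:
        start = i
    elif label == 0 and start is not None:
        events.append((start / FRAME_RATE_HZ, i / FRAME_RATE_HZ))
        start = None
if start is not None:
    events.append((start / FRAME_RATE_HZ, len(frame_labels) / FRAME_RATE_HZ))

print(events)  # e.g. [(0.42, 0.61), (1.35, 1.52)] -- start/end times in seconds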

Training and Evaluation

This repository includes training and evaluation scripts:

  • src/train/train.py: Training script with the BowelSoundDataset class for loading and preprocessing bowel sound data
  • src/train/evaluate.py: Evaluation script for testing model performance on the test set

To train your own model:

# Train a new model
uv run src/train/train.py --data_dir /path/to/data --output_dir ./my_model

# Evaluate a trained model
uv run src/train/evaluate.py --model_path ./my_model --data_dir /path/to/data

The BowelSoundDataset class handles:

  • Loading audio files and CSV annotations
  • Preprocessing 2-second audio segments
  • Converting time annotations to frame-level labels (see the sketch after this list)
  • Caching processed data for faster training
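
As an illustration of the annotation-to-label step, here is a minimal sketch of rasterizing (start, end) time annotations into per-frame labels. It is not the repository's actual implementation; the 49.5 Hz frame rate and 99-frame clip length are taken from the Model Details section:

import numpy as np

# Rasterize (start_s, end_s) bowel-sound intervals into per-frame labels (illustrative only)
FRAME_RATE_HZ = 49.5
NUM_FRAMES = 99  # frames per 2-second clip

def annotations_to_frame_labels(events):
    labels = np.zeros(NUM_FRAMES, dtype=np.int64)
    for start_s, end_s in events:
        start_frame = max(0, int(np.floor(start_s * FRAME_RATE_HZ)))
        end_frame = min(NUM_FRAMES, int(np.ceil(end_s * FRAME_RATE_HZ)))
        labels[start_frame:end_frame] = 1  # class 1 = bowel sound present
    return labels

# Example: a bowel sound annotated from 0.50 s to 0.75 s covers 14 frames
print(annotations_to_frame_labels([(0.50, 0.75)]).sum())  # 14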

Dataset

This model was trained on the Bowel Sounds Dataset, which contains audio recordings with manual annotations of bowel sound events. I converted the dataset to a format that can be used with the BowelSoundDataset class (any type of bowel sound = class 1, no bowel sound = class 0).

Limitations

  • Trained on 2-second audio segments; longer recordings must be split into 2-second windows (see the sketch after this list)
  • May not generalize to significantly different recording conditions
  • Requires 16kHz audio input
  • Frame-level predictions at 49.5 Hz rate
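
Because the model expects fixed 2-second inputs, a longer recording has to be chunked before inference. A minimal sketch of non-overlapping chunking, assuming the model and audio loading from the Usage section above (the helper name is illustrative, not part of the repository):

import numpy as np
import torch

CHUNK_SAMPLES = 32000  # 2 seconds at 16 kHz

def predict_long_audio(model, audio):
    # Split into non-overlapping 2-second windows, zero-padding the last one
    all_frames = []
    for start in range(0, len(audio), CHUNK_SAMPLES):
        chunk = audio[start:start + CHUNK_SAMPLES]
        if len(chunk) < CHUNK_SAMPLES:
            chunk = np.pad(chunk, (0, CHUNK_SAMPLES - len(chunk)), 'constant')
        with torch.no_grad():
            logits = model(torch.FloatTensor(chunk).unsqueeze(0)).logits
        all_frames.append(torch.argmax(logits, dim=-1)[0])
    return torch.cat(all_frames)  # frame-level labels for the whole recording (~49.5 Hz)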

Code

The full codebase can be found here.

Citation

If you use this model in your research, please cite:

@misc{mharvill23_hubert_xlarge_bowel_sound_detector,
  author = {Matthew Harvill},
  title = {Bowel Sound Detection Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/mharvill23/hubert-xlarge-bowel-sound-detector}
}

Model Performance

  • Test Accuracy: 0.9670 (96.70%)
  • Precision: 0.8404 (84.04%)
  • Recall: 0.7674 (76.74%)
  • F1-Score: 0.8023 (80.23%)
  • Specificity: 0.9861 (98.61%)

Confusion Matrix

  • True Positives: 2,128
  • False Positives: 404
  • True Negatives: 28,602
  • False Negatives: 645
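
For reference, these metrics follow from the confusion matrix using the standard definitions; a quick recomputation in Python (counts copied from the list above):

# Recompute the reported metrics from the confusion matrix
tp, fp, tn, fn = 2128, 404, 28602, 645

accuracy    = (tp + tn) / (tp + fp + tn + fn)                 # ~0.9670
precision   = tp / (tp + fp)                                  # ~0.8404
recall      = tp / (tp + fn)                                  # ~0.7674
f1          = 2 * precision * recall / (precision + recall)   # ~0.8023
specificity = tn / (tn + fp)                                  # ~0.9861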