
Built with Axolotl

See axolotl config

axolotl version: 0.12.2

base_model: sudoping01/bambara-llm-exp3-merged
processor_type: AutoProcessor
hub_model_id: sudoping01/bambara-asr-llm-exp1

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
cut_cross_entropy: true


skip_prepare_dataset: true  
remove_unused_columns: false
sample_packing: false


ddp: true
ddp_find_unused_parameters: true

# Template and tokens
chat_template: gemma3n
eot_tokens:
  - <end_of_turn>
special_tokens:
  eot_token: <end_of_turn>


datasets:
  - path: instruction_dataset_asr_axolotl_format.jsonl
    type: chat_template


val_set_size: 0.01
output_dir: ./outputs/bambara-gemma3n-asr-lora-exp1-v2


adapter: lora
lora_r: 64
lora_alpha: 128  # 2x lora_r
lora_dropout: 0.05
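# The regex below targets only the language-model decoder layers
# (self-attention q/k/v/o and MLP up/down/gate projections); the audio
# and vision towers of the base model are left unadapted.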
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'

# Sequence and batch settings - conservative for audio
sequence_len: 4096
pad_to_sequence_len: false
micro_batch_size: 8  # 8x H100s can handle a larger per-device batch
gradient_accumulation_steps: 2

# Training parameters
num_epochs: 6
optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-4  # Slightly higher as per research
warmup_ratio: 0.1  # Increased warmup for multimodal
weight_decay: 0.0  # Set to 0 for multimodal


bf16: true  # Must be true, not auto
tf32: false
load_in_4bit: false  # Keep false for quality
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

# Monitoring
logging_steps: 1  # More frequent for debugging
saves_per_epoch: 2
evals_per_epoch: 2

# ASR metrics
metrics:
  - name: wer
  - name: cer
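
The wer and cer entries request word and character error rates on the held-out split. As an offline sanity check, the same metrics can be computed from reference and predicted transcripts, for example with the jiwer package (a minimal sketch; jiwer and the sample strings are assumptions, not the metric implementation used during training):

import jiwer

# Hypothetical reference and predicted Bambara transcripts.
references = ["i ni ce", "a bɛ taa sugu la"]
predictions = ["i ni ce", "a bɛ ta sugu la"]

wer = jiwer.wer(references, predictions)  # word error rate
cer = jiwer.cer(references, predictions)  # character error rate
print(f"WER: {wer:.3f}  CER: {cer:.3f}")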

bambara-asr-llm-exp1

This model is a fine-tuned version of sudoping01/bambara-llm-exp3-merged on the instruction_dataset_asr_axolotl_format.jsonl dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0544
  • Memory/max mem active (GiB): 18.76
  • Memory/max mem allocated (GiB): 18.76
  • Memory/device mem reserved (GiB): 19.99
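
This repository holds a LoRA adapter for the base model above. A minimal loading sketch with transformers and peft follows; the adapter id is taken from hub_model_id in the config, and the Auto class, dtype, and device placement are assumptions rather than a confirmed recipe for the Gemma 3n checkpoint:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoProcessor

base_id = "sudoping01/bambara-llm-exp3-merged"   # base model from the config
adapter_id = "sudoping01/bambara-asr-llm-exp1"   # hub_model_id from the config

# Load the processor and base weights, then attach the LoRA adapter.
processor = AutoProcessor.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id).eval()

Transcription prompts then go through the gemma3n chat template configured above, with the audio clip passed to the processor alongside the instruction text.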

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 128
  • total_eval_batch_size: 64
  • optimizer: adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 350
  • training_steps: 3508
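
The derived values above follow from the per-device settings; a quick sanity check (the arithmetic below restates the relationship, and the exact rounding used for the warmup steps is an assumption):

micro_batch_size = 8             # train_batch_size per device
gradient_accumulation_steps = 2
num_devices = 8
eval_batch_size = 8
warmup_ratio = 0.1               # from the axolotl config
training_steps = 3508

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = eval_batch_size * num_devices   # no accumulation at eval time
warmup_steps = int(warmup_ratio * training_steps)        # 350, matching the value above

print(total_train_batch_size, total_eval_batch_size, warmup_steps)  # 128 64 350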

Training results

| Training Loss | Epoch  | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
|---------------|--------|------|-----------------|------------------|---------------------|--------------------|
| No log        | 0      | 0    | 2.3381          | 18.76            | 18.76               | 19.99              |
| 0.4621        | 0.5009 | 293  | 0.5051          | 18.76            | 18.76               | 19.99              |
| 0.3689        | 1.0017 | 586  | 0.3825          | 18.76            | 18.76               | 19.99              |
| 0.3447        | 1.5026 | 879  | 0.3151          | 18.76            | 18.76               | 19.99              |
| 0.2844        | 2.0034 | 1172 | 0.2623          | 18.76            | 18.76               | 19.99              |
| 0.217         | 2.5043 | 1465 | 0.2172          | 18.76            | 18.76               | 19.99              |
| 0.1302        | 3.0051 | 1758 | 0.1837          | 18.76            | 18.76               | 19.99              |
| 0.1559        | 3.5060 | 2051 | 0.1448          | 18.76            | 18.76               | 19.99              |
| 0.1213        | 4.0068 | 2344 | 0.1147          | 18.76            | 18.76               | 19.99              |
| 0.0744        | 4.5077 | 2637 | 0.0851          | 18.76            | 18.76               | 19.99              |
| 0.0555        | 5.0085 | 2930 | 0.0646          | 18.76            | 18.76               | 19.99              |
| 0.0378        | 5.5094 | 3223 | 0.0544          | 18.76            | 18.76               | 19.99              |

Framework versions

  • PEFT 0.17.0
  • Transformers 4.55.2
  • PyTorch 2.6.0+cu124
  • Datasets 4.0.0
  • Tokenizers 0.21.4