See axolotl config
axolotl version: 0.12.2
base_model: sudoping01/bambara-llm-exp3-merged
processor_type: AutoProcessor
hub_model_id: sudoping01/bambara-asr-llm-exp1
plugins:
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
cut_cross_entropy: true
skip_prepare_dataset: true
remove_unused_columns: false
sample_packing: false
ddp: true
ddp_find_unused_parameters: true
# Template and tokens
chat_template: gemma3n
eot_tokens:
- <end_of_turn>
special_tokens:
  eot_token: <end_of_turn>
datasets:
- path: instruction_dataset_asr_axolotl_format.jsonl
  type: chat_template
val_set_size: 0.01
output_dir: ./outputs/bambara-gemma3n-asr-lora-exp1-v2
adapter: lora
lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'
# Sequence and batch settings - conservative for audio
sequence_len: 4096
pad_to_sequence_len: false
micro_batch_size: 8 # 8x H100 GPUs allow a larger per-device batch
gradient_accumulation_steps: 2
# Training parameters
num_epochs: 6
optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-4 # Slightly higher as per research
warmup_ratio: 0.1 # Increased warmup for multimodal
weight_decay: 0.0 # Set to 0 for multimodal
bf16: true # Must be true, not auto
tf32: false
load_in_4bit: false # Keep false for quality
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
# Monitoring
logging_steps: 1 # More frequent for debugging
saves_per_epoch: 2
evals_per_epoch: 2
# ASR metrics
metrics:
- name: wer
- name: cer
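The lora_target_modules entry in the config above is a regular expression that restricts LoRA to the attention and MLP projections of the language-model tower. A minimal sketch with plain Python `re` showing which module paths the pattern does and does not cover; the candidate module names are illustrative, not taken from the actual checkpoint:

```python
import re

# Pattern copied from lora_target_modules in the config above.
pattern = r"model.language_model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj"

# Hypothetical module names in the style of a Gemma 3n checkpoint; the exact
# names depend on the model implementation.
candidates = [
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.11.mlp.gate_proj",
    "model.language_model.layers.11.mlp.up_proj",
    "model.audio_tower.conformer.0.attn.q_proj",  # audio tower: not targeted
    "model.language_model.embed_tokens",          # embeddings: not targeted
]

for name in candidates:
    matched = re.fullmatch(pattern, name) is not None
    print(f"{'LoRA' if matched else '----'}  {name}")
```

So only the language-model attention/MLP projection layers receive LoRA weights, while the audio tower and embeddings stay frozen.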
bambara-asr-llm-exp1
This model is a fine-tuned version of sudoping01/bambara-llm-exp3-merged on the instruction_dataset_asr_axolotl_format.jsonl dataset. It achieves the following results on the evaluation set:
- Loss: 0.0544
- Max memory active (GiB): 18.76
- Max memory allocated (GiB): 18.76
- Device memory reserved (GiB): 19.99
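Because this repository holds a LoRA adapter, it is meant to be loaded on top of the base checkpoint. Below is a hedged sketch of transcription with transformers + PEFT: the Gemma3nForConditionalGeneration class and the audio chat-message format are assumptions based on the Gemma 3n integration in recent transformers releases, and both `example.wav` and the prompt text are placeholders, not the training prompt.

```python
import torch
from transformers import AutoProcessor, Gemma3nForConditionalGeneration
from peft import PeftModel

base_id = "sudoping01/bambara-llm-exp3-merged"
adapter_id = "sudoping01/bambara-asr-llm-exp1"  # hub_model_id from the config

processor = AutoProcessor.from_pretrained(base_id)
model = Gemma3nForConditionalGeneration.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"  # device_map needs accelerate
)
model = PeftModel.from_pretrained(model, adapter_id)

# Audio message format assumed from Gemma 3n chat-template examples;
# "example.wav" and the instruction text are placeholders.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio": "example.wav"},
            {"type": "text", "text": "Transcribe this audio."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(processor.batch_decode(out[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True)[0])
```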
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 350
- training_steps: 3508
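The total batch sizes above are derived from the per-device batch size, gradient accumulation, and device count; a quick check of the arithmetic:

```python
micro_batch_size = 8             # train_batch_size per device
gradient_accumulation_steps = 2
num_devices = 8

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no accumulation at eval time

print(total_train_batch_size)  # 128
print(total_eval_batch_size)   # 64
```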
Training results
Training Loss | Epoch | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
---|---|---|---|---|---|---|
No log | 0 | 0 | 2.3381 | 18.76 | 18.76 | 19.99 |
0.4621 | 0.5009 | 293 | 0.5051 | 18.76 | 18.76 | 19.99 |
0.3689 | 1.0017 | 586 | 0.3825 | 18.76 | 18.76 | 19.99 |
0.3447 | 1.5026 | 879 | 0.3151 | 18.76 | 18.76 | 19.99 |
0.2844 | 2.0034 | 1172 | 0.2623 | 18.76 | 18.76 | 19.99 |
0.217 | 2.5043 | 1465 | 0.2172 | 18.76 | 18.76 | 19.99 |
0.1302 | 3.0051 | 1758 | 0.1837 | 18.76 | 18.76 | 19.99 |
0.1559 | 3.5060 | 2051 | 0.1448 | 18.76 | 18.76 | 19.99 |
0.1213 | 4.0068 | 2344 | 0.1147 | 18.76 | 18.76 | 19.99 |
0.0744 | 4.5077 | 2637 | 0.0851 | 18.76 | 18.76 | 19.99 |
0.0555 | 5.0085 | 2930 | 0.0646 | 18.76 | 18.76 | 19.99 |
0.0378 | 5.5094 | 3223 | 0.0544 | 18.76 | 18.76 | 19.99 |
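The config requests WER and CER as ASR metrics, while the table above reports only validation loss. To score transcripts offline, a minimal sketch using the Hugging Face evaluate library; the reference and prediction strings are placeholders, not samples from the evaluation split:

```python
import evaluate  # pip install evaluate jiwer

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# Placeholder transcripts; in practice these come from the held-out split.
references = ["i ni ce", "aw ni wula"]
predictions = ["i ni ce", "aw ni sula"]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```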
Framework versions
- PEFT 0.17.0
- Transformers 4.55.2
- Pytorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.21.4
Model tree for sudoping01/bambara-gemma3n-asr-lora-exp1-v2-all
- Base model: sudoping01/bambara-llm-exp3-merged