[2025-01-31 02:24:32,068][oumi][rank3][pid:10761][MainThread][INFO]][train.py:144] Resolved 'training.dataloader_num_workers=auto' to 'training.dataloader_num_workers=8'
[2025-01-31 02:24:33,035][oumi][rank3][pid:10761][MainThread][WARNING]][models.py:412] Undefined pad token. Setting it to `<|finetune_right_pad_id|>`.
[2025-01-31 02:24:33,036][oumi][rank3][pid:10761][MainThread][INFO]][models.py:428] Using the chat template 'llama3-instruct' specified in model config!
[2025-01-31 02:24:33,036][oumi][rank3][pid:10761][MainThread][INFO]][models.py:180] Building model for distributed training (world_size: 4)...
[2025-01-31 02:24:33,036][oumi][rank3][pid:10761][MainThread][INFO]][models.py:185] Building model using device_map: cuda:3 (DeviceRankInfo(world_size=4, rank=3, local_world_size=4, local_rank=3))...
[2025-01-31 02:24:33,036][oumi][rank3][pid:10761][MainThread][INFO]][models.py:255] Using model class: <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> to instantiate model.
[2025-01-31 02:24:39,641][oumi][rank3][pid:10761][MainThread][INFO]][base_map_dataset.py:68] Creating map dataset (type: UltrachatH4Dataset) dataset_name: 'HuggingFaceH4/ultrachat_200k', dataset_path: 'None'...
[2025-01-31 02:25:29,800][oumi][rank3][pid:10761][MainThread][INFO]][base_map_dataset.py:472] Dataset Info:
Split: train_sft
Version: 0.0.0
Dataset size: 3047427114
Download size: 1624049723
Size: 4671476837 bytes
Rows: 207865
Columns: ['prompt', 'prompt_id', 'messages']
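In the Dataset Info record above, only the last size line carries a unit. The three figures look like byte counts related the way they are in Hugging Face `datasets`, where `DatasetInfo.size_in_bytes` is the sum of `dataset_size` and `download_size`. That oumi prints exactly these three fields is an assumption. A quick arithmetic check under that assumption:

```python
# Values copied from the Dataset Info record; assumed to be bytes.
dataset_size = 3_047_427_114
download_size = 1_624_049_723
reported_total = 4_671_476_837  # the "Size: ... bytes" line

total = dataset_size + download_size
print(total == reported_total)   # True
print(f"{total / 1e9:.2f} GB")   # 4.67 GB (decimal gigabytes)
```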
[2025-01-31 02:25:33,301][oumi][rank3][pid:10761][MainThread][INFO]][base_map_dataset.py:411] Loaded DataFrame with shape: (207865, 3). Columns:
prompt object
prompt_id object
messages object
dtype: object
[2025-01-31 02:25:33,411][oumi][rank3][pid:10761][MainThread][INFO]][base_map_dataset.py:297] UltrachatH4Dataset: features=dict_keys(['input_ids', 'attention_mask'])
[2025-01-31 02:39:30,856][oumi][rank3][pid:10761][MainThread][INFO]][base_map_dataset.py:361] Finished transforming dataset (UltrachatH4Dataset)! Speed: 248.21 examples/sec. Examples: 207865. Duration: 837.4 sec. Transform workers: 1.
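The reported transform speed is consistent with the example count divided by the duration. The duration in the log is rounded to one decimal, so recomputing from it gives roughly 248.23 rather than exactly 248.21; presumably the logger used the unrounded duration.

```python
# Figures copied from the "Finished transforming dataset" line.
examples = 207_865
duration_sec = 837.4  # rounded to 0.1 s in the log

rate = examples / duration_sec
print(f"{rate:.2f} examples/sec")  # close to the logged 248.21
```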
[2025-01-31 02:39:31,453][oumi][rank3][pid:10761][MainThread][INFO]][torch_profiler_utils.py:150] PROF: Torch Profiler disabled!
[2025-01-31 02:39:31,454][oumi][rank3][pid:10761][MainThread][WARNING]][callbacks.py:54] MFU logging requires packed datasets. Skipping MFU callbacks.
[2025-01-31 02:39:31,566][oumi][rank3][pid:10761][MainThread][INFO]][device_utils.py:283] GPU Metrics Before Training: GPU runtime info: NVidiaGpuRuntimeInfo(device_index=0, device_count=4, used_memory_mb=19007.0, temperature=35, fan_speed=None, fan_speeds=None, power_usage_watts=78.83800000000001, power_limit_watts=400.0, gpu_utilization=0, memory_utilization=0, performance_state=0, clock_speed_graphics=1155, clock_speed_sm=1155, clock_speed_memory=1593).
[2025-01-31 02:39:31,568][oumi][rank3][pid:10761][MainThread][INFO]][train.py:312] Training init time: 901.293s
[2025-01-31 02:39:31,568][oumi][rank3][pid:10761][MainThread][INFO]][train.py:313] Starting training... (TrainerType.TRL_SFT, transformers: 4.45.2)
[2025-01-31 13:18:45,140][oumi][rank3][pid:10761][MainThread][INFO]][train.py:320] Training is Complete.
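The log does not print a training duration directly. A minimal sketch for deriving the wall-clock time on this rank from the "Starting training..." and "Training is Complete." timestamps, assuming both use the same local clock:

```python
from datetime import datetime

# Timestamp format used at the start of each log line ("," before milliseconds).
FMT = "%Y-%m-%d %H:%M:%S,%f"

start = datetime.strptime("2025-01-31 02:39:31,568", FMT)  # Starting training...
end = datetime.strptime("2025-01-31 13:18:45,140", FMT)    # Training is Complete.

elapsed = end - start
print(elapsed)                  # 10:39:13.572000
print(elapsed.total_seconds())  # about 38353.572
```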
[2025-01-31 13:18:45,163][oumi][rank3][pid:10761][MainThread][INFO]][device_utils.py:283] GPU Metrics After Training: GPU runtime info: NVidiaGpuRuntimeInfo(device_index=0, device_count=4, used_memory_mb=69213.0, temperature=42, fan_speed=None, fan_speeds=None, power_usage_watts=78.479, power_limit_watts=400.0, gpu_utilization=2, memory_utilization=0, performance_state=0, clock_speed_graphics=1155, clock_speed_sm=1155, clock_speed_memory=1593).
[2025-01-31 13:18:45,163][oumi][rank3][pid:10761][MainThread][INFO]][torch_utils.py:117] Peak GPU memory usage: 47.51 GB
[2025-01-31 13:18:45,163][oumi][rank3][pid:10761][MainThread][INFO]][train.py:327] Saving final state...
[2025-01-31 13:18:45,431][oumi][rank3][pid:10761][MainThread][INFO]][train.py:332] Saving final model...
[2025-01-31 13:19:21,035][oumi][rank3][pid:10761][MainThread][INFO]][train.py:339]
» We're always looking for feedback. What's one thing we can improve? https://oumi.ai/feedback
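To aggregate logs like this one across ranks, each line can be split into its bracketed fields. The pattern below is inferred only from the lines in this file (including the doubled `]]` after the level), not from oumi documentation, so treat it as a sketch. Lines without the bracketed prefix, such as the indented Dataset Info and DataFrame details, are attached to the preceding record.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List

# Header layout as observed in this log:
# [timestamp][logger][rankN][pid:N][thread][LEVEL]][file.py:line] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[\d\- :,]+)\]"
    r"\[(?P<logger>[^\]]+)\]"
    r"\[rank(?P<rank>\d+)\]"
    r"\[pid:(?P<pid>\d+)\]"
    r"\[(?P<thread>[^\]]+)\]"
    r"\[(?P<level>[A-Z]+)\]\]"
    r"\[(?P<source>[^\]:]+):(?P<lineno>\d+)\]"
    r" ?(?P<msg>.*)$"
)


@dataclass
class Record:
    ts: str
    rank: int
    level: str
    source: str
    lineno: int
    message: str
    continuation: List[str] = field(default_factory=list)


def parse(lines: Iterable[str]) -> List[Record]:
    """Group raw log lines into records; unprefixed lines extend the previous record."""
    records: List[Record] = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            records.append(
                Record(
                    ts=m["ts"],
                    rank=int(m["rank"]),
                    level=m["level"],
                    source=m["source"],
                    lineno=int(m["lineno"]),
                    message=m["msg"],
                )
            )
        elif records:
            records[-1].continuation.append(line)
    return records
```

Filtering the result on `level == "WARNING"` would, for example, surface the pad-token and MFU notices in this file.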