|
[2025-01-31 02:24:32,068][oumi][rank3][pid:10761][MainThread][INFO]][train.py:144] Resolved 'training.dataloader_num_workers=auto' to 'training.dataloader_num_workers=8' |
|
[2025-01-31 02:24:33,035][oumi][rank3][pid:10761][MainThread][WARNING]][models.py:412] Undefined pad token. Setting it to `<|finetune_right_pad_id|>`. |
|
[2025-01-31 02:24:33,036][oumi][rank3][pid:10761][MainThread][INFO]][models.py:428] Using the chat template 'llama3-instruct' specified in model config! |
|
[2025-01-31 02:24:33,036][oumi][rank3][pid:10761][MainThread][INFO]][models.py:180] Building model for distributed training (world_size: 4)... |
|
[2025-01-31 02:24:33,036][oumi][rank3][pid:10761][MainThread][INFO]][models.py:185] Building model using device_map: cuda:3 (DeviceRankInfo(world_size=4, rank=3, local_world_size=4, local_rank=3))... |
|
[2025-01-31 02:24:33,036][oumi][rank3][pid:10761][MainThread][INFO]][models.py:255] Using model class: <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> to instantiate model. |
|
[2025-01-31 02:24:39,641][oumi][rank3][pid:10761][MainThread][INFO]][base_map_dataset.py:68] Creating map dataset (type: UltrachatH4Dataset) dataset_name: 'HuggingFaceH4/ultrachat_200k', dataset_path: 'None'... |
|
[2025-01-31 02:25:29,800][oumi][rank3][pid:10761][MainThread][INFO]][base_map_dataset.py:472] Dataset Info: |
|
Split: train_sft |
|
Version: 0.0.0 |
|
Dataset size: 3047427114 |
|
Download size: 1624049723 |
|
Size: 4671476837 bytes |
|
Rows: 207865 |
|
Columns: ['prompt', 'prompt_id', 'messages'] |
|
[2025-01-31 02:25:33,301][oumi][rank3][pid:10761][MainThread][INFO]][base_map_dataset.py:411] Loaded DataFrame with shape: (207865, 3). Columns: |
|
prompt object |
|
prompt_id object |
|
messages object |
|
dtype: object |
|
[2025-01-31 02:25:33,411][oumi][rank3][pid:10761][MainThread][INFO]][base_map_dataset.py:297] UltrachatH4Dataset: features=dict_keys(['input_ids', 'attention_mask']) |
|
[2025-01-31 02:39:30,856][oumi][rank3][pid:10761][MainThread][INFO]][base_map_dataset.py:361] Finished transforming dataset (UltrachatH4Dataset)! Speed: 248.21 examples/sec. Examples: 207865. Duration: 837.4 sec. Transform workers: 1. |
|
[2025-01-31 02:39:31,453][oumi][rank3][pid:10761][MainThread][INFO]][torch_profiler_utils.py:150] PROF: Torch Profiler disabled! |
|
[2025-01-31 02:39:31,454][oumi][rank3][pid:10761][MainThread][WARNING]][callbacks.py:54] MFU logging requires packed datasets. Skipping MFU callbacks. |
|
[2025-01-31 02:39:31,566][oumi][rank3][pid:10761][MainThread][INFO]][device_utils.py:283] GPU Metrics Before Training: GPU runtime info: NVidiaGpuRuntimeInfo(device_index=0, device_count=4, used_memory_mb=19007.0, temperature=35, fan_speed=None, fan_speeds=None, power_usage_watts=78.83800000000001, power_limit_watts=400.0, gpu_utilization=0, memory_utilization=0, performance_state=0, clock_speed_graphics=1155, clock_speed_sm=1155, clock_speed_memory=1593). |
|
[2025-01-31 02:39:31,568][oumi][rank3][pid:10761][MainThread][INFO]][train.py:312] Training init time: 901.293s |
|
[2025-01-31 02:39:31,568][oumi][rank3][pid:10761][MainThread][INFO]][train.py:313] Starting training... (TrainerType.TRL_SFT, transformers: 4.45.2) |
|
[2025-01-31 13:18:45,140][oumi][rank3][pid:10761][MainThread][INFO]][train.py:320] Training is Complete. |
|
[2025-01-31 13:18:45,163][oumi][rank3][pid:10761][MainThread][INFO]][device_utils.py:283] GPU Metrics After Training: GPU runtime info: NVidiaGpuRuntimeInfo(device_index=0, device_count=4, used_memory_mb=69213.0, temperature=42, fan_speed=None, fan_speeds=None, power_usage_watts=78.479, power_limit_watts=400.0, gpu_utilization=2, memory_utilization=0, performance_state=0, clock_speed_graphics=1155, clock_speed_sm=1155, clock_speed_memory=1593). |
|
[2025-01-31 13:18:45,163][oumi][rank3][pid:10761][MainThread][INFO]][torch_utils.py:117] Peak GPU memory usage: 47.51 GB |
|
[2025-01-31 13:18:45,163][oumi][rank3][pid:10761][MainThread][INFO]][train.py:327] Saving final state... |
|
[2025-01-31 13:18:45,431][oumi][rank3][pid:10761][MainThread][INFO]][train.py:332] Saving final model... |
|
[2025-01-31 13:19:21,035][oumi][rank3][pid:10761][MainThread][INFO]][train.py:339] |