mistral-7b-instruct-v0.3-mimic4-adapt-multilabel-classify/classification_log_2025-06-16_21-42-41.log
2025-06-16 21:42:41,136 - INFO - ================================================================================ - [multilabel_classify.py:103:log_section]
2025-06-16 21:42:41,136 - INFO - = INITIALIZING TRAINING ENVIRONMENT = - [multilabel_classify.py:104:log_section]
2025-06-16 21:42:41,136 - INFO - ================================================================================ - [multilabel_classify.py:107:log_section]
2025-06-16 21:42:41,136 - INFO - Setting up data paths and environment variables... - [multilabel_classify.py:3940:main]
2025-06-16 21:42:41,137 - INFO - Using output directory: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:3946:main]
2025-06-16 21:42:41,137 - INFO - Command-line Arguments: - [multilabel_classify.py:371:print_args]
2025-06-16 21:42:41,137 - INFO -
output_dir: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b
source_url: XURLs.MIMIC4_DEMO
data: mimic4_icd10_full
logfile: classification_log
base_dir: ../tmp/MIMIC4_DEMO
hub_model_id: deb101/mistral-7b-instruct-v0.3-mimic4-adapt
model_name: mistralai/Mistral-7B-Instruct-v0.3
max_length: 512
do_fresh_training: True
load_from_checkpoint: False
task: multilabel-classify
num_train_epochs: 5
per_device_train_batch_size: 8
per_device_eval_batch_size: 8
metric_for_best_model: precision_at_15
learning_rate: 0.0001
final_lr_scheduling: 1e-06
warmup_steps: 500
logfile_path: ../tmp/logs/classification_log_2025-06-16_21-42-41.log
source: /home/ubuntu/.xcube/data/mimic4_demo - [multilabel_classify.py:372:print_args]
2025-06-16 21:42:41,137 - INFO - ------------------------------ - [multilabel_classify.py:373:print_args]
2025-06-16 21:42:41,148 - INFO -
Quick Git Info: xcube | plant | 9a164a6 | Debjyoti Saha Roy | STAGED (1) | git show 9a164a6 - [multilabel_classify.py:3952:main]
2025-06-16 21:42:41,148 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:103:log_section]
2025-06-16 21:42:41,148 - INFO - + LOADING DATASETS + - [multilabel_classify.py:104:log_section]
2025-06-16 21:42:41,148 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:107:log_section]
2025-06-16 21:42:41,148 - INFO - Loading main datasets.... - [multilabel_classify.py:3955:main]
2025-06-16 21:42:49,755 - INFO - Total unique labels in dataset: 7942 - [multilabel_classify.py:3731:sample_df_with_full_label_coverage]
2025-06-16 21:42:49,768 - INFO - Attempt 1: Sampled 122 rows covering 863 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
2025-06-16 21:42:49,777 - INFO - Attempt 2: Sampled 122 rows covering 816 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
2025-06-16 21:42:49,786 - INFO - Attempt 3: Sampled 122 rows covering 885 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
2025-06-16 21:42:49,795 - INFO - Attempt 4: Sampled 122 rows covering 828 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
2025-06-16 21:42:49,804 - INFO - Attempt 5: Sampled 122 rows covering 879 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
2025-06-16 21:42:49,813 - INFO - Attempt 6: Sampled 122 rows covering 852 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
2025-06-16 21:42:49,821 - INFO - Attempt 7: Sampled 122 rows covering 838 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
2025-06-16 21:42:49,831 - INFO - Attempt 8: Sampled 122 rows covering 851 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
2025-06-16 21:42:49,839 - INFO - Attempt 9: Sampled 122 rows covering 825 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
2025-06-16 21:42:49,848 - INFO - Attempt 10: Sampled 122 rows covering 833 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
2025-06-16 21:42:49,852 - INFO - Fixing missing labels: 7109 remaining... - [multilabel_classify.py:3778:sample_df_with_full_label_coverage]
2025-06-16 21:46:19,271 - INFO - Added 1648 rows to achieve full label coverage. - [multilabel_classify.py:3810:sample_df_with_full_label_coverage]
2025-06-16 21:46:19,274 - INFO - Final total labels: 7942 - [multilabel_classify.py:3813:sample_df_with_full_label_coverage]
2025-06-16 21:46:19,274 - INFO - Final row count: 1770 (Valid: 420, Not-valid: 1350) - [multilabel_classify.py:3821:sample_df_with_full_label_coverage]
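The sampling entries above (fixed-size random draws, then extra rows added until all 7942 labels are covered) follow a pattern that can be sketched roughly as below. This is not the code in `multilabel_classify.py`: the function name `sample_with_full_coverage`, the list-of-label-sets layout, and the greedy fill strategy are all assumptions made for illustration.

```python
import random


def sample_with_full_coverage(rows, n_initial, seed=0):
    """Pick row indices so that every label in `rows` appears at least once.

    rows: list of sets, one set of labels per row (hypothetical layout).
    n_initial: size of the initial random draw (122 in the log above).
    """
    rng = random.Random(seed)
    all_labels = set().union(*rows)

    # Step 1: a plain random sample, which usually misses most rare labels.
    chosen = set(rng.sample(range(len(rows)), n_initial))
    covered = set().union(*(rows[i] for i in chosen))
    missing = all_labels - covered

    # Step 2: greedily add the unchosen row that covers the most missing labels.
    while missing:
        best = max(
            (i for i in range(len(rows)) if i not in chosen),
            key=lambda i: len(rows[i] & missing),
        )
        chosen.add(best)
        missing -= rows[best]

    return sorted(chosen)


if __name__ == "__main__":
    toy_rows = [{"a", "b"}, {"b"}, {"c"}, {"a", "c", "d"}]
    print(sample_with_full_coverage(toy_rows, n_initial=1))
```

A greedy fill keeps the number of added rows small, but the log does not say which strategy the script actually uses.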
2025-06-16 21:46:20,015 - INFO - ******************************************************************************** - [multilabel_classify.py:103:log_section]
2025-06-16 21:46:20,015 - INFO - * STARTING MULTI_LABEL CLASSIFICATION MODEL TRAINING * - [multilabel_classify.py:104:log_section]
2025-06-16 21:46:20,015 - INFO - ******************************************************************************** - [multilabel_classify.py:107:log_section]
2025-06-16 21:46:20,015 - INFO - Loaded authentication token from environment - [multilabel_classify.py:3982:main]
2025-06-16 21:46:20,015 - INFO - Hub Model ID for this Classification task: deb101/mistral-7b-instruct-v0.3-mimic4-adapt-multilabel-classify - [multilabel_classify.py:3986:main]
2025-06-16 21:46:20,016 - INFO - -------------------------------------------------------------------------------- - [multilabel_classify.py:103:log_section]
2025-06-16 21:46:20,016 - INFO - - MODEL EXISTENCE CHECK - - [multilabel_classify.py:104:log_section]
2025-06-16 21:46:20,016 - INFO - -------------------------------------------------------------------------------- - [multilabel_classify.py:107:log_section]
2025-06-16 21:46:20,016 - INFO - Checking model existence locally and on Hugging Face Hub... - [multilabel_classify.py:3846:check_model_existence]
2025-06-16 21:46:20,016 - INFO - Model exists locally at: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:3851:check_model_existence]
2025-06-16 21:46:20,070 - INFO - Model exists on Hugging Face Hub with ID: deb101/mistral-7b-instruct-v0.3-mimic4-adapt-multilabel-classify - [multilabel_classify.py:3865:check_model_existence]
2025-06-16 21:46:20,070 - INFO - Model exists either locally or on Hub - [multilabel_classify.py:3891:check_model_existence]
2025-06-16 21:46:20,070 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:103:log_section]
2025-06-16 21:46:20,070 - INFO - + STARTING FRESH TRAINING + - [multilabel_classify.py:104:log_section]
2025-06-16 21:46:20,070 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:107:log_section]
2025-06-16 21:46:20,070 - INFO - Starting fresh training (either forced or model not found)... - [multilabel_classify.py:3999:main]
2025-06-16 21:46:20,085 - WARNING - Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured. - [_login.py:415:_login]
2025-06-16 21:46:20,085 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:103:log_section]
2025-06-16 21:46:20,085 - INFO - + LOADING BASE MODEL + - [multilabel_classify.py:104:log_section]
2025-06-16 21:46:20,085 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:107:log_section]
2025-06-16 21:46:20,085 - INFO - Loading pretrained model and tokenizer... - [multilabel_classify.py:4031:main]
2025-06-16 21:46:20,085 - INFO - Starting model and tokenizer loading process... - [multilabel_classify.py:1603:load_base_model_and_tokenizer]
2025-06-16 21:46:20,086 - INFO - Quantization config: 4-bit, nf4, double_quant, bfloat16 - [multilabel_classify.py:1612:load_base_model_and_tokenizer]
2025-06-16 21:46:20,086 - INFO - Loading tokenizer for model: deb101/mistral-7b-instruct-v0.3-mimic4-adapt... - [multilabel_classify.py:1616:load_base_model_and_tokenizer]
2025-06-16 21:46:20,476 - INFO - Checking if deb101/mistral-7b-instruct-v0.3-mimic4-adapt is a PEFT model... - [multilabel_classify.py:1627:load_base_model_and_tokenizer]
2025-06-16 21:46:20,498 - INFO - Detected PEFT model. Base model: mistralai/Mistral-7B-Instruct-v0.3 - [multilabel_classify.py:1631:load_base_model_and_tokenizer]
2025-06-16 21:46:20,498 - INFO - Loading model configuration for mistralai/Mistral-7B-Instruct-v0.3... - [multilabel_classify.py:1639:load_base_model_and_tokenizer]
2025-06-16 21:46:20,521 - INFO - Model type: mistral, Architectures: ['MistralForCausalLM'] - [multilabel_classify.py:1654:load_base_model_and_tokenizer]
2025-06-16 21:46:20,521 - INFO - Loading base model: mistralai/Mistral-7B-Instruct-v0.3... - [multilabel_classify.py:1722:load_base_model_and_tokenizer]
2025-06-16 21:46:21,030 - INFO - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk). - [modeling.py:991:get_balanced_memory]
2025-06-16 21:46:26,310 - INFO - Loading PEFT adapters for deb101/mistral-7b-instruct-v0.3-mimic4-adapt... - [multilabel_classify.py:1742:load_base_model_and_tokenizer]
2025-06-16 21:46:26,561 - INFO - Before enabling PEFT adapters - [multilabel_classify.py:1744:load_base_model_and_tokenizer]
2025-06-16 21:46:26,563 - INFO - trainable params: 0 || all params: 7,254,839,296 || trainable%: 0.0000 - [multilabel_classify.py:162:log_print_output]
2025-06-16 21:46:26,567 - INFO - Enabled gradients for parameters: ['base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.1.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.1.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.1.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.2.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.2.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.2.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.2.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.3.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.3.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.3.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.3.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.4.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.4.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.4.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.4.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.5.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.5.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.5.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.5.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.6.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.6.self_attn.q_proj.lora_B.default.weight', 
'base_model.model.model.layers.6.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.6.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.7.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.7.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.7.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.7.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.8.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.8.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.8.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.8.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.9.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.9.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.9.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.9.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.10.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.10.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.10.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.10.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.11.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.11.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.11.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.11.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.12.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.12.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.12.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.12.self_attn.v_proj.lora_B.default.weight', 
'base_model.model.model.layers.13.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.13.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.13.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.13.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.14.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.14.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.14.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.14.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.15.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.15.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.15.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.15.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.16.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.16.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.16.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.16.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.17.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.17.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.17.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.17.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.18.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.18.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.18.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.18.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.19.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.19.self_attn.q_proj.lora_B.default.weight', 
'base_model.model.model.layers.19.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.19.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.23.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.23.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.23.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.24.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.24.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.24.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.24.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.25.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.25.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.25.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.25.self_attn.v_proj.lora_B.default.weight', 
'base_model.model.model.layers.26.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.26.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.26.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.26.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.27.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.27.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.27.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.27.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.28.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.28.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.28.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.28.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.29.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.29.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.29.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.29.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.30.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.30.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.30.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.30.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.31.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.31.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.31.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.31.self_attn.v_proj.lora_B.default.weight'] - [multilabel_classify.py:1754:load_base_model_and_tokenizer] | |
2025-06-16 21:46:26,567 - INFO - After enabling PEFT adapters - [multilabel_classify.py:1755:load_base_model_and_tokenizer]
2025-06-16 21:46:26,569 - INFO - trainable params: 6,815,744 || all params: 7,254,839,296 || trainable%: 0.0939 - [multilabel_classify.py:162:log_print_output]
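As a sanity check on the trainable-parameter count above, the figure matches LoRA adapters of rank 16 on `q_proj` and `v_proj` in all 32 layers. The rank is not logged anywhere in this file; it is inferred here from the arithmetic, and the layer dimensions (hidden size 4096, 8 key/value heads of size 128) are taken from the published Mistral-7B configuration rather than from this log.

```python
# Assumed Mistral-7B dimensions (not printed in the log).
HIDDEN = 4096          # model hidden size; q_proj maps 4096 -> 4096
KV_DIM = 8 * 128       # 8 key/value heads of size 128; v_proj maps 4096 -> 1024
NUM_LAYERS = 32        # layers 0..31 appear in the gradient list above
ALL_PARAMS = 7_254_839_296  # "all params" as logged


def lora_param_count(d_in, d_out, rank):
    """LoRA adds A (rank x d_in) and B (d_out x rank) per adapted weight."""
    return rank * (d_in + d_out)


def total_lora_params(rank):
    per_layer = (
        lora_param_count(HIDDEN, HIDDEN, rank)    # q_proj
        + lora_param_count(HIDDEN, KV_DIM, rank)  # v_proj
    )
    return NUM_LAYERS * per_layer


if __name__ == "__main__":
    rank = 16  # inferred, not logged
    total = total_lora_params(rank)
    print(f"{total:,}")                            # 6,815,744
    print(f"{100 * total / ALL_PARAMS:.4f}")       # 0.0939
```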
2025-06-16 21:46:26,570 - INFO - Model and tokenizer successfully loaded! - [multilabel_classify.py:1793:load_base_model_and_tokenizer]
2025-06-16 21:46:26,570 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:103:log_section]
2025-06-16 21:46:26,570 - INFO - + DATA PREPROCESSING + - [multilabel_classify.py:104:log_section]
2025-06-16 21:46:26,570 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:107:log_section]
2025-06-16 21:46:26,570 - INFO - Loading and preprocessing training data... - [multilabel_classify.py:4041:main]
2025-06-16 21:46:26,747 - INFO - Total number of labels: 7942 - [multilabel_classify.py:1196:preprocess_data]
2025-06-16 21:46:26,747 - INFO - Rare labels (freq < 50): 7817 - [multilabel_classify.py:1197:preprocess_data]
2025-06-16 21:46:26,747 - INFO - Not rare labels (freq >= 50): 125 - [multilabel_classify.py:1198:preprocess_data]
2025-06-16 21:46:26,747 - INFO - Label partitions and classes saved to ../tmp/MIMIC4_DEMO/labels_partition.json - [multilabel_classify.py:1199:preprocess_data]
2025-06-16 21:47:24,136 - INFO - The size of training set: 8393 - [multilabel_classify.py:1295:preprocess_data]
2025-06-16 21:47:24,136 - INFO - The size of Evaluation set: 2528 - [multilabel_classify.py:1296:preprocess_data]
2025-06-16 21:47:24,538 - INFO - Number of unique ICD-10 codes: 7942 - [multilabel_classify.py:4047:main]
2025-06-16 21:47:24,541 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:103:log_section]
2025-06-16 21:47:24,541 - INFO - + MODEL INITIALIZATION + - [multilabel_classify.py:104:log_section]
2025-06-16 21:47:24,541 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:107:log_section]
2025-06-16 21:47:24,541 - INFO - Initializing custom L2R model for outputting per-token relevance scores per ICD-10 codes. - [multilabel_classify.py:4050:main]
2025-06-16 21:47:24,541 - INFO - Creating MultilabelICDClassifier - Standard multilabel medical classifier! - [multilabel_classify.py:884:define_model]
2025-06-16 21:47:24,541 - INFO - Will now start to create Multilabel-Classification Model from the base model - [multilabel_classify.py:567:__init__]
2025-06-16 21:47:24,545 - INFO - trainable params: 6,815,744 || all params: 3,765,178,368 || trainable%: 0.1810 - [utils.py:476:compute_trainable_params]
2025-06-16 21:47:26,261 - INFO - Creating the Multi-Label Classification Model from base model mistralai/Mistral-7B-Instruct-v0.3 completed!!! - [multilabel_classify.py:609:__init__]
2025-06-16 21:47:26,265 - INFO - trainable params: 171,532,417 || all params: 3,929,895,041 || trainable%: 4.3648 - [utils.py:476:compute_trainable_params]
2025-06-16 21:47:26,265 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:103:log_section]
2025-06-16 21:47:26,265 - INFO - + TRAINING PREPARATION + - [multilabel_classify.py:104:log_section]
2025-06-16 21:47:26,265 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:107:log_section]
2025-06-16 21:47:26,265 - INFO - Preparing training components and optimizers... - [multilabel_classify.py:4057:main]
2025-06-16 21:47:26,349 - INFO - Device: NVIDIA GH200 480GB - [multilabel_classify.py:1043:log_training_configuration]
2025-06-16 21:47:26,349 - INFO - CUDA Available: True - [multilabel_classify.py:1046:log_training_configuration]
2025-06-16 21:47:26,349 - INFO - CUDA Device Count: 1 - [multilabel_classify.py:1047:log_training_configuration]
2025-06-16 21:47:26,351 - INFO -
Training Configuration
+-----------------------------+------------------------------------------------------------------+
| Parameter                   | Value                                                            |
+-----------------------------+------------------------------------------------------------------+
| Output Directory            | ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b                     |
| Training Epochs             | 5                                                                |
| Train Batch Size            | 8                                                                |
| Eval Batch Size             | 8                                                                |
| Gradient Accumulation Steps | 4                                                                |
| Learning Rate               | 0.0001                                                           |
| Warmup Steps                | 500                                                              |
| Save Strategy               | epoch                                                            |
| Save Total Limit            | 10                                                               |
| Evaluation Strategy         | epoch                                                            |
| Best Model Metric           | precision_at_15                                                  |
| Logging Strategy            | steps (every 10 steps)                                           |
| Push to Hub                 | True                                                             |
| Hub Model ID                | deb101/mistral-7b-instruct-v0.3-mimic4-adapt-multilabel-classify |
| Steps per Epoch             | 262                                                              |
| Total Training Steps        | 1310                                                             |
| Evaluation Steps            | 316                                                              |
| Training Dataset Size       | 8393 samples                                                     |
| Evaluation Dataset Size     | 2528 samples                                                     |
+-----------------------------+------------------------------------------------------------------+ - [multilabel_classify.py:1035:log_training_args]
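The step counts in the table above can be reproduced from the dataset sizes, batch sizes, and gradient accumulation setting. The sketch below is one formula that yields the logged values (262, 1310, 316); the function name `schedule_counts` is hypothetical, and the script itself may derive these numbers differently.

```python
import math


def schedule_counts(n_train, n_eval, train_bs, eval_bs, grad_accum, epochs):
    """Return (optimizer steps per epoch, total training steps, eval batches)."""
    # Number of mini-batches in one pass over the training set (last one partial).
    batches_per_epoch = math.ceil(n_train / train_bs)
    # One optimizer step every `grad_accum` mini-batches; the remainder is dropped.
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_steps = steps_per_epoch * epochs
    # Evaluation uses no accumulation, so it is just the batch count.
    eval_steps = math.ceil(n_eval / eval_bs)
    return steps_per_epoch, total_steps, eval_steps


if __name__ == "__main__":
    # Values taken from the configuration table above.
    print(schedule_counts(8393, 2528, train_bs=8, eval_bs=8, grad_accum=4, epochs=5))
    # (262, 1310, 316)
```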
2025-06-16 21:47:26,351 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:103:log_section]
2025-06-16 21:47:26,351 - INFO - + MODEL TRAINING + - [multilabel_classify.py:104:log_section]
2025-06-16 21:47:26,351 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:107:log_section]
2025-06-16 21:47:26,352 - INFO - Starting model training process... - [multilabel_classify.py:4079:main]
2025-06-16 21:47:26,396 - INFO - We are registering the tokenizer deb101/mistral-7b-instruct-v0.3-mimic4-adapt in Custom Trainer - [multilabel_classify.py:2364:__init__]
2025-06-16 21:47:26,644 - INFO - Starting Training... - [multilabel_classify.py:2018:on_train_begin]
2025-06-16 21:47:51,185 - INFO -
Training Metrics (Step 10)
+---------------+----------+
| Metric        | Value    |
+===============+==========+
| loss          | -1.3281  |
+---------------+----------+
| grad_norm     | 0.009771 |
+---------------+----------+
| learning_rate | 2e-06    |
+---------------+----------+
| epoch         | 0.038095 |
+---------------+----------+ - [multilabel_classify.py:2212:on_log]
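The `learning_rate` and `epoch` values in the Step 10 entry above are consistent with a linear warmup from 0 to the peak rate of 1e-4 over 500 steps, and with the epoch counted as optimizer steps times accumulation divided by mini-batches per epoch. The log does not name the scheduler, so the linear shape is an assumption, and the helper names below are hypothetical. Only the warmup phase is modelled.

```python
import math

BASE_LR = 1e-4                            # learning_rate from the arguments
WARMUP_STEPS = 500                        # warmup_steps from the arguments
BATCHES_PER_EPOCH = math.ceil(8393 / 8)   # training samples / train batch size
GRAD_ACCUM = 4                            # gradient accumulation steps


def warmup_lr(step):
    """Learning rate during an assumed linear warmup (capped at the peak)."""
    return BASE_LR * min(step, WARMUP_STEPS) / WARMUP_STEPS


def epoch_fraction(step):
    """Fraction of an epoch completed after `step` optimizer steps."""
    return step * GRAD_ACCUM / BATCHES_PER_EPOCH


if __name__ == "__main__":
    for step in (10, 20, 50):
        print(step, f"{warmup_lr(step):.1e}", round(epoch_fraction(step), 6))
```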
2025-06-16 21:48:11,800 - INFO - | |
[36mπ Training Metrics (Step 20) π | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -1.2681 | | |
+---------------+----------+ | |
| grad_norm | 0.008548 | | |
+---------------+----------+ | |
| learning_rate | 4e-06 | | |
+---------------+----------+ | |
| epoch | 0.07619 | | |
+---------------+----------+[0m - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:48:32,431 - INFO - | |
[36mπ Training Metrics (Step 30) π | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -1.2844 | | |
+---------------+----------+ | |
| grad_norm | 0.010616 | | |
+---------------+----------+ | |
| learning_rate | 6e-06 | | |
+---------------+----------+ | |
| epoch | 0.114286 | | |
+---------------+----------+[0m - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:48:53,066 - INFO - | |
[36mπ Training Metrics (Step 40) π | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -1.33 | | |
+---------------+----------+ | |
| grad_norm | 0.020943 | | |
+---------------+----------+ | |
| learning_rate | 8e-06 | | |
+---------------+----------+ | |
| epoch | 0.152381 | | |
+---------------+----------+[0m - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:49:13,743 - INFO - | |
[36mπ Training Metrics (Step 50) π | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -1.3244 | | |
+---------------+----------+ | |
| grad_norm | 0.069555 | | |
+---------------+----------+ | |
| learning_rate | 1e-05 | | |
+---------------+----------+ | |
| epoch | 0.190476 | | |
+---------------+----------+[0m - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:49:34,457 - INFO - | |
[36mπ Training Metrics (Step 60) π | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -1.3176 | | |
+---------------+----------+ | |
| grad_norm | 1.1682 | | |
+---------------+----------+ | |
| learning_rate | 1.2e-05 | | |
+---------------+----------+ | |
| epoch | 0.228571 | | |
+---------------+----------+[0m - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:49:55,223 - INFO - | |
[36mπ Training Metrics (Step 70) π | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.083 | | |
+---------------+----------+ | |
| grad_norm | 1.83967 | | |
+---------------+----------+ | |
| learning_rate | 1.4e-05 | | |
+---------------+----------+ | |
| epoch | 0.266667 | | |
+---------------+----------+[0m - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:50:15,943 - INFO - | |
[36mπ Training Metrics (Step 80) π | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.3497 | | |
+---------------+----------+ | |
| grad_norm | 0.998332 | | |
+---------------+----------+ | |
| learning_rate | 1.6e-05 | | |
+---------------+----------+ | |
| epoch | 0.304762 | | |
+---------------+----------+[0m - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:50:36,662 - INFO - | |
[36mπ Training Metrics (Step 90) π | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.4039 | | |
+---------------+----------+ | |
| grad_norm | 1.1678 | | |
+---------------+----------+ | |
| learning_rate | 1.8e-05 | | |
+---------------+----------+ | |
| epoch | 0.342857 | | |
+---------------+----------+[0m - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:50:57,346 - INFO - | |
[36mπ Training Metrics (Step 100) π | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.5742 | | |
+---------------+----------+ | |
| grad_norm | 2.53676 | | |
+---------------+----------+ | |
| learning_rate | 2e-05 | | |
+---------------+----------+ | |
| epoch | 0.380952 | | |
+---------------+----------+[0m - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:51:18,014 - INFO - | |
[36mπ Training Metrics (Step 110) π | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.599 | | |
+---------------+----------+ | |
| grad_norm | 1.1263 | | |
+---------------+----------+ | |
| learning_rate | 2.2e-05 | | |
+---------------+----------+ | |
| epoch | 0.419048 | | |
+---------------+----------+[0m - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:51:38,668 - INFO - | |
[36mπ Training Metrics (Step 120) π | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.7175 | | |
+---------------+----------+ | |
| grad_norm | 1.1186 | | |
+---------------+----------+ | |
| learning_rate | 2.4e-05 | | |
+---------------+----------+ | |
| epoch | 0.457143 | | |
+---------------+----------+[0m - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:51:59,288 - INFO - | |
[36mπ Training Metrics (Step 130) π | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.6394 | | |
+---------------+----------+ | |
| grad_norm | 1.62559 | | |
+---------------+----------+ | |
| learning_rate | 2.6e-05 | | |
+---------------+----------+ | |
| epoch | 0.495238 | | |
+---------------+----------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:52:19,922 - INFO - | |
Training Metrics (Step 140) | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.7082 | | |
+---------------+----------+ | |
| grad_norm | 0.867737 | | |
+---------------+----------+ | |
| learning_rate | 2.8e-05 | | |
+---------------+----------+ | |
| epoch | 0.533333 | | |
+---------------+----------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:52:40,541 - INFO - | |
Training Metrics (Step 150) | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.4458 | | |
+---------------+----------+ | |
| grad_norm | 1.29707 | | |
+---------------+----------+ | |
| learning_rate | 3e-05 | | |
+---------------+----------+ | |
| epoch | 0.571429 | | |
+---------------+----------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:53:01,158 - INFO - | |
Training Metrics (Step 160) | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.5431 | | |
+---------------+----------+ | |
| grad_norm | 0.911872 | | |
+---------------+----------+ | |
| learning_rate | 3.2e-05 | | |
+---------------+----------+ | |
| epoch | 0.609524 | | |
+---------------+----------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:53:21,750 - INFO - | |
Training Metrics (Step 170) | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.6463 | | |
+---------------+----------+ | |
| grad_norm | 1.06875 | | |
+---------------+----------+ | |
| learning_rate | 3.4e-05 | | |
+---------------+----------+ | |
| epoch | 0.647619 | | |
+---------------+----------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:53:42,366 - INFO - | |
Training Metrics (Step 180) | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.5042 | | |
+---------------+----------+ | |
| grad_norm | 1.2099 | | |
+---------------+----------+ | |
| learning_rate | 3.6e-05 | | |
+---------------+----------+ | |
| epoch | 0.685714 | | |
+---------------+----------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:54:02,983 - INFO - | |
Training Metrics (Step 190) | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.5143 | | |
+---------------+----------+ | |
| grad_norm | 0.903909 | | |
+---------------+----------+ | |
| learning_rate | 3.8e-05 | | |
+---------------+----------+ | |
| epoch | 0.72381 | | |
+---------------+----------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:54:23,596 - INFO - | |
Training Metrics (Step 200) | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.4899 | | |
+---------------+----------+ | |
| grad_norm | 0.870928 | | |
+---------------+----------+ | |
| learning_rate | 4e-05 | | |
+---------------+----------+ | |
| epoch | 0.761905 | | |
+---------------+----------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:54:44,214 - INFO - | |
Training Metrics (Step 210) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.7119 | | |
+---------------+---------+ | |
| grad_norm | 1.17446 | | |
+---------------+---------+ | |
| learning_rate | 4.2e-05 | | |
+---------------+---------+ | |
| epoch | 0.8 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:55:04,855 - INFO - | |
Training Metrics (Step 220) | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.6202 | | |
+---------------+----------+ | |
| grad_norm | 1.01015 | | |
+---------------+----------+ | |
| learning_rate | 4.4e-05 | | |
+---------------+----------+ | |
| epoch | 0.838095 | | |
+---------------+----------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:55:25,489 - INFO - | |
Training Metrics (Step 230) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.5483 | | |
+---------------+---------+ | |
| grad_norm | 1.28282 | | |
+---------------+---------+ | |
| learning_rate | 4.6e-05 | | |
+---------------+---------+ | |
| epoch | 0.87619 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:55:46,146 - INFO - | |
Training Metrics (Step 240) | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.5052 | | |
+---------------+----------+ | |
| grad_norm | 1.70203 | | |
+---------------+----------+ | |
| learning_rate | 4.8e-05 | | |
+---------------+----------+ | |
| epoch | 0.914286 | | |
+---------------+----------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:56:06,825 - INFO - | |
Training Metrics (Step 250) | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.519 | | |
+---------------+----------+ | |
| grad_norm | 5.4329 | | |
+---------------+----------+ | |
| learning_rate | 5e-05 | | |
+---------------+----------+ | |
| epoch | 0.952381 | | |
+---------------+----------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:56:27,493 - INFO - | |
Training Metrics (Step 260) | |
+---------------+----------+ | |
| Metric | Value | | |
+===============+==========+ | |
| loss | -2.5733 | | |
+---------------+----------+ | |
| grad_norm | 3.50952 | | |
+---------------+----------+ | |
| learning_rate | 5.2e-05 | | |
+---------------+----------+ | |
| epoch | 0.990476 | | |
+---------------+----------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 21:56:32,140 - INFO - Removing 'token_type_ids' from eval_dataset as they are not needed. - [multilabel_classify.py:2376:evaluate] | |
2025-06-16 22:15:34,916 - INFO - | |
Evaluation Metrics | |
+-------------------------------+----------+ | |
| Metric | Value | | |
+===============================+==========+ | |
| eval_f1_micro | 0.008615 | | |
+-------------------------------+----------+ | |
| eval_f1_macro | 0.005978 | | |
+-------------------------------+----------+ | |
| eval_precision_at_5 | 0.203244 | | |
+-------------------------------+----------+ | |
| eval_recall_at_5 | 0.0452 | | |
+-------------------------------+----------+ | |
| eval_precision_at_8 | 0.197538 | | |
+-------------------------------+----------+ | |
| eval_recall_at_8 | 0.069411 | | |
+-------------------------------+----------+ | |
| eval_precision_at_15 | 0.182648 | | |
+-------------------------------+----------+ | |
| eval_recall_at_15 | 0.11848 | | |
+-------------------------------+----------+ | |
| eval_rare_f1_micro | 0.005097 | | |
+-------------------------------+----------+ | |
| eval_rare_f1_macro | 0.003982 | | |
+-------------------------------+----------+ | |
| eval_rare_precision | 0.002557 | | |
+-------------------------------+----------+ | |
| eval_rare_recall | 0.789408 | | |
+-------------------------------+----------+ | |
| eval_rare_precision_at_5 | 0.036946 | | |
+-------------------------------+----------+ | |
| eval_rare_recall_at_5 | 0.011236 | | |
+-------------------------------+----------+ | |
| eval_rare_precision_at_8 | 0.032931 | | |
+-------------------------------+----------+ | |
| eval_rare_recall_at_8 | 0.016217 | | |
+-------------------------------+----------+ | |
| eval_rare_precision_at_15 | 0.029008 | | |
+-------------------------------+----------+ | |
| eval_rare_recall_at_15 | 0.027004 | | |
+-------------------------------+----------+ | |
| eval_not_rare_f1_micro | 0.135446 | | |
+-------------------------------+----------+ | |
| eval_not_rare_f1_macro | 0.130824 | | |
+-------------------------------+----------+ | |
| eval_not_rare_precision | 0.072642 | | |
+-------------------------------+----------+ | |
| eval_not_rare_recall | 1 | | |
+-------------------------------+----------+ | |
| eval_not_rare_precision_at_5 | 0.201187 | | |
+-------------------------------+----------+ | |
| eval_not_rare_recall_at_5 | 0.118668 | | |
+-------------------------------+----------+ | |
| eval_not_rare_precision_at_8 | 0.196252 | | |
+-------------------------------+----------+ | |
| eval_not_rare_recall_at_8 | 0.184178 | | |
+-------------------------------+----------+ | |
| eval_not_rare_precision_at_15 | 0.180195 | | |
+-------------------------------+----------+ | |
| eval_not_rare_recall_at_15 | 0.311517 | | |
+-------------------------------+----------+ | |
| eval_loss | -2.18078 | | |
+-------------------------------+----------+ - [multilabel_classify.py:2231:on_evaluate] | |
2025-06-16 22:15:37,017 - INFO - Model weights saved in safetensors format: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-262 - [multilabel_classify.py:2469:_save] | |
2025-06-16 22:15:37,020 - INFO - Config saved in checkpoint: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-262 - [multilabel_classify.py:2474:_save] | |
2025-06-16 22:15:37,021 - INFO - Saved files in ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-262: | |
+---------+--------------------+------------+ | |
| Index | Saved File | Size | | |
+=========+====================+============+ | |
| 1 | training_args.bin | 0.01 MB | | |
+---------+--------------------+------------+ | |
| 2 | optimizer.pt | 1308.77 MB | | |
+---------+--------------------+------------+ | |
| 3 | model.safetensors | 4600.97 MB | | |
+---------+--------------------+------------+ | |
| 4 | scaler.pt | 0.00 MB | | |
+---------+--------------------+------------+ | |
| 5 | config.json | 0.00 MB | | |
+---------+--------------------+------------+ | |
| 6 | scheduler.pt | 0.00 MB | | |
+---------+--------------------+------------+ | |
| 7 | trainer_state.json | 0.01 MB | | |
+---------+--------------------+------------+ | |
| 8 | rng_state.pth | 0.01 MB | | |
+---------+--------------------+------------+ - [multilabel_classify.py:2491:_save] | |
2025-06-16 22:15:58,054 - INFO - | |
Training Metrics (Step 270) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.6944 | | |
+---------------+---------+ | |
| grad_norm | 1.66623 | | |
+---------------+---------+ | |
| learning_rate | 5.4e-05 | | |
+---------------+---------+ | |
| epoch | 1.03048 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:16:18,682 - INFO - | |
Training Metrics (Step 280) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.6468 | | |
+---------------+---------+ | |
| grad_norm | 5.73856 | | |
+---------------+---------+ | |
| learning_rate | 5.6e-05 | | |
+---------------+---------+ | |
| epoch | 1.06857 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:16:39,356 - INFO - | |
Training Metrics (Step 290) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.7772 | | |
+---------------+---------+ | |
| grad_norm | 1.32201 | | |
+---------------+---------+ | |
| learning_rate | 5.8e-05 | | |
+---------------+---------+ | |
| epoch | 1.10667 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:17:00,040 - INFO - | |
Training Metrics (Step 300) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.5104 | | |
+---------------+---------+ | |
| grad_norm | 1.28285 | | |
+---------------+---------+ | |
| learning_rate | 6e-05 | | |
+---------------+---------+ | |
| epoch | 1.14476 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:17:20,738 - INFO - | |
Training Metrics (Step 310) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.6475 | | |
+---------------+---------+ | |
| grad_norm | 2.5773 | | |
+---------------+---------+ | |
| learning_rate | 6.2e-05 | | |
+---------------+---------+ | |
| epoch | 1.18286 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:17:41,412 - INFO - | |
Training Metrics (Step 320) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.5365 | | |
+---------------+---------+ | |
| grad_norm | 3.06585 | | |
+---------------+---------+ | |
| learning_rate | 6.4e-05 | | |
+---------------+---------+ | |
| epoch | 1.22095 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:18:02,294 - INFO - | |
Training Metrics (Step 330) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.6598 | | |
+---------------+---------+ | |
| grad_norm | 2.69829 | | |
+---------------+---------+ | |
| learning_rate | 6.6e-05 | | |
+---------------+---------+ | |
| epoch | 1.25905 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:18:22,986 - INFO - | |
Training Metrics (Step 340) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.6969 | | |
+---------------+---------+ | |
| grad_norm | 4.4668 | | |
+---------------+---------+ | |
| learning_rate | 6.8e-05 | | |
+---------------+---------+ | |
| epoch | 1.29714 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:18:43,681 - INFO - | |
Training Metrics (Step 350) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.5643 | | |
+---------------+---------+ | |
| grad_norm | 2.15029 | | |
+---------------+---------+ | |
| learning_rate | 7e-05 | | |
+---------------+---------+ | |
| epoch | 1.33524 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:19:04,375 - INFO - | |
Training Metrics (Step 360) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.6392 | | |
+---------------+---------+ | |
| grad_norm | 4.94501 | | |
+---------------+---------+ | |
| learning_rate | 7.2e-05 | | |
+---------------+---------+ | |
| epoch | 1.37333 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:19:25,066 - INFO - | |
Training Metrics (Step 370) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.6347 | | |
+---------------+---------+ | |
| grad_norm | 1.98727 | | |
+---------------+---------+ | |
| learning_rate | 7.4e-05 | | |
+---------------+---------+ | |
| epoch | 1.41143 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:19:45,772 - INFO - | |
Training Metrics (Step 380) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.8543 | | |
+---------------+---------+ | |
| grad_norm | 10.4538 | | |
+---------------+---------+ | |
| learning_rate | 7.6e-05 | | |
+---------------+---------+ | |
| epoch | 1.44952 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:20:06,478 - INFO - | |
Training Metrics (Step 390) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.9327 | | |
+---------------+---------+ | |
| grad_norm | 2.76192 | | |
+---------------+---------+ | |
| learning_rate | 7.8e-05 | | |
+---------------+---------+ | |
| epoch | 1.48762 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:20:27,183 - INFO - | |
Training Metrics (Step 400) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.8006 | | |
+---------------+---------+ | |
| grad_norm | 4.33592 | | |
+---------------+---------+ | |
| learning_rate | 8e-05 | | |
+---------------+---------+ | |
| epoch | 1.52571 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:20:47,886 - INFO - | |
Training Metrics (Step 410) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.7399 | | |
+---------------+---------+ | |
| grad_norm | 5.77753 | | |
+---------------+---------+ | |
| learning_rate | 8.2e-05 | | |
+---------------+---------+ | |
| epoch | 1.56381 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:21:08,595 - INFO - | |
Training Metrics (Step 420) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.8422 | | |
+---------------+---------+ | |
| grad_norm | 3.92247 | | |
+---------------+---------+ | |
| learning_rate | 8.4e-05 | | |
+---------------+---------+ | |
| epoch | 1.6019 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:21:29,301 - INFO - | |
Training Metrics (Step 430) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.2485 | | |
+---------------+---------+ | |
| grad_norm | 4.74773 | | |
+---------------+---------+ | |
| learning_rate | 8.5e-05 | | |
+---------------+---------+ | |
| epoch | 1.64 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:21:49,992 - INFO - | |
Training Metrics (Step 440) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.9139 | | |
+---------------+---------+ | |
| grad_norm | 3.45202 | | |
+---------------+---------+ | |
| learning_rate | 8.7e-05 | | |
+---------------+---------+ | |
| epoch | 1.67809 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:22:10,679 - INFO - | |
Training Metrics (Step 450) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.6702 | | |
+---------------+---------+ | |
| grad_norm | 3.90906 | | |
+---------------+---------+ | |
| learning_rate | 8.9e-05 | | |
+---------------+---------+ | |
| epoch | 1.71619 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:22:31,373 - INFO - | |
Training Metrics (Step 460) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.8502 | | |
+---------------+---------+ | |
| grad_norm | 3.40761 | | |
+---------------+---------+ | |
| learning_rate | 9.1e-05 | | |
+---------------+---------+ | |
| epoch | 1.75429 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:22:52,068 - INFO - | |
Training Metrics (Step 470) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.6422 | | |
+---------------+---------+ | |
| grad_norm | 4.53801 | | |
+---------------+---------+ | |
| learning_rate | 9.3e-05 | | |
+---------------+---------+ | |
| epoch | 1.79238 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:23:12,780 - INFO - | |
Training Metrics (Step 480) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.9417 | | |
+---------------+---------+ | |
| grad_norm | 2.15287 | | |
+---------------+---------+ | |
| learning_rate | 9.5e-05 | | |
+---------------+---------+ | |
| epoch | 1.83048 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:23:33,479 - INFO - | |
Training Metrics (Step 490) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.9854 | | |
+---------------+---------+ | |
| grad_norm | 3.23702 | | |
+---------------+---------+ | |
| learning_rate | 9.7e-05 | | |
+---------------+---------+ | |
| epoch | 1.86857 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:23:54,194 - INFO - | |
Training Metrics (Step 500) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.825 | | |
+---------------+---------+ | |
| grad_norm | 2.25583 | | |
+---------------+---------+ | |
| learning_rate | 9.9e-05 | | |
+---------------+---------+ | |
| epoch | 1.90667 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:24:14,880 - INFO - | |
Training Metrics (Step 510) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.0095 | | |
+---------------+---------+ | |
| grad_norm | 4.32831 | | |
+---------------+---------+ | |
| learning_rate | 0.0001 | | |
+---------------+---------+ | |
| epoch | 1.94476 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:24:35,542 - INFO - | |
Training Metrics (Step 520) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.8745 | | |
+---------------+---------+ | |
| grad_norm | 10.9455 | | |
+---------------+---------+ | |
| learning_rate | 0.0001 | | |
+---------------+---------+ | |
| epoch | 1.98286 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:24:44,336 - INFO - Removing 'token_type_ids' from eval_dataset as they are not needed. - [multilabel_classify.py:2376:evaluate] | |
2025-06-16 22:43:33,519 - INFO - | |
Evaluation Metrics | |
+-------------------------------+----------+ | |
| Metric | Value | | |
+===============================+==========+ | |
| eval_f1_micro | 0.007046 | | |
+-------------------------------+----------+ | |
| eval_f1_macro | 0.006161 | | |
+-------------------------------+----------+ | |
| eval_precision_at_5 | 0.115348 | | |
+-------------------------------+----------+ | |
| eval_recall_at_5 | 0.031091 | | |
+-------------------------------+----------+ | |
| eval_precision_at_8 | 0.107892 | | |
+-------------------------------+----------+ | |
| eval_recall_at_8 | 0.045557 | | |
+-------------------------------+----------+ | |
| eval_precision_at_15 | 0.093328 | | |
+-------------------------------+----------+ | |
| eval_recall_at_15 | 0.072304 | | |
+-------------------------------+----------+ | |
| eval_rare_f1_micro | 0.004353 | | |
+-------------------------------+----------+ | |
| eval_rare_f1_macro | 0.004122 | | |
+-------------------------------+----------+ | |
| eval_rare_precision | 0.002182 | | |
+-------------------------------+----------+ | |
| eval_rare_recall | 0.868453 | | |
+-------------------------------+----------+ | |
| eval_rare_precision_at_5 | 0.039082 | | |
+-------------------------------+----------+ | |
| eval_rare_recall_at_5 | 0.015531 | | |
+-------------------------------+----------+ | |
| eval_rare_precision_at_8 | 0.033327 | | |
+-------------------------------+----------+ | |
| eval_rare_recall_at_8 | 0.020969 | | |
+-------------------------------+----------+ | |
| eval_rare_precision_at_15 | 0.028138 | | |
+-------------------------------+----------+ | |
| eval_rare_recall_at_15 | 0.032345 | | |
+-------------------------------+----------+ | |
| eval_not_rare_f1_micro | 0.139875 | | |
+-------------------------------+----------+ | |
| eval_not_rare_f1_macro | 0.133686 | | |
+-------------------------------+----------+ | |
| eval_not_rare_precision | 0.07536 | | |
+-------------------------------+----------+ | |
| eval_not_rare_recall | 0.971989 | | |
+-------------------------------+----------+ | |
| eval_not_rare_precision_at_5 | 0.173497 | | |
+-------------------------------+----------+ | |
| eval_not_rare_recall_at_5 | 0.110951 | | |
+-------------------------------+----------+ | |
| eval_not_rare_precision_at_8 | 0.15442 | | |
+-------------------------------+----------+ | |
| eval_not_rare_recall_at_8 | 0.155294 | | |
+-------------------------------+----------+ | |
| eval_not_rare_precision_at_15 | 0.140005 | | |
+-------------------------------+----------+ | |
| eval_not_rare_recall_at_15 | 0.255021 | | |
+-------------------------------+----------+ | |
| eval_loss | -2.29713 | | |
+-------------------------------+----------+ - [multilabel_classify.py:2231:on_evaluate] | |
2025-06-16 22:43:36,565 - INFO - Model weights saved in safetensors format: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-524 - [multilabel_classify.py:2469:_save] | |
2025-06-16 22:43:36,567 - INFO - Config saved in checkpoint: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-524 - [multilabel_classify.py:2474:_save] | |
2025-06-16 22:43:36,568 - INFO - Saved files in ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-524: | |
+---------+--------------------+------------+ | |
| Index | Saved File | Size | | |
+=========+====================+============+ | |
| 1 | training_args.bin | 0.01 MB | | |
+---------+--------------------+------------+ | |
| 2 | optimizer.pt | 1308.77 MB | | |
+---------+--------------------+------------+ | |
| 3 | model.safetensors | 4600.97 MB | | |
+---------+--------------------+------------+ | |
| 4 | scaler.pt | 0.00 MB | | |
+---------+--------------------+------------+ | |
| 5 | config.json | 0.00 MB | | |
+---------+--------------------+------------+ | |
| 6 | scheduler.pt | 0.00 MB | | |
+---------+--------------------+------------+ | |
| 7 | trainer_state.json | 0.01 MB | | |
+---------+--------------------+------------+ | |
| 8 | rng_state.pth | 0.01 MB | | |
+---------+--------------------+------------+ - [multilabel_classify.py:2491:_save] | |
2025-06-16 22:43:53,283 - INFO - | |
Training Metrics (Step 530) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.1318 | | |
+---------------+---------+ | |
| grad_norm | 6.27872 | | |
+---------------+---------+ | |
| learning_rate | 0.0001 | | |
+---------------+---------+ | |
| epoch | 2.02286 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:44:13,913 - INFO - | |
Training Metrics (Step 540) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.9013 | | |
+---------------+---------+ | |
| grad_norm | 5.13338 | | |
+---------------+---------+ | |
| learning_rate | 9.9e-05 | | |
+---------------+---------+ | |
| epoch | 2.06095 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:44:34,550 - INFO - | |
Training Metrics (Step 550) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.9378 | | |
+---------------+---------+ | |
| grad_norm | 2.31937 | | |
+---------------+---------+ | |
| learning_rate | 9.9e-05 | | |
+---------------+---------+ | |
| epoch | 2.09905 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:44:55,220 - INFO - | |
Training Metrics (Step 560) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.1557 | | |
+---------------+---------+ | |
| grad_norm | 2.88556 | | |
+---------------+---------+ | |
| learning_rate | 9.9e-05 | | |
+---------------+---------+ | |
| epoch | 2.13714 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:45:15,871 - INFO - | |
Training Metrics (Step 570) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.1589 | | |
+---------------+---------+ | |
| grad_norm | 16.5141 | | |
+---------------+---------+ | |
| learning_rate | 9.8e-05 | | |
+---------------+---------+ | |
| epoch | 2.17524 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:45:36,535 - INFO - | |
Training Metrics (Step 580) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.0709 | | |
+---------------+---------+ | |
| grad_norm | 20.6117 | | |
+---------------+---------+ | |
| learning_rate | 9.8e-05 | | |
+---------------+---------+ | |
| epoch | 2.21333 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:45:57,196 - INFO - | |
Training Metrics (Step 590) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.0031 | | |
+---------------+---------+ | |
| grad_norm | 3.41571 | | |
+---------------+---------+ | |
| learning_rate | 9.7e-05 | | |
+---------------+---------+ | |
| epoch | 2.25143 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:46:17,859 - INFO - | |
Training Metrics (Step 600) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.0642 | | |
+---------------+---------+ | |
| grad_norm | 3.67429 | | |
+---------------+---------+ | |
| learning_rate | 9.7e-05 | | |
+---------------+---------+ | |
| epoch | 2.28952 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:46:38,540 - INFO - | |
Training Metrics (Step 610) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.8556 | | |
+---------------+---------+ | |
| grad_norm | 3.29057 | | |
+---------------+---------+ | |
| learning_rate | 9.6e-05 | | |
+---------------+---------+ | |
| epoch | 2.32762 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:46:59,219 - INFO - | |
Training Metrics (Step 620) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.0252 | | |
+---------------+---------+ | |
| grad_norm | 4.15559 | | |
+---------------+---------+ | |
| learning_rate | 9.5e-05 | | |
+---------------+---------+ | |
| epoch | 2.36571 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:47:19,873 - INFO - | |
Training Metrics (Step 630) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.043 | | |
+---------------+---------+ | |
| grad_norm | 7.9306 | | |
+---------------+---------+ | |
| learning_rate | 9.4e-05 | | |
+---------------+---------+ | |
| epoch | 2.40381 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:47:40,534 - INFO - | |
Training Metrics (Step 640) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.9925 | | |
+---------------+---------+ | |
| grad_norm | 5.63441 | | |
+---------------+---------+ | |
| learning_rate | 9.3e-05 | | |
+---------------+---------+ | |
| epoch | 2.44191 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:48:01,208 - INFO - | |
Training Metrics (Step 650) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.1192 | | |
+---------------+---------+ | |
| grad_norm | 6.16559 | | |
+---------------+---------+ | |
| learning_rate | 9.2e-05 | | |
+---------------+---------+ | |
| epoch | 2.48 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:48:21,875 - INFO - | |
Training Metrics (Step 660) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.7578 | | |
+---------------+---------+ | |
| grad_norm | 7.27245 | | |
+---------------+---------+ | |
| learning_rate | 9.1e-05 | | |
+---------------+---------+ | |
| epoch | 2.5181 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:48:42,549 - INFO - | |
Training Metrics (Step 670) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.0685 | | |
+---------------+---------+ | |
| grad_norm | 4.86883 | | |
+---------------+---------+ | |
| learning_rate | 9e-05 | | |
+---------------+---------+ | |
| epoch | 2.55619 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:49:03,221 - INFO - | |
Training Metrics (Step 680) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.3125 | | |
+---------------+---------+ | |
| grad_norm | 4.60443 | | |
+---------------+---------+ | |
| learning_rate | 8.9e-05 | | |
+---------------+---------+ | |
| epoch | 2.59429 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:49:23,892 - INFO - | |
Training Metrics (Step 690) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.9993 | | |
+---------------+---------+ | |
| grad_norm | 5.1602 | | |
+---------------+---------+ | |
| learning_rate | 8.8e-05 | | |
+---------------+---------+ | |
| epoch | 2.63238 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:49:44,576 - INFO - | |
Training Metrics (Step 700) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -2.9074 | | |
+---------------+---------+ | |
| grad_norm | 3.71175 | | |
+---------------+---------+ | |
| learning_rate | 8.6e-05 | | |
+---------------+---------+ | |
| epoch | 2.67048 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:50:05,236 - INFO - | |
Training Metrics (Step 710) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.248 | | |
+---------------+---------+ | |
| grad_norm | 5.70862 | | |
+---------------+---------+ | |
| learning_rate | 8.5e-05 | | |
+---------------+---------+ | |
| epoch | 2.70857 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:50:25,900 - INFO - | |
Training Metrics (Step 720) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.1012 | | |
+---------------+---------+ | |
| grad_norm | 3.30394 | | |
+---------------+---------+ | |
| learning_rate | 8.3e-05 | | |
+---------------+---------+ | |
| epoch | 2.74667 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:50:46,563 - INFO - | |
Training Metrics (Step 730) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.2892 | | |
+---------------+---------+ | |
| grad_norm | 4.57689 | | |
+---------------+---------+ | |
| learning_rate | 8.2e-05 | | |
+---------------+---------+ | |
| epoch | 2.78476 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:51:07,213 - INFO - | |
Training Metrics (Step 740) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.0007 | | |
+---------------+---------+ | |
| grad_norm | 4.63606 | | |
+---------------+---------+ | |
| learning_rate | 8.1e-05 | | |
+---------------+---------+ | |
| epoch | 2.82286 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:51:27,880 - INFO - | |
Training Metrics (Step 750) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.0416 | | |
+---------------+---------+ | |
| grad_norm | 6.01303 | | |
+---------------+---------+ | |
| learning_rate | 7.9e-05 | | |
+---------------+---------+ | |
| epoch | 2.86095 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:51:48,533 - INFO - | |
Training Metrics (Step 760) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.2314 | | |
+---------------+---------+ | |
| grad_norm | 3.14631 | | |
+---------------+---------+ | |
| learning_rate | 7.7e-05 | | |
+---------------+---------+ | |
| epoch | 2.89905 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:52:09,199 - INFO - | |
Training Metrics (Step 770) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.1514 | | |
+---------------+---------+ | |
| grad_norm | 3.72293 | | |
+---------------+---------+ | |
| learning_rate | 7.6e-05 | | |
+---------------+---------+ | |
| epoch | 2.93714 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:52:29,843 - INFO - | |
Training Metrics (Step 780) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.0665 | | |
+---------------+---------+ | |
| grad_norm | 6.07238 | | |
+---------------+---------+ | |
| learning_rate | 7.4e-05 | | |
+---------------+---------+ | |
| epoch | 2.97524 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 22:52:42,757 - INFO - Removing 'token_type_ids' from eval_dataset as they are not needed. - [multilabel_classify.py:2376:evaluate] | |
2025-06-16 23:11:26,833 - INFO - | |
Evaluation Metrics | |
+-------------------------------+----------+ | |
| Metric | Value | | |
+===============================+==========+ | |
| eval_f1_micro | 0.006362 | | |
+-------------------------------+----------+ | |
| eval_f1_macro | 0.006038 | | |
+-------------------------------+----------+ | |
| eval_precision_at_5 | 0.052532 | | |
+-------------------------------+----------+ | |
| eval_recall_at_5 | 0.014819 | | |
+-------------------------------+----------+ | |
| eval_precision_at_8 | 0.045045 | | |
+-------------------------------+----------+ | |
| eval_recall_at_8 | 0.020262 | | |
+-------------------------------+----------+ | |
| eval_precision_at_15 | 0.039214 | | |
+-------------------------------+----------+ | |
| eval_recall_at_15 | 0.030945 | | |
+-------------------------------+----------+ | |
| eval_rare_f1_micro | 0.004069 | | |
+-------------------------------+----------+ | |
| eval_rare_f1_macro | 0.004018 | | |
+-------------------------------+----------+ | |
| eval_rare_precision | 0.002039 | | |
+-------------------------------+----------+ | |
| eval_rare_recall | 0.968818 | | |
+-------------------------------+----------+ | |
| eval_rare_precision_at_5 | 0.015032 | | |
+-------------------------------+----------+ | |
| eval_rare_recall_at_5 | 0.006142 | | |
+-------------------------------+----------+ | |
| eval_rare_precision_at_8 | 0.0134 | | |
+-------------------------------+----------+ | |
| eval_rare_recall_at_8 | 0.008599 | | |
+-------------------------------+----------+ | |
| eval_rare_precision_at_15 | 0.010707 | | |
+-------------------------------+----------+ | |
| eval_rare_recall_at_15 | 0.012881 | | |
+-------------------------------+----------+ | |
| eval_not_rare_f1_micro | 0.137554 | | |
+-------------------------------+----------+ | |
| eval_not_rare_f1_macro | 0.132349 | | |
+-------------------------------+----------+ | |
| eval_not_rare_precision | 0.073945 | | |
+-------------------------------+----------+ | |
| eval_not_rare_recall | 0.983969 | | |
+-------------------------------+----------+ | |
| eval_not_rare_precision_at_5 | 0.149842 | | |
+-------------------------------+----------+ | |
| eval_not_rare_recall_at_5 | 0.09499 | | |
+-------------------------------+----------+ | |
| eval_not_rare_precision_at_8 | 0.123616 | | |
+-------------------------------+----------+ | |
| eval_not_rare_recall_at_8 | 0.124506 | | |
+-------------------------------+----------+ | |
| eval_not_rare_precision_at_15 | 0.114662 | | |
+-------------------------------+----------+ | |
| eval_not_rare_recall_at_15 | 0.204124 | | |
+-------------------------------+----------+ | |
| eval_loss | -2.32242 | | |
+-------------------------------+----------+ - [multilabel_classify.py:2231:on_evaluate] | |
2025-06-16 23:11:30,098 - INFO - Model weights saved in safetensors format: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-786 - [multilabel_classify.py:2469:_save] | |
2025-06-16 23:11:30,100 - INFO - Config saved in checkpoint: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-786 - [multilabel_classify.py:2474:_save] | |
2025-06-16 23:11:30,101 - INFO - Saved files in ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-786: | |
+---------+--------------------+------------+ | |
| Index | Saved File | Size | | |
+=========+====================+============+ | |
| 1 | training_args.bin | 0.01 MB | | |
+---------+--------------------+------------+ | |
| 2 | optimizer.pt | 1308.77 MB | | |
+---------+--------------------+------------+ | |
| 3 | model.safetensors | 4600.97 MB | | |
+---------+--------------------+------------+ | |
| 4 | scaler.pt | 0.00 MB | | |
+---------+--------------------+------------+ | |
| 5 | config.json | 0.00 MB | | |
+---------+--------------------+------------+ | |
| 6 | scheduler.pt | 0.00 MB | | |
+---------+--------------------+------------+ | |
| 7 | trainer_state.json | 0.02 MB | | |
+---------+--------------------+------------+ | |
| 8 | rng_state.pth | 0.01 MB | | |
+---------+--------------------+------------+ - [multilabel_classify.py:2491:_save] | |
2025-06-16 23:11:42,691 - INFO - | |
Training Metrics (Step 790) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.2797 | | |
+---------------+---------+ | |
| grad_norm | 3.42903 | | |
+---------------+---------+ | |
| learning_rate | 7.2e-05 | | |
+---------------+---------+ | |
| epoch | 3.01524 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:12:03,473 - INFO - | |
Training Metrics (Step 800) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.1285 | | |
+---------------+---------+ | |
| grad_norm | 3.36507 | | |
+---------------+---------+ | |
| learning_rate | 7.1e-05 | | |
+---------------+---------+ | |
| epoch | 3.05333 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:12:24,097 - INFO - | |
Training Metrics (Step 810) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.3476 | | |
+---------------+---------+ | |
| grad_norm | 12.2284 | | |
+---------------+---------+ | |
| learning_rate | 6.9e-05 | | |
+---------------+---------+ | |
| epoch | 3.09143 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:12:44,741 - INFO - | |
Training Metrics (Step 820) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.2807 | | |
+---------------+---------+ | |
| grad_norm | 4.27001 | | |
+---------------+---------+ | |
| learning_rate | 6.7e-05 | | |
+---------------+---------+ | |
| epoch | 3.12952 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:13:05,412 - INFO - | |
Training Metrics (Step 830) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.2203 | | |
+---------------+---------+ | |
| grad_norm | 7.73442 | | |
+---------------+---------+ | |
| learning_rate | 6.5e-05 | | |
+---------------+---------+ | |
| epoch | 3.16762 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:13:26,068 - INFO - | |
Training Metrics (Step 840) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.4928 | | |
+---------------+---------+ | |
| grad_norm | 5.21309 | | |
+---------------+---------+ | |
| learning_rate | 6.3e-05 | | |
+---------------+---------+ | |
| epoch | 3.20571 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:13:46,711 - INFO - | |
Training Metrics (Step 850) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.3565 | | |
+---------------+---------+ | |
| grad_norm | 9.77577 | | |
+---------------+---------+ | |
| learning_rate | 6.2e-05 | | |
+---------------+---------+ | |
| epoch | 3.24381 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:14:07,376 - INFO - | |
Training Metrics (Step 860) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.3151 | | |
+---------------+---------+ | |
| grad_norm | 5.86498 | | |
+---------------+---------+ | |
| learning_rate | 6e-05 | | |
+---------------+---------+ | |
| epoch | 3.28191 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:14:28,035 - INFO - | |
Training Metrics (Step 870) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.3922 | | |
+---------------+---------+ | |
| grad_norm | 6.25596 | | |
+---------------+---------+ | |
| learning_rate | 5.8e-05 | | |
+---------------+---------+ | |
| epoch | 3.32 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:14:48,708 - INFO - | |
Training Metrics (Step 880) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.4076 | | |
+---------------+---------+ | |
| grad_norm | 6.42964 | | |
+---------------+---------+ | |
| learning_rate | 5.6e-05 | | |
+---------------+---------+ | |
| epoch | 3.3581 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:15:09,378 - INFO - | |
Training Metrics (Step 890) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.6344 | | |
+---------------+---------+ | |
| grad_norm | 6.22911 | | |
+---------------+---------+ | |
| learning_rate | 5.4e-05 | | |
+---------------+---------+ | |
| epoch | 3.39619 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:15:30,073 - INFO - | |
Training Metrics (Step 900) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.4873 | | |
+---------------+---------+ | |
| grad_norm | 3.87399 | | |
+---------------+---------+ | |
| learning_rate | 5.2e-05 | | |
+---------------+---------+ | |
| epoch | 3.43429 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:15:50,754 - INFO - | |
Training Metrics (Step 910) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.309 | | |
+---------------+---------+ | |
| grad_norm | 4.9241 | | |
+---------------+---------+ | |
| learning_rate | 5e-05 | | |
+---------------+---------+ | |
| epoch | 3.47238 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:16:11,415 - INFO - | |
Training Metrics (Step 920) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.3892 | | |
+---------------+---------+ | |
| grad_norm | 6.75714 | | |
+---------------+---------+ | |
| learning_rate | 4.8e-05 | | |
+---------------+---------+ | |
| epoch | 3.51048 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:16:32,086 - INFO - | |
Training Metrics (Step 930) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.6367 | | |
+---------------+---------+ | |
| grad_norm | 6.01969 | | |
+---------------+---------+ | |
| learning_rate | 4.6e-05 | | |
+---------------+---------+ | |
| epoch | 3.54857 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:16:52,762 - INFO - | |
Training Metrics (Step 940) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.6291 | | |
+---------------+---------+ | |
| grad_norm | 10.1146 | | |
+---------------+---------+ | |
| learning_rate | 4.4e-05 | | |
+---------------+---------+ | |
| epoch | 3.58667 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:17:13,420 - INFO - | |
Training Metrics (Step 950) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.5316 | | |
+---------------+---------+ | |
| grad_norm | 7.94565 | | |
+---------------+---------+ | |
| learning_rate | 4.2e-05 | | |
+---------------+---------+ | |
| epoch | 3.62476 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:17:34,093 - INFO - | |
Training Metrics (Step 960) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.3274 | | |
+---------------+---------+ | |
| grad_norm | 4.13957 | | |
+---------------+---------+ | |
| learning_rate | 4.1e-05 | | |
+---------------+---------+ | |
| epoch | 3.66286 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:17:54,762 - INFO - | |
Training Metrics (Step 970) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.3817 | | |
+---------------+---------+ | |
| grad_norm | 7.41069 | | |
+---------------+---------+ | |
| learning_rate | 3.9e-05 | | |
+---------------+---------+ | |
| epoch | 3.70095 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:18:15,439 - INFO - | |
Training Metrics (Step 980) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.7929 | | |
+---------------+---------+ | |
| grad_norm | 6.45495 | | |
+---------------+---------+ | |
| learning_rate | 3.7e-05 | | |
+---------------+---------+ | |
| epoch | 3.73905 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:18:36,120 - INFO - | |
Training Metrics (Step 990) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.6203 | | |
+---------------+---------+ | |
| grad_norm | 10.8201 | | |
+---------------+---------+ | |
| learning_rate | 3.5e-05 | | |
+---------------+---------+ | |
| epoch | 3.77714 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:18:56,774 - INFO - | |
Training Metrics (Step 1000) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.5213 | | |
+---------------+---------+ | |
| grad_norm | 3.94306 | | |
+---------------+---------+ | |
| learning_rate | 3.3e-05 | | |
+---------------+---------+ | |
| epoch | 3.81524 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:19:17,405 - INFO - | |
Training Metrics (Step 1010) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.6218 | | |
+---------------+---------+ | |
| grad_norm | 7.93971 | | |
+---------------+---------+ | |
| learning_rate | 3.2e-05 | | |
+---------------+---------+ | |
| epoch | 3.85333 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:19:38,038 - INFO - | |
Training Metrics (Step 1020) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.5477 | | |
+---------------+---------+ | |
| grad_norm | 13.3498 | | |
+---------------+---------+ | |
| learning_rate | 3e-05 | | |
+---------------+---------+ | |
| epoch | 3.89143 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:19:58,698 - INFO - | |
Training Metrics (Step 1030) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.6429 | | |
+---------------+---------+ | |
| grad_norm | 10.1506 | | |
+---------------+---------+ | |
| learning_rate | 2.8e-05 | | |
+---------------+---------+ | |
| epoch | 3.92952 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:20:19,330 - INFO - | |
Training Metrics (Step 1040) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.5627 | | |
+---------------+---------+ | |
| grad_norm | 4.49416 | | |
+---------------+---------+ | |
| learning_rate | 2.6e-05 | | |
+---------------+---------+ | |
| epoch | 3.96762 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:20:36,375 - INFO - Removing 'token_type_ids' from eval_dataset as they are not needed. - [multilabel_classify.py:2376:evaluate] | |
2025-06-16 23:39:17,428 - INFO - | |
Evaluation Metrics | |
+-------------------------------+----------+ | |
| Metric | Value | | |
+===============================+==========+ | |
| eval_f1_micro | 0.006241 | | |
+-------------------------------+----------+ | |
| eval_f1_macro | 0.00597 | | |
+-------------------------------+----------+ | |
| eval_precision_at_5 | 0.018196 | | |
+-------------------------------+----------+ | |
| eval_recall_at_5 | 0.005862 | | |
+-------------------------------+----------+ | |
| eval_precision_at_8 | 0.01518 | | |
+-------------------------------+----------+ | |
| eval_recall_at_8 | 0.007524 | | |
+-------------------------------+----------+ | |
| eval_precision_at_15 | 0.016297 | | |
+-------------------------------+----------+ | |
| eval_recall_at_15 | 0.013505 | | |
+-------------------------------+----------+ | |
| eval_rare_f1_micro | 0.004005 | | |
+-------------------------------+----------+ | |
| eval_rare_f1_macro | 0.003967 | | |
+-------------------------------+----------+ | |
| eval_rare_precision | 0.002007 | | |
+-------------------------------+----------+ | |
| eval_rare_recall | 0.992014 | | |
+-------------------------------+----------+ | |
| eval_rare_precision_at_5 | 0.006883 | | |
+-------------------------------+----------+ | |
| eval_rare_recall_at_5 | 0.00311 | | |
+-------------------------------+----------+ | |
| eval_rare_precision_at_8 | 0.005241 | | |
+-------------------------------+----------+ | |
| eval_rare_recall_at_8 | 0.003914 | | |
+-------------------------------+----------+ | |
| eval_rare_precision_at_15 | 0.004404 | | |
+-------------------------------+----------+ | |
| eval_rare_recall_at_15 | 0.006155 | | |
+-------------------------------+----------+ | |
| eval_not_rare_f1_micro | 0.136059 | | |
+-------------------------------+----------+ | |
| eval_not_rare_f1_macro | 0.131255 | | |
+-------------------------------+----------+ | |
| eval_not_rare_precision | 0.073009 | | |
+-------------------------------+----------+ | |
| eval_not_rare_recall | 0.997299 | | |
+-------------------------------+----------+ | |
| eval_not_rare_precision_at_5 | 0.139399 | | |
+-------------------------------+----------+ | |
| eval_not_rare_recall_at_5 | 0.085527 | | |
+-------------------------------+----------+ | |
| eval_not_rare_precision_at_8 | 0.109276 | | |
+-------------------------------+----------+ | |
| eval_not_rare_recall_at_8 | 0.105525 | | |
+-------------------------------+----------+ | |
| eval_not_rare_precision_at_15 | 0.102189 | | |
+-------------------------------+----------+ | |
| eval_not_rare_recall_at_15 | 0.175595 | | |
+-------------------------------+----------+ | |
| eval_loss | -2.32388 | | |
+-------------------------------+----------+ - [multilabel_classify.py:2231:on_evaluate] | |
2025-06-16 23:39:21,151 - INFO - Model weights saved in safetensors format: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1048 - [multilabel_classify.py:2469:_save] | |
2025-06-16 23:39:21,153 - INFO - Config saved in checkpoint: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1048 - [multilabel_classify.py:2474:_save] | |
2025-06-16 23:39:21,154 - INFO - Saved files in ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1048: | |
+---------+--------------------+------------+ | |
| Index | Saved File | Size | | |
+=========+====================+============+ | |
| 1 | training_args.bin | 0.01 MB | | |
+---------+--------------------+------------+ | |
| 2 | optimizer.pt | 1308.77 MB | | |
+---------+--------------------+------------+ | |
| 3 | model.safetensors | 4600.97 MB | | |
+---------+--------------------+------------+ | |
| 4 | scaler.pt | 0.00 MB | | |
+---------+--------------------+------------+ | |
| 5 | config.json | 0.00 MB | | |
+---------+--------------------+------------+ | |
| 6 | scheduler.pt | 0.00 MB | | |
+---------+--------------------+------------+ | |
| 7 | trainer_state.json | 0.02 MB | | |
+---------+--------------------+------------+ | |
| 8 | rng_state.pth | 0.01 MB | | |
+---------+--------------------+------------+ - [multilabel_classify.py:2491:_save] | |
2025-06-16 23:39:29,580 - INFO - | |
Training Metrics (Step 1050) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.4879 | | |
+---------------+---------+ | |
| grad_norm | 6.39619 | | |
+---------------+---------+ | |
| learning_rate | 2.5e-05 | | |
+---------------+---------+ | |
| epoch | 4.00762 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:39:50,155 - INFO - | |
Training Metrics (Step 1060) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.9362 | | |
+---------------+---------+ | |
| grad_norm | 10.1241 | | |
+---------------+---------+ | |
| learning_rate | 2.3e-05 | | |
+---------------+---------+ | |
| epoch | 4.04571 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:40:10,763 - INFO - | |
Training Metrics (Step 1070) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.6064 | | |
+---------------+---------+ | |
| grad_norm | 9.74406 | | |
+---------------+---------+ | |
| learning_rate | 2.2e-05 | | |
+---------------+---------+ | |
| epoch | 4.08381 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:40:31,375 - INFO - | |
Training Metrics (Step 1080) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.6683 | | |
+---------------+---------+ | |
| grad_norm | 3.95963 | | |
+---------------+---------+ | |
| learning_rate | 2e-05 | | |
+---------------+---------+ | |
| epoch | 4.1219 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:40:52,017 - INFO - | |
Training Metrics (Step 1090) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.8221 | | |
+---------------+---------+ | |
| grad_norm | 7.74502 | | |
+---------------+---------+ | |
| learning_rate | 1.9e-05 | | |
+---------------+---------+ | |
| epoch | 4.16 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:41:12,654 - INFO - | |
Training Metrics (Step 1100) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.6546 | | |
+---------------+---------+ | |
| grad_norm | 4.37811 | | |
+---------------+---------+ | |
| learning_rate | 1.7e-05 | | |
+---------------+---------+ | |
| epoch | 4.1981 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:41:33,298 - INFO - | |
Training Metrics (Step 1110) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.8077 | | |
+---------------+---------+ | |
| grad_norm | 6.02418 | | |
+---------------+---------+ | |
| learning_rate | 1.6e-05 | | |
+---------------+---------+ | |
| epoch | 4.23619 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:41:54,124 - INFO - | |
Training Metrics (Step 1120) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.9832 | | |
+---------------+---------+ | |
| grad_norm | 11.386 | | |
+---------------+---------+ | |
| learning_rate | 1.4e-05 | | |
+---------------+---------+ | |
| epoch | 4.27429 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:42:14,776 - INFO - | |
Training Metrics (Step 1130) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.8361 | | |
+---------------+---------+ | |
| grad_norm | 6.99711 | | |
+---------------+---------+ | |
| learning_rate | 1.3e-05 | | |
+---------------+---------+ | |
| epoch | 4.31238 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:42:35,401 - INFO - | |
Training Metrics (Step 1140) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.9406 | | |
+---------------+---------+ | |
| grad_norm | 11.1791 | | |
+---------------+---------+ | |
| learning_rate | 1.2e-05 | | |
+---------------+---------+ | |
| epoch | 4.35048 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:42:56,059 - INFO - | |
Training Metrics (Step 1150) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.8839 | | |
+---------------+---------+ | |
| grad_norm | 5.22412 | | |
+---------------+---------+ | |
| learning_rate | 1.1e-05 | | |
+---------------+---------+ | |
| epoch | 4.38857 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:43:16,701 - INFO - | |
Training Metrics (Step 1160) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.9782 | | |
+---------------+---------+ | |
| grad_norm | 5.76971 | | |
+---------------+---------+ | |
| learning_rate | 1e-05 | | |
+---------------+---------+ | |
| epoch | 4.42667 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:43:37,361 - INFO - | |
Training Metrics (Step 1170) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.7407 | | |
+---------------+---------+ | |
| grad_norm | 13.5051 | | |
+---------------+---------+ | |
| learning_rate | 9e-06 | | |
+---------------+---------+ | |
| epoch | 4.46476 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:43:58,001 - INFO - | |
Training Metrics (Step 1180) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.6642 | | |
+---------------+---------+ | |
| grad_norm | 5.81691 | | |
+---------------+---------+ | |
| learning_rate | 8e-06 | | |
+---------------+---------+ | |
| epoch | 4.50286 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:44:18,651 - INFO - | |
Training Metrics (Step 1190) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -3.8646 | | |
+---------------+---------+ | |
| grad_norm | 5.51144 | | |
+---------------+---------+ | |
| learning_rate | 7e-06 | | |
+---------------+---------+ | |
| epoch | 4.54095 | | |
+---------------+---------+ - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:44:39,299 - INFO - | |
Training Metrics (Step 1200) | |
+---------------+---------+ | |
| Metric | Value | | |
+===============+=========+ | |
| loss | -4.1574 | | |
+---------------+---------+ | |
| grad_norm | 7.77753 | | |
+---------------+---------+ | |
| learning_rate | 6e-06 | | |
+---------------+---------+ | |
| epoch | 4.57905 | | |
+---------------+---------+[0m - [multilabel_classify.py:2212:on_log] | |
2025-06-16 23:44:59,944 - INFO -
Training Metrics (Step 1210)
+---------------+---------+
| Metric        | Value   |
+===============+=========+
| loss          | -3.8684 |
+---------------+---------+
| grad_norm     | 9.38395 |
+---------------+---------+
| learning_rate | 5e-06   |
+---------------+---------+
| epoch         | 4.61714 |
+---------------+---------+ - [multilabel_classify.py:2212:on_log]
2025-06-16 23:45:20,611 - INFO -
Training Metrics (Step 1220)
+---------------+---------+
| Metric        | Value   |
+===============+=========+
| loss          | -3.6834 |
+---------------+---------+
| grad_norm     | 7.10574 |
+---------------+---------+
| learning_rate | 4e-06   |
+---------------+---------+
| epoch         | 4.65524 |
+---------------+---------+ - [multilabel_classify.py:2212:on_log]
2025-06-16 23:45:41,282 - INFO -
Training Metrics (Step 1230)
+---------------+---------+
| Metric        | Value   |
+===============+=========+
| loss          | -3.9973 |
+---------------+---------+
| grad_norm     | 3.95851 |
+---------------+---------+
| learning_rate | 4e-06   |
+---------------+---------+
| epoch         | 4.69333 |
+---------------+---------+ - [multilabel_classify.py:2212:on_log]
2025-06-16 23:46:01,971 - INFO -
Training Metrics (Step 1240)
+---------------+---------+
| Metric        | Value   |
+===============+=========+
| loss          | -3.747  |
+---------------+---------+
| grad_norm     | 5.98381 |
+---------------+---------+
| learning_rate | 3e-06   |
+---------------+---------+
| epoch         | 4.73143 |
+---------------+---------+ - [multilabel_classify.py:2212:on_log]
2025-06-16 23:46:22,631 - INFO -
Training Metrics (Step 1250)
+---------------+---------+
| Metric        | Value   |
+===============+=========+
| loss          | -3.8056 |
+---------------+---------+
| grad_norm     | 6.93498 |
+---------------+---------+
| learning_rate | 3e-06   |
+---------------+---------+
| epoch         | 4.76952 |
+---------------+---------+ - [multilabel_classify.py:2212:on_log]
2025-06-16 23:46:43,296 - INFO -
Training Metrics (Step 1260)
+---------------+---------+
| Metric        | Value   |
+===============+=========+
| loss          | -3.7874 |
+---------------+---------+
| grad_norm     | 3.92144 |
+---------------+---------+
| learning_rate | 2e-06   |
+---------------+---------+
| epoch         | 4.80762 |
+---------------+---------+ - [multilabel_classify.py:2212:on_log]
2025-06-16 23:47:03,970 - INFO -
Training Metrics (Step 1270)
+---------------+---------+
| Metric        | Value   |
+===============+=========+
| loss          | -3.895  |
+---------------+---------+
| grad_norm     | 7.15651 |
+---------------+---------+
| learning_rate | 2e-06   |
+---------------+---------+
| epoch         | 4.84571 |
+---------------+---------+ - [multilabel_classify.py:2212:on_log]
2025-06-16 23:47:24,621 - INFO -
Training Metrics (Step 1280)
+---------------+---------+
| Metric        | Value   |
+===============+=========+
| loss          | -3.6054 |
+---------------+---------+
| grad_norm     | 14.3564 |
+---------------+---------+
| learning_rate | 1e-06   |
+---------------+---------+
| epoch         | 4.88381 |
+---------------+---------+ - [multilabel_classify.py:2212:on_log]
2025-06-16 23:47:45,288 - INFO -
Training Metrics (Step 1290)
+---------------+---------+
| Metric        | Value   |
+===============+=========+
| loss          | -3.6814 |
+---------------+---------+
| grad_norm     | 8.71897 |
+---------------+---------+
| learning_rate | 1e-06   |
+---------------+---------+
| epoch         | 4.9219  |
+---------------+---------+ - [multilabel_classify.py:2212:on_log]
2025-06-16 23:48:05,952 - INFO -
Training Metrics (Step 1300)
+---------------+---------+
| Metric        | Value   |
+===============+=========+
| loss          | -3.7682 |
+---------------+---------+
| grad_norm     | 8.88537 |
+---------------+---------+
| learning_rate | 1e-06   |
+---------------+---------+
| epoch         | 4.96    |
+---------------+---------+ - [multilabel_classify.py:2212:on_log]
2025-06-16 23:48:26,621 - INFO -
Training Metrics (Step 1310)
+---------------+---------+
| Metric        | Value   |
+===============+=========+
| loss          | -4.0526 |
+---------------+---------+
| grad_norm     | 8.41898 |
+---------------+---------+
| learning_rate | 1e-06   |
+---------------+---------+
| epoch         | 4.9981  |
+---------------+---------+ - [multilabel_classify.py:2212:on_log]
2025-06-16 23:48:27,962 - INFO - 💾 Model weights saved in safetensors format: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1310 - [multilabel_classify.py:2469:_save]
2025-06-16 23:48:27,964 - INFO - Config saved in checkpoint: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1310 - [multilabel_classify.py:2474:_save]
2025-06-16 23:48:27,965 - INFO - Saved files in ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1310:
+---------+-------------------+------------+
| Index   | Saved File        | Size       |
+=========+===================+============+
| 1       | training_args.bin | 0.01 MB    |
+---------+-------------------+------------+
| 2       | model.safetensors | 4600.97 MB |
+---------+-------------------+------------+
| 3       | config.json       | 0.00 MB    |
+---------+-------------------+------------+ - [multilabel_classify.py:2491:_save]
2025-06-16 23:48:28,605 - INFO - Removing 'token_type_ids' from eval_dataset as they are not needed. - [multilabel_classify.py:2376:evaluate]
2025-06-17 00:07:03,821 - INFO -
Evaluation Metrics
+-------------------------------+----------+
| Metric                        | Value    |
+===============================+==========+
| eval_f1_micro                 | 0.006203 |
+-------------------------------+----------+
| eval_f1_macro                 | 0.005948 |
+-------------------------------+----------+
| eval_precision_at_5           | 0.013054 |
+-------------------------------+----------+
| eval_recall_at_5              | 0.004007 |
+-------------------------------+----------+
| eval_precision_at_8           | 0.010779 |
+-------------------------------+----------+
| eval_recall_at_8              | 0.005572 |
+-------------------------------+----------+
| eval_precision_at_15          | 0.012447 |
+-------------------------------+----------+
| eval_recall_at_15             | 0.010075 |
+-------------------------------+----------+
| eval_rare_f1_micro            | 0.003987 |
+-------------------------------+----------+
| eval_rare_f1_macro            | 0.003951 |
+-------------------------------+----------+
| eval_rare_precision           | 0.001998 |
+-------------------------------+----------+
| eval_rare_recall              | 0.999163 |
+-------------------------------+----------+
| eval_rare_precision_at_5      | 0.005538 |
+-------------------------------+----------+
| eval_rare_recall_at_5         | 0.002468 |
+-------------------------------+----------+
| eval_rare_precision_at_8      | 0.004104 |
+-------------------------------+----------+
| eval_rare_recall_at_8         | 0.002894 |
+-------------------------------+----------+
| eval_rare_precision_at_15     | 0.003191 |
+-------------------------------+----------+
| eval_rare_recall_at_15        | 0.004396 |
+-------------------------------+----------+
| eval_not_rare_f1_micro        | 0.135441 |
+-------------------------------+----------+
| eval_not_rare_f1_macro        | 0.130814 |
+-------------------------------+----------+
| eval_not_rare_precision       | 0.072641 |
+-------------------------------+----------+
| eval_not_rare_recall          | 0.999782 |
+-------------------------------+----------+
| eval_not_rare_precision_at_5  | 0.139082 |
+-------------------------------+----------+
| eval_not_rare_recall_at_5     | 0.084165 |
+-------------------------------+----------+
| eval_not_rare_precision_at_8  | 0.106557 |
+-------------------------------+----------+
| eval_not_rare_recall_at_8     | 0.100509 |
+-------------------------------+----------+
| eval_not_rare_precision_at_15 | 0.098866 |
+-------------------------------+----------+
| eval_not_rare_recall_at_15    | 0.165002 |
+-------------------------------+----------+
| eval_loss                     | -2.3104  |
+-------------------------------+----------+ - [multilabel_classify.py:2231:on_evaluate]
2025-06-17 00:07:07,535 - INFO - 💾 Model weights saved in safetensors format: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1310 - [multilabel_classify.py:2469:_save]
2025-06-17 00:07:07,537 - INFO - Config saved in checkpoint: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1310 - [multilabel_classify.py:2474:_save]
2025-06-17 00:07:07,538 - INFO - Saved files in ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1310:
+---------+--------------------+------------+
| Index   | Saved File         | Size       |
+=========+====================+============+
| 1       | training_args.bin  | 0.01 MB    |
+---------+--------------------+------------+
| 2       | optimizer.pt       | 1308.77 MB |
+---------+--------------------+------------+
| 3       | model.safetensors  | 4600.97 MB |
+---------+--------------------+------------+
| 4       | scaler.pt          | 0.00 MB    |
+---------+--------------------+------------+
| 5       | config.json        | 0.00 MB    |
+---------+--------------------+------------+
| 6       | scheduler.pt       | 0.00 MB    |
+---------+--------------------+------------+
| 7       | trainer_state.json | 0.03 MB    |
+---------+--------------------+------------+
| 8       | rng_state.pth      | 0.01 MB    |
+---------+--------------------+------------+ - [multilabel_classify.py:2491:_save]
2025-06-17 00:07:08,790 - INFO - Loading best model from ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-262 - [multilabel_classify.py:2543:_load_best_model]
2025-06-17 00:07:08,791 - INFO - 🖥️ Model is on device: cuda:0 - [multilabel_classify.py:2553:_load_best_model]
2025-06-17 00:07:08,853 - INFO - Key order comparison:
+---------+--------------------------------------------+--------------------------------------------------------------------------------------+
| Index   | Saved state_dict Keys                      | Model state_dict Keys                                                                |
+=========+============================================+======================================================================================+
| 1       | attention.in_proj_bias                     | boost_mul                                                                            |
+---------+--------------------------------------------+--------------------------------------------------------------------------------------+
| 2       | attention.in_proj_weight                   | boost_add                                                                            |
+---------+--------------------------------------------+--------------------------------------------------------------------------------------+
| 3       | attention.out_proj.bias                    | base_model.base_model.model.model.embed_tokens.weight                                |
+---------+--------------------------------------------+--------------------------------------------------------------------------------------+
| 4       | attention.out_proj.weight                  | base_model.base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight        |
+---------+--------------------------------------------+--------------------------------------------------------------------------------------+
| 5       | base_model.base_model.model.lm_head.weight | base_model.base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight.absmax |
+---------+--------------------------------------------+--------------------------------------------------------------------------------------+ - [multilabel_classify.py:2577:_load_best_model]
2025-06-17 00:07:09,846 - INFO - Loaded best model weights from ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-262/model.safetensors - [multilabel_classify.py:2594:_load_best_model]
2025-06-17 00:07:09,885 - INFO - Weight for boost_mul matches between saved and loaded state_dict - [multilabel_classify.py:2606:_load_best_model]
2025-06-17 00:07:09,918 - INFO - Weight for boost_add matches between saved and loaded state_dict - [multilabel_classify.py:2606:_load_best_model]
2025-06-17 00:07:09,935 - INFO -
Training Metrics (Step 1310)
+--------------------------+----------+
| Metric                   | Value    |
+==========================+==========+
| train_runtime            | 8383.29  |
+--------------------------+----------+
| train_samples_per_second | 5.006    |
+--------------------------+----------+
| train_steps_per_second   | 0.156    |
+--------------------------+----------+
| total_flos               | 0        |
+--------------------------+----------+
| train_loss               | -3.07712 |
+--------------------------+----------+
| epoch                    | 4.9981   |
+--------------------------+----------+ - [multilabel_classify.py:2212:on_log]
2025-06-17 00:07:09,935 - INFO - ✨ Training Completed! ✨ - [multilabel_classify.py:2085:on_train_end]
2025-06-17 00:07:10,008 - INFO - Training loss plot saved as '../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/train_loss_plot.png' - [multilabel_classify.py:2281:on_train_end]
2025-06-17 00:07:10,069 - INFO - Evaluation loss plot saved as '../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/eval_loss_plot.png' - [multilabel_classify.py:2295:on_train_end]
2025-06-17 00:07:10,128 - INFO - Evaluation metric plot saved as '../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/eval_precision_at_15_plot.png' - [multilabel_classify.py:2316:on_train_end]
2025-06-17 00:07:10,128 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:103:log_section]
2025-06-17 00:07:10,128 - INFO - + ✨ MODEL SAVING + - [multilabel_classify.py:104:log_section]
2025-06-17 00:07:10,128 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:107:log_section]
2025-06-17 00:07:10,128 - INFO - 💾 Saving trained model and pushing to Hugging Face Hub... - [multilabel_classify.py:4093:main]
2025-06-17 00:07:10,128 - INFO - Creating/using output directory: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:3069:save_and_push]
2025-06-17 00:07:14,321 - INFO - 💾 Model weights saved in safetensors format: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:2469:_save]
2025-06-17 00:07:14,323 - INFO - Config saved in checkpoint: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:2474:_save]
2025-06-17 00:07:14,324 - INFO - Saved files in ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b:
+---------+--------------------------------------------+------------+
| Index   | Saved File                                 | Size       |
+=========+============================================+============+
| 1       | eval_loss_plot.png                         | 0.03 MB    |
+---------+--------------------------------------------+------------+
| 2       | training_args.bin                          | 0.01 MB    |
+---------+--------------------------------------------+------------+
| 3       | tokenizer.model                            | 0.56 MB    |
+---------+--------------------------------------------+------------+
| 4       | tokenizer.json                             | 3.50 MB    |
+---------+--------------------------------------------+------------+
| 5       | model.safetensors                          | 4600.97 MB |
+---------+--------------------------------------------+------------+
| 6       | config.json                                | 0.00 MB    |
+---------+--------------------------------------------+------------+
| 7       | special_tokens_map.json                    | 0.00 MB    |
+---------+--------------------------------------------+------------+
| 8       | tokenizer_config.json                      | 0.13 MB    |
+---------+--------------------------------------------+------------+
| 9       | train_loss_plot.png                        | 0.04 MB    |
+---------+--------------------------------------------+------------+
| 10      | eval_precision_at_15_plot.png              | 0.04 MB    |
+---------+--------------------------------------------+------------+
| 11      | README.md                                  | 0.01 MB    |
+---------+--------------------------------------------+------------+
| 12      | classification_log_2025-06-16_18-00-57.log | 0.09 MB    |
+---------+--------------------------------------------+------------+ - [multilabel_classify.py:2491:_save]
2025-06-17 00:07:18,278 - INFO - 💾 Model weights saved in safetensors format: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:2469:_save]
2025-06-17 00:07:18,280 - INFO - Config saved in checkpoint: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:2474:_save]
2025-06-17 00:07:18,281 - INFO - Saved files in ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b:
+---------+--------------------------------------------+------------+
| Index   | Saved File                                 | Size       |
+=========+============================================+============+
| 1       | eval_loss_plot.png                         | 0.03 MB    |
+---------+--------------------------------------------+------------+
| 2       | training_args.bin                          | 0.01 MB    |
+---------+--------------------------------------------+------------+
| 3       | tokenizer.model                            | 0.56 MB    |
+---------+--------------------------------------------+------------+
| 4       | tokenizer.json                             | 3.50 MB    |
+---------+--------------------------------------------+------------+
| 5       | model.safetensors                          | 4600.97 MB |
+---------+--------------------------------------------+------------+
| 6       | config.json                                | 0.00 MB    |
+---------+--------------------------------------------+------------+
| 7       | special_tokens_map.json                    | 0.00 MB    |
+---------+--------------------------------------------+------------+
| 8       | tokenizer_config.json                      | 0.13 MB    |
+---------+--------------------------------------------+------------+
| 9       | train_loss_plot.png                        | 0.04 MB    |
+---------+--------------------------------------------+------------+
| 10      | eval_precision_at_15_plot.png              | 0.04 MB    |
+---------+--------------------------------------------+------------+
| 11      | README.md                                  | 0.01 MB    |
+---------+--------------------------------------------+------------+
| 12      | classification_log_2025-06-16_18-00-57.log | 0.09 MB    |
+---------+--------------------------------------------+------------+ - [multilabel_classify.py:2491:_save]
2025-06-17 00:08:53,532 - INFO - 💾 Model saved to: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:3073:save_and_push]
2025-06-17 00:08:53,564 - INFO - Tokenizer saved to: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:3077:save_and_push]