{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "IqM-T1RTzY6C" }, "source": [ "To run this, press \"*Runtime*\" and press \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n", "
\n", "\n", "To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://github.com/unslothai/unsloth#installation-instructions---conda).\n", "\n", "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save) (eg for Llama.cpp).\n", "\n", "[NEW] Gemma 2 comes in 3 sizes: 2b, 9b and 27b. 2b uses 2 trillion tokens distilled from 27b!\n", "\n", "**[NEW] Try 2x faster inference in a free Colab for Gemma-2 2b Instruct [here](https://colab.research.google.com/drive/1i-8ESvtLRGNkkUQQr_-z_rcSAIo9c3lM?usp=sharing)**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2eSvM9zX_2d3" }, "outputs": [], "source": [ "%%capture\n", "!pip install unsloth\n", "# Also get the latest nightly Unsloth!\n", "!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git\n", "\n", "# Install Flash Attention 2 for softcapping support\n", "import torch\n", "if torch.cuda.get_device_capability()[0] >= 8:\n", " !pip install --no-deps packaging ninja einops \"flash-attn>=2.6.3\"" ] }, { "cell_type": "markdown", "metadata": { "id": "r2v_X2fA0Df5" }, "source": [ "* We support Llama, Mistral, Phi-3, Gemma, Yi, DeepSeek, Qwen, TinyLlama, Vicuna, Open Hermes etc\n", "* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.\n", "* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.\n", "* [**NEW**] We make Gemma-2 9b / 27b **2x faster**! See our [Gemma-2 9b notebook](https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing)\n", "* [**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)\n", "* [**NEW**] We make Mistral NeMo 12B 2x faster and fit in under 12GB of VRAM! 
[Mistral NeMo notebook](https://colab.research.google.com/drive/17d3U-CAIwzmbDRqbZ9NnpHxCkmXB6LZ0?usp=sharing)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 331, "referenced_widgets": [ "c97875e799204d779e18fba0b95fd5bf", "008bc528049245b4ab976672dde740f2", "6e3b7320bb8d435484f5fdc9853766fb", "77637a4b61604515afa02bb6c899488b", "95647d8f43764e6fa94471c74fe2f8dc", "d39f7e18537747e2b00926439bd748e8", "7b6d2a8cac69408ebde121c54b913057", "437a79d742cb4620833cdae592883628", "913b8372b289451d999adaeac262367d", "99e6f1ce7b6340b88cdf66fa746719ad", "947cfedcb67246cca93b745e56831657", "4c72c0615369427e962f09cc45097193", "7f4e45f683994acf8a600d2a33e7503e", "d5ba89c9819d441aafa1778c9cab9788", "432a1d5bdb7042b7aaa2ad50605caf25", "b1ecc3cd4544446a867283bc933dba62", "459ec24d06f647dd9603ab4b472596c3", "3901bcb1f3c140d98335d17abac3076f", "f6076a72476341209a19678eef002957", "010e7bd8b70f4949970bbe588d9d8e9c", "3b4dc22b60f5486683a4ec5205eddcb2", "39545235a26d42118f4a00bed04d9935", "ab8732869fc3471ea8e0487873e48e15", "ae46801c915847f8a68c2e682b171300", "ca4f9478be974521924da6d8fc711f4f", "b28a11d4e7014f58b2489d0af76d03a8", "a564d5a7e76c4911a3d77e133f6cade8", "f58cca528af84097a48cc83f9e3063c9", "0a114acdc8724a2a8cc0a68d08ce318c", "142127d9cb584c63a5ba49b39110980b", "800d6f7702df4df883b1c3165a069c72", "76679887c3694a19aaaa4f4a0c629d3c", "0fc02717997a4c3cbd4dd1af900878a5", "6057012b69af430f804e9ef0fcc85965", "19adcd80d3d347eab5726809cf63c4ec", "9f84e86a514c48cda94512d037604191", "664cb3195726421aad47fa1b510fabe0", "7a7f822959cb4b72826f308846fc22f2", "1d46432ba7384bf7bc88efa7d81d673b", "c0a1c389d5e542e7a204227649b2a901", "265c6115d6724fe28c2ef7d527b49d58", "705e34282c2e480e99549c413cd393b5", "cf8ec709d5744feba9c37036433d2e74", "920a8e0cf14f4df58d33ba222d0ada51", "a8e94a46887745b7b654aa12bc553ac1", "c68a47d0530e4c29900d3ba7de231ad7", "dd422569be2d4971942ad457c82f15fd", "82b5e99c882646148f6df799b4a7a28e", "bce0bdc86b3046d3baf53a9d1a7051ec", "64e589c742f54f8ea22795887b340a1f", "9577de7111c44525af1eb0891603dc08", "bfa500998f894eeb9a394af9269a1ae0", "a22d8797c1c24b4aa43a2ebaa403a50d", "f4c8b2aac3854a338b7edbce6b25a8b4", "53d4611a86854cb9986a1164e964b896", "a391fc6789a74ff3871c9dd840a72369", "ccc8605e5ca14c299e0d462802c5b1e9", "ae01cc62a80f49179bc4a05098faaa2b", "fe7620492e33468289f20487ffbbd59f", "ecd72e97fbbb45b9b88b9c0254cc657d", "f0c6d01928c84826a5af0e35b7d5105b", "c925385cb4f44bf78b85927fea4c59e5", "2e98eddcab764e9c80c4cfc764aeb2d3", "177f2b5ef6f34fad9d304a3c41da1648", "54297e8d76a1417a8472df2b151356fc", "0f04afd664df41a48ae463c63cfef2d3" ] }, "executionInfo": { "elapsed": 50568, "status": "ok", "timestamp": 1722441633360, "user": { "displayName": "Daniel Han-Chen", "userId": "17402123517466114840" }, "user_tz": 420 }, "id": "QmUBVEnvCDJv", "outputId": "62b52782-c29b-4344-bcca-ec56d5e9a617" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", "==((====))== Unsloth 2024.8: Fast Gemma2 patching. Transformers = 4.43.3.\n", " \\\\ /| GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.\n", "O^O/ \\_/ \\ Pytorch: 2.3.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.\n", "\\ / Bfloat16 = FALSE. FA [Xformers = 0.0.26.post1. 
FA2 = False]\n", " \"-____-\" Free Apache license: http://github.com/unslothai/unsloth\n", "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "c97875e799204d779e18fba0b95fd5bf", "version_major": 2, "version_minor": 0 }, "text/plain": [ "model.safetensors: 0%| | 0.00/2.22G [00:00, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "4c72c0615369427e962f09cc45097193", "version_major": 2, "version_minor": 0 }, "text/plain": [ "generation_config.json: 0%| | 0.00/168 [00:00, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "ab8732869fc3471ea8e0487873e48e15", "version_major": 2, "version_minor": 0 }, "text/plain": [ "tokenizer_config.json: 0%| | 0.00/46.3k [00:00, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "6057012b69af430f804e9ef0fcc85965", "version_major": 2, "version_minor": 0 }, "text/plain": [ "tokenizer.model: 0%| | 0.00/4.24M [00:00, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a8e94a46887745b7b654aa12bc553ac1", "version_major": 2, "version_minor": 0 }, "text/plain": [ "special_tokens_map.json: 0%| | 0.00/555 [00:00, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a391fc6789a74ff3871c9dd840a72369", "version_major": 2, "version_minor": 0 }, "text/plain": [ "tokenizer.json: 0%| | 0.00/17.5M [00:00, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from unsloth import FastLanguageModel\n", "import torch\n", "max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!\n", "dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+\n", "load_in_4bit = True # Use 4bit quantization to reduce memory usage. 
Can be False.\n", "\n", "# 4bit pre-quantized models we support for 4x faster downloading + no OOMs.\n", "fourbit_models = [\n", " \"unsloth/Meta-Llama-3.1-8B-bnb-4bit\", # Llama-3.1, trained on 15 trillion tokens, 2x faster!\n", " \"unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit\",\n", " \"unsloth/Meta-Llama-3.1-70B-bnb-4bit\",\n", " \"unsloth/Meta-Llama-3.1-405B-bnb-4bit\", # We also uploaded 4bit for 405b!\n", " \"unsloth/Mistral-Nemo-Base-2407-bnb-4bit\", # New Mistral 12b, 2x faster!\n", " \"unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit\",\n", " \"unsloth/mistral-7b-v0.3-bnb-4bit\", # Mistral v3, 2x faster!\n", " \"unsloth/mistral-7b-instruct-v0.3-bnb-4bit\",\n", " \"unsloth/Phi-3-mini-4k-instruct\", # Phi-3, 2x faster!\n", " \"unsloth/Phi-3-medium-4k-instruct\",\n", " \"unsloth/gemma-2-9b-bnb-4bit\",\n", " \"unsloth/gemma-2-27b-bnb-4bit\", # Gemma, 2x faster!\n", " \"unsloth/gemma-2-2b-bnb-4bit\", # New small Gemma model!\n", "] # More models at https://huggingface.co/unsloth\n", "\n", "model, tokenizer = FastLanguageModel.from_pretrained(\n", " model_name = \"unsloth/gemma-2-2b\",\n", " max_seq_length = max_seq_length,\n", " dtype = dtype,\n", " load_in_4bit = load_in_4bit,\n", " # token = \"hf_...\", # Use one if using gated models like meta-llama/Llama-2-7b-hf\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "SXd9bTZd1aaL" }, "source": [ "We now add LoRA adapters so we only need to update 1 to 10% of all parameters!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3929, "status": "ok", "timestamp": 1722441637273, "user": { "displayName": "Daniel Han-Chen", "userId": "17402123517466114840" }, "user_tz": 420 }, "id": "6bZsfBuZDeCL", "outputId": "d32d4c8e-c799-430d-c056-6c1a56fa99fe" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Unsloth 2024.8 patched 26 layers with 26 QKV layers, 26 O layers and 26 MLP layers.\n" ] } ], "source": [ "model = FastLanguageModel.get_peft_model(\n", " model,\n", " r = 16, # Choose any number > 0! Suggested: 8, 16, 32, 64, 128\n", " target_modules = [\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n", " \"gate_proj\", \"up_proj\", \"down_proj\",],\n", " lora_alpha = 16,\n", " lora_dropout = 0, # Supports any, but = 0 is optimized\n", " bias = \"none\", # Supports any, but = \"none\" is optimized\n", " # [NEW] \"unsloth\" uses 30% less VRAM, fits 2x larger batch sizes!\n", " use_gradient_checkpointing = \"unsloth\", # True or \"unsloth\" for very long context\n", " random_state = 3407,\n", " use_rslora = False, # We support rank-stabilized LoRA\n", " loftq_config = None, # And LoftQ\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "vITh0KVJ10qX" }, "source": [ "\n", "### Data Prep\n", "We now use the Alpaca dataset from [yahma](https://huggingface.co/datasets/yahma/alpaca-cleaned), a filtered version of the original 52K-example [Alpaca dataset](https://crfm.stanford.edu/2023/03/13/alpaca.html). You can replace this code section with your own data prep.\n", "\n", "**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only). A minimal sketch is shown below.\n", "\n", "
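```python\n",
"# A sketch (not run in this notebook): compute the loss only on the responses,\n",
"# masking out the instruction / input. Assumes the `tokenizer` loaded above and\n",
"# the \"### Response:\" marker from the alpaca_prompt defined in the next cell.\n",
"from trl import DataCollatorForCompletionOnlyLM\n",
"\n",
"collator = DataCollatorForCompletionOnlyLM(\n",
"    response_template = \"### Response:\",\n",
"    tokenizer = tokenizer,\n",
")\n",
"# Then pass `data_collator = collator` to the SFTTrainer further down.\n",
"```\n",
"\n",
"**[NOTE]** Remember to add the **EOS_TOKEN** to the tokenized output!! 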
Otherwise you'll get infinite generations!\n", "\n", "If you want to use the `llama-3` template for ShareGPT datasets, try our conversational [notebook](https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing).\n", "\n", "For text completions like novel writing, try this [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 145, "referenced_widgets": [ "8b65ab1fa061457e89f295e8a81e635b", "07648f1794c643b5a041db478062037d", "bb2326a055064e86917136d18cb90138", "a9094171050f48b4aaa89073343cb415", "9caa0460ae4547ff95d8b460637d906c", "b1d816037bc0495290f6eb1e5d53475e", "ce7c4b36ff3f42578b38e43329a4e10f", "5c305f1c06434aa5ac5875dea13800be", "84b864f1427743078be3db1acc40cf2b", "5e4d90d2e5a84a6ba6bd981be0f4f59b", "2cb4f9f802b7469bae027a13508e941f", "6b6fe8ee676c4dbeb48ee587c7be367a", "2d90d5ea73a14bc082ac8827efd61c63", "d3d46634cd3946778eeb81adb590a850", "b4ddd6e13d5642578e9dbe6aab79c770", "9364e0939cb2428fb0af37355d7da7ac", "4cc0c1b3630041f9bf6fcdcbdede61d8", "172346525ad14c69a7b32de06b1bc2b7", "d357bc92c10e43199e1df2bda9ba17ef", "892de30da5b042d394b89e69e7cac358", "f77bf9e9af0c42dfa31fa2a1cb89333b", "c0eb5ed4b4fe4947a6545da64eb7eaf8", "4a3ac24b2fbf45349c1a08a167a4fc8d", "f592325e32b14c1bb2a5204e71edf494", "b32e9d6c1ce04dffac373351d4de57b0", "edd99bbc941c4619b1e507ded1e707ce", "3b3e56e3de964d87968425991733b346", "c5e85605d69b47e4bcc96f0c0924eb2b", "6e5c583ad4c440ba94fed81149b609a3", "c62a63b0d7de4770b3a4939718561508", "b345d48ce9e54247abf2491d9c740ddc", "2e7e5cac9f09431c8df4f1ddc13db7d2", "6f2c75d006af439d9997fd07b7717dcd", "956ea575313b4306bed317ed3ea345eb", "1042959a5b104ff3a845e0242f20b498", "3992e3ec773546edb97b9083630a0fe9", "d20b7d7cdb864fd384806aebc1954cc3", "13e37792812241bf9aa2e989b527860d", "65499c1d4b134432ab375971fc83921a", "72fdd0fca68f4b53a7413cf64e6a8949", "a2ac7320880b42df8f5444b035eb7929", "4f352c4601834106b7c371ff26592d49", "b804fd8561674ad8ad766e0fd98a3dc6", "4e7eb7f65fb64ea6b0016788c6dd89de" ] }, "executionInfo": { "elapsed": 4513, "status": "ok", "timestamp": 1722441641775, "user": { "displayName": "Daniel Han-Chen", "userId": "17402123517466114840" }, "user_tz": 420 }, "id": "LjY75GoYUCB8", "outputId": "e65740c0-3ab8-4c01-b308-2a37cc9062d9" }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "8b65ab1fa061457e89f295e8a81e635b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Downloading readme: 0%| | 0.00/11.6k [00:00, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "6b6fe8ee676c4dbeb48ee587c7be367a", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Downloading data: 0%| | 0.00/44.3M [00:00, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "4a3ac24b2fbf45349c1a08a167a4fc8d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating train split: 0%| | 0/51760 [00:00, ? examples/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "956ea575313b4306bed317ed3ea345eb", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Map: 0%| | 0/51760 [00:00, ? 
examples/s]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "alpaca_prompt = \"\"\"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n", "\n", "### Instruction:\n", "{}\n", "\n", "### Input:\n", "{}\n", "\n", "### Response:\n", "{}\"\"\"\n", "\n", "EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN\n", "def formatting_prompts_func(examples):\n", " instructions = examples[\"instruction\"]\n", " inputs = examples[\"input\"]\n", " outputs = examples[\"output\"]\n", " texts = []\n", " for instruction, input, output in zip(instructions, inputs, outputs):\n", " # Must add EOS_TOKEN, otherwise your generation will go on forever!\n", " text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN\n", " texts.append(text)\n", " return { \"text\" : texts, }\n", "pass\n", "\n", "from datasets import load_dataset\n", "dataset = load_dataset(\"yahma/alpaca-cleaned\", split = \"train\")\n", "dataset = dataset.map(formatting_prompts_func, batched = True,)" ] }, { "cell_type": "markdown", "metadata": { "id": "idAEIeSQ3xdS" }, "source": [ "\n", "### Train the model\n", "Now let's use Huggingface TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`. We also support TRL's `DPOTrainer`!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 66, "referenced_widgets": [ "92d9648dd6bb4086ad9ce52db641bf87", "ebf2fb459fd44328962a64c2e592112e", "56020c41ce384da78bd36ac4264e9d1f", "336f8c09c4be4a9d86bf47a4f8c4d5a8", "a2c07f1aacee44aa8d70e023b9966a23", "5cb2b3bbb610485ebfd2c81488d2430a", "3e71a9d8abcc47759ddddd3a2ce074eb", "edf0c0af82e7471ab2436f4ad1682205", "b881f555d1be46bd9590d264cc5aa0e5", "50602814313447959aa0596a43d5090e", "d569f15f71c24d82b457084eb0eb5c4f" ] }, "executionInfo": { "elapsed": 34129, "status": "ok", "timestamp": 1722441675895, "user": { "displayName": "Daniel Han-Chen", "userId": "17402123517466114840" }, "user_tz": 420 }, "id": "95_Nn-89DhsL", "outputId": "95dd46be-edc4-4e2c-9a72-e41719229f5d" }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "92d9648dd6bb4086ad9ce52db641bf87", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Map (num_proc=2): 0%| | 0/51760 [00:00, ? 
examples/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "max_steps is given, it will override any value given in num_train_epochs\n" ] } ], "source": [ "from trl import SFTTrainer\n", "from transformers import TrainingArguments\n", "from unsloth import is_bfloat16_supported\n", "\n", "trainer = SFTTrainer(\n", " model = model,\n", " tokenizer = tokenizer,\n", " train_dataset = dataset,\n", " dataset_text_field = \"text\",\n", " max_seq_length = max_seq_length,\n", " dataset_num_proc = 2,\n", " packing = False, # Can make training 5x faster for short sequences.\n", " args = TrainingArguments(\n", " per_device_train_batch_size = 2,\n", " gradient_accumulation_steps = 4,\n", " warmup_steps = 5,\n", " # num_train_epochs = 1, # Set this for 1 full training run.\n", " max_steps = 60,\n", " learning_rate = 2e-4,\n", " fp16 = not is_bfloat16_supported(),\n", " bf16 = is_bfloat16_supported(),\n", " logging_steps = 1,\n", " optim = \"adamw_8bit\",\n", " weight_decay = 0.01,\n", " lr_scheduler_type = \"linear\",\n", " seed = 3407,\n", " output_dir = \"outputs\",\n", " report_to = \"none\", # Use this for WandB etc\n", " ),\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 20, "status": "ok", "timestamp": 1722441675896, "user": { "displayName": "Daniel Han-Chen", "userId": "17402123517466114840" }, "user_tz": 420 }, "id": "2ejIt2xSNKKp", "outputId": "a207a383-cdc1-4a7c-e6b5-b46ae8f1cfbb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GPU = Tesla T4. Max memory = 14.748 GB.\n", "2.697 GB of memory reserved.\n" ] } ], "source": [ "#@title Show current memory stats\n", "gpu_stats = torch.cuda.get_device_properties(0)\n", "start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", "max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n", "print(f\"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.\")\n", "print(f\"{start_gpu_memory} GB of memory reserved.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "executionInfo": { "elapsed": 280906, "status": "ok", "timestamp": 1722441956791, "user": { "displayName": "Daniel Han-Chen", "userId": "17402123517466114840" }, "user_tz": 420 }, "id": "yqxqAZ7KJ4oL", "outputId": "9fab2c22-ad76-4a87-dcd6-bbac6353c6c8" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1\n", " \\\\ /| Num examples = 51,760 | Num Epochs = 1\n", "O^O/ \\_/ \\ Batch size per device = 2 | Gradient Accumulation steps = 4\n", "\\ / Total batch size = 8 | Total steps = 60\n", " \"-____-\" Number of trainable parameters = 20,766,720\n" ] }, { "data": { "text/html": [ "\n", "Step | \n", "Training Loss | \n", "
---|---|\n",
"1 | 1.854400 |\n",
"2 | 2.406500 |\n",
"3 | 1.755700 |\n",
"4 | 1.990000 |\n",
"5 | 1.632200 |\n",
"6 | 1.651400 |\n",
"7 | 1.220900 |\n",
"8 | 1.359800 |\n",
"9 | 1.127900 |\n",
"10 | 1.266700 |\n",
"11 | 1.028200 |\n",
"12 | 1.021900 |\n",
"13 | 0.990900 |\n",
"14 | 1.156600 |\n",
"15 | 0.968200 |\n",
"16 | 0.961700 |\n",
"17 | 1.075100 |\n",
"18 | 1.302900 |\n",
"19 | 1.028400 |\n",
"20 | 0.925500 |\n",
"21 | 0.960100 |\n",
"22 | 0.975800 |\n",
"23 | 0.900400 |\n",
"24 | 1.033100 |\n",
"25 | 1.105800 |\n",
"26 | 1.113500 |\n",
"27 | 1.090900 |\n",
"28 | 0.939000 |\n",
"29 | 0.883100 |\n",
"30 | 0.941700 |\n",
"31 | 0.914600 |\n",
"32 | 0.920300 |\n",
"33 | 1.024900 |\n",
"34 | 0.868300 |\n",
"35 | 0.964300 |\n",
"36 | 0.908900 |\n",
"37 | 0.908300 |\n",
"38 | 0.806800 |\n",
"39 | 1.139700 |\n",
"40 | 1.216000 |\n",
"41 | 0.963900 |\n",
"42 | 0.984100 |\n",
"43 | 0.945700 |\n",
"44 | 0.923600 |\n",
"45 | 0.974400 |\n",
"46 | 0.971000 |\n",
"47 | 0.925200 |\n",
"48 | 1.234300 |\n",
"49 | 0.932400 |\n",
"50 | 1.085700 |\n",
"51 | 1.062500 |\n",
"52 | 0.971500 |\n",
"53 | 1.000000 |\n",
"54 | 1.244900 |\n",
"55 | 0.856600 |\n",
"56 | 1.071400 |\n",
"57 | 0.925000 |\n",
"58 | 0.845200 |\n",
"59 | 0.895700 |\n",
"60 | 0.958300 |\n", "
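\n",
"*(These per-step losses were recorded because `logging_steps = 1`. A minimal sketch, assuming the `trainer` object above after `trainer.train()` has finished, to pull them out for plotting:)*\n",
"\n",
"```python\n",
"# Sketch: the per-step losses shown above live in trainer.state.log_history.\n",
"import matplotlib.pyplot as plt\n",
"\n",
"history = [h for h in trainer.state.log_history if \"loss\" in h]\n",
"steps = [h[\"step\"] for h in history]\n",
"losses = [h[\"loss\"] for h in history]\n",
"\n",
"plt.plot(steps, losses)\n",
"plt.xlabel(\"Step\")\n",
"plt.ylabel(\"Training loss\")\n",
"plt.show()\n",
"```\n", "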
"
],
"text/plain": [
"