{"cells":[{"cell_type":"markdown","source":["To run this, press \"*Runtime*\" and press \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n","
\n","\n","To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://github.com/unslothai/unsloth?tab=readme-ov-file#-installation-instructions).\n","\n","You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save) (eg for Llama.cpp).\n","\n","**[NEW] Gemma2-9b is trained on 8 trillion tokens! Gemma2-27b is 13 trillion!**"],"metadata":{"id":"IqM-T1RTzY6C"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"2eSvM9zX_2d3"},"outputs":[],"source":["%%capture\n","!pip install unsloth\n","# Also get the latest nightly Unsloth!\n","!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git\n","\n","# Install Flash Attention 2 for softcapping support\n","import torch\n","if torch.cuda.get_device_capability()[0] >= 8:\n"," !pip install --no-deps packaging ninja einops \"flash-attn>=2.6.3\""]},{"cell_type":"markdown","source":["* We support Llama, Mistral, Phi-3, Gemma, Yi, DeepSeek, Qwen, TinyLlama, Vicuna, Open Hermes etc\n","* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.\n","* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.\n","* With [PR 26037](https://github.com/huggingface/transformers/pull/26037), we support downloading 4bit models **4x faster**! [Our repo](https://huggingface.co/unsloth) has Llama, Mistral 4bit models.\n","* [**NEW**] We make Phi-3 Medium / Mini **2x faster**! See our [Phi-3 Medium notebook](https://colab.research.google.com/drive/1hhdhBa1j_hsymiW9m-WzxQtgqTH_NHqi?usp=sharing)"],"metadata":{"id":"r2v_X2fA0Df5"}},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":366,"referenced_widgets":["a50c887a6acf4f50ad5dd79e4501e9e9","81c7bf29b6ff487fa375920dacf09baa","669b485a61d04371acf8dcbda1ea7b97","0a33f5c0dcd345eb84d41648ec69b2e2","ec75327a4fe8473495256133b6550c50","8adfcad55d4e4c6eb02bf919aa6c3ddb","3e3caa6a08b84dceb28160dcf40df0e4","74dbeae77ab040b48d3c1debb8ed409a","453eba6bfea84833b5b927b1f2f8a6c8","17fed093a6134807bcebc3b37be21cb0","a022f0ec4ec44e969fbe3a52e825e935","de25e838640341feba66b21fcda6413b","56d9139e1e8644a4af4c1b6d41ebc52f","c4146f3241134b90bc1633015d5e5991","914e23a8cf4146d6bdd41bec7620290e","6debbcd0097f49708872dbfe3247e2c5","26d7eb9ef5d5438da534a593898f5ade","5372ceeed134432ebf696a78d3474f86","c3349c0ade704b52a1ba207dc17e8399","30319681a87b4cf5af789aa5e895d6cb","75e6df554c894d869d548282d18dd0f7","69257949f3c44b698d8f9a731d391e24","0be73547156c40f585b68bf6e394c5b8","a83f8458b3fd4fe58b82f104ceb84607","9eed31ea4026422fb834ad2226fa207e","e774a45d9a3e4e1c94218b14be1abbb3","4152d8ac16ee4c31807641c60c253409","90ae83ba38df4865acd682753cfb3635","4c62c5a66a1c437eaa38a254ba631ff9","e9709a43b5e54b93b7b1bac0754c2d05","0d84b141108d4920b8f4dbc97df94baf","b4ad39eb304441fdafd6d20edd5fbc7f","1899cdf6599a4925828d5b74a99ff843","b4b0a2921cce446fa41a6afb690de5f1","cc1a9e6621244e25b90dde0a812b22ad","7560a9d2853b419b807f65d22fc7be4c","281d4090e1fd4ddb8faae5e8ee2003ae","bb7a2f8a3c564c7495ba6db4ec796bed","ce73a99355b14b1c9c48c1cce90a8c49","85082a3719c245689461e4d344ee3208","f6788fc77e1f4bb1871a9a3c311541ab","d4eeb154ba264f8ab4328d508b7db397","d9a941ea9a544c90878d945f326792c3","205742561dbe49fea2a6c6a6a34b7647","d431eca5a995404a8b06290072c6bb80","8ee1f38fc8914174a74f475d26ddaa21","d92efec39052473dabb11c6bed99c529","a3c2b9ea861
54fd8804fb0370430aaaf","904ce10c8423463e834a81d543f1db86","4777de01d28b4ce08ab8436beb66f607","adfbbfb6ab3d42f39355a5643151d42c","e208ba01081f4b7d8b70d7158baa7601","bc38aa8c622b4d9e91ef4788b329b2e7","b6fa6dbd11ed42a68bddbaeaea0eef33","6bbeb3daee6340e687d9f028e5d01c50","1651ada0b9ff4b3a8b7646b6ba043ca5","1633f3a5cf4049fbaa72b92c07c8b026","5df7dde322df4170a84fa71b2b9f8083","bac2d229d01b41c4827e958e5628d94e","89aadcf7ff4548bf95fec729c45ae5c6","c625f48e47c8440c8f1ebc783f2beb1d","fd74bd5f52934ff08e606b76170f4731","8143edb45700453b96e9f3a2a79568c4","77eae387824846b2870ca570e7f57710","ae43f6bcd5834defa59a01cce1f7eeca","e1d7f18425e74752acd581e4a3c81b32","ea079ad27d5d4e16a5fa47167efdfc68","4f332dc2761e4799a9422e198f97e8a0","7d01d5aedd8649a2aa6a90a516d64f1e","0fda2794388e4d5c9b4e26259759a7eb","d6fc7d10c8394ef29edcedb9f3e32adc","81e79d2ab04c4677a09c349566a3db4a","37ec440d967b47a8a2e2e56f9742f176","20df02e066244c30b963d4a99d1bce1f","96b94b5982b446afa98e38bb4f609bf5","7545be91c6134baca0d9847b5941cb79","f230a2a40dcc479c9eedb9f34515352c"]},"id":"QmUBVEnvCDJv","executionInfo":{"status":"ok","timestamp":1720290041169,"user_tz":420,"elapsed":106585,"user":{"displayName":"Daniel Han-Chen","userId":"17402123517466114840"}},"outputId":"cf888a1a-6db4-4080-cd7f-9fdc1f9804dd"},"outputs":[{"output_type":"stream","name":"stdout","text":["🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n"]},{"output_type":"display_data","data":{"text/plain":["config.json: 0%| | 0.00/1.38k [00:00, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"a50c887a6acf4f50ad5dd79e4501e9e9"}},"metadata":{}},{"output_type":"stream","name":"stdout","text":["==((====))== Unsloth: Fast Gemma2 patching release 2024.7\n"," \\\\ /| GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.\n","O^O/ \\_/ \\ Pytorch: 2.3.0+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.\n","\\ / Bfloat16 = FALSE. FA [Xformers = 0.0.26.post1. 
FA2 = False]\n"," \"-____-\" Free Apache license: http://github.com/unslothai/unsloth\n"]},{"output_type":"display_data","data":{"text/plain":["model.safetensors: 0%| | 0.00/6.13G [00:00, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"de25e838640341feba66b21fcda6413b"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["generation_config.json: 0%| | 0.00/173 [00:00, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"0be73547156c40f585b68bf6e394c5b8"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["tokenizer_config.json: 0%| | 0.00/40.0k [00:00, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"b4b0a2921cce446fa41a6afb690de5f1"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["tokenizer.model: 0%| | 0.00/4.24M [00:00, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"d431eca5a995404a8b06290072c6bb80"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["special_tokens_map.json: 0%| | 0.00/636 [00:00, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"1651ada0b9ff4b3a8b7646b6ba043ca5"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["tokenizer.json: 0%| | 0.00/17.5M [00:00, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"ea079ad27d5d4e16a5fa47167efdfc68"}},"metadata":{}}],"source":["from unsloth import FastLanguageModel\n","import torch\n","max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!\n","dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+\n","load_in_4bit = True # Use 4bit quantization to reduce memory usage. 
Can be False.\n","\n","# 4bit pre quantized models we support for 4x faster downloading + no OOMs.\n","fourbit_models = [\n"," \"unsloth/Meta-Llama-3.1-8B-bnb-4bit\", # Llama-3.1 15 trillion tokens model 2x faster!\n"," \"unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit\",\n"," \"unsloth/Meta-Llama-3.1-70B-bnb-4bit\",\n"," \"unsloth/Meta-Llama-3.1-405B-bnb-4bit\", # We also uploaded 4bit for 405b!\n"," \"unsloth/Mistral-Nemo-Base-2407-bnb-4bit\", # New Mistral 12b 2x faster!\n"," \"unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit\",\n"," \"unsloth/mistral-7b-v0.3-bnb-4bit\", # Mistral v3 2x faster!\n"," \"unsloth/mistral-7b-instruct-v0.3-bnb-4bit\",\n"," \"unsloth/Phi-3.5-mini-instruct\", # Phi-3.5 2x faster!\n"," \"unsloth/Phi-3-medium-4k-instruct\",\n"," \"unsloth/gemma-2-9b-bnb-4bit\",\n"," \"unsloth/gemma-2-27b-bnb-4bit\", # Gemma 2x faster!\n","] # More models at https://huggingface.co/unsloth\n","\n","model, tokenizer = FastLanguageModel.from_pretrained(\n"," model_name = \"unsloth/gemma-2-9b\",\n"," max_seq_length = max_seq_length,\n"," dtype = dtype,\n"," load_in_4bit = load_in_4bit,\n"," # token = \"hf_...\", # use one if using gated models like meta-llama/Llama-2-7b-hf\n",")"]},{"cell_type":"markdown","source":["We now add LoRA adapters so we only need to update 1 to 10% of all parameters!"],"metadata":{"id":"SXd9bTZd1aaL"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"6bZsfBuZDeCL","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1720290050415,"user_tz":420,"elapsed":9259,"user":{"displayName":"Daniel Han-Chen","userId":"17402123517466114840"}},"outputId":"6fd98112-17cd-415e-815e-010d3d4f825c"},"outputs":[{"output_type":"stream","name":"stderr","text":["Unsloth 2024.7 patched 42 layers with 42 QKV layers, 42 O layers and 42 MLP layers.\n"]}],"source":["model = FastLanguageModel.get_peft_model(\n"," model,\n"," r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128\n"," target_modules = [\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n"," \"gate_proj\", \"up_proj\", \"down_proj\",],\n"," lora_alpha = 16,\n"," lora_dropout = 0, # Supports any, but = 0 is optimized\n"," bias = \"none\", # Supports any, but = \"none\" is optimized\n"," # [NEW] \"unsloth\" uses 30% less VRAM, fits 2x larger batch sizes!\n"," use_gradient_checkpointing = \"unsloth\", # True or \"unsloth\" for very long context\n"," random_state = 3407,\n"," use_rslora = False, # We support rank stabilized LoRA\n"," loftq_config = None, # And LoftQ\n",")"]},{"cell_type":"markdown","source":["\n","### Data Prep\n","We now use the Alpaca dataset from [yahma](https://huggingface.co/datasets/yahma/alpaca-cleaned), which is a filtered version of 52K of the original [Alpaca dataset](https://crfm.stanford.edu/2023/03/13/alpaca.html). You can replace this code section with your own data prep.\n","\n","**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only).\n","\n","**[NOTE]** Remember to add the **EOS_TOKEN** to the tokenized output!! 
Otherwise you'll get infinite generations!\n","\n","If you want to use the `llama-3` template for ShareGPT datasets, try our conversational [notebook](https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing).\n","\n","For text completions like novel writing, try this [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing)."],"metadata":{"id":"vITh0KVJ10qX"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"LjY75GoYUCB8","colab":{"base_uri":"https://localhost:8080/","height":165,"referenced_widgets":["e8826f60adea4ee282fa333462a8f816","18c18c7fc23b4937b59e0ab0f106688c","740d8c53a690433cab81c56b28a10f93","fdaf85d70c5a40b6b7ebbac1ca04b425","64a46d340b8a44a7a14af3fc5fec1d2f","5fdd0043c309422d864f6a3af65ba041","7563586dd819426aac6f1c9e0617610a","01a1d6f23e134d9da8cb8cb3fd56767d","ec006fb1d51d43808638b49ab49fd9a1","c873a97ce0794170984d7b98955c84ee","8aa4dd0b45704430b50057109a05f1ff","c6719a7dea5d4930bde14849fba0e856","38133b89122340b5823857652c041de0","306ecd3152294d94b848a60d5a6a78c3","4cc564ace4d047e583513a6876ce0b5b","f3c4b810a2494e549aac86f037fefff4","3da3cd67600143bc82919912f34fe898","d76b37a9bd2c4a83a449e93cfee7b7d3","40a7de11c78b487da00acad98de7432a","eb4ba8b428794fd894f844e936de5a00","6dfd012435db48bfbf31867b7565c94e","50290996b8604226a286ce9c641c5453","c3dd5cc73ba544528a4bf9f88e1d4aa3","1c4cd1cb54e1427aaf718939c58c470e","cc18e9fd077d43d8a717b74dacc8fb4a","38479352e62244fbbe21d65f67453732","935535dcc9c1467387537a146856e3e4","da0c2fc935c14e0d8c1db2a5514db10a","8624ff8d0cff47918c2ced44b09f4456","6600d7f0e28d4ae594fb25709663383b","39ab1b564ad1436cb20338acc4b214f1","db61e062f6d1453e81c7739430599848","d67a93216ff64ea6bddcd1b28907d8bd","c089bbdff8b946198e1854b4f296b6e0","302d1fe8eb6145bcb1697c7843538da0","be49753d27ad4f6a83872066c3e6f67c","8898910fec18453fb0eb58ae0065b875","f45a3f90211248dc8a9b22d71387d130","e3f1ae32e62644e1868ef525b9fbca8e","6cbe93969d644819a4569c4339516e71","c285baf7255b44189f9c46a30df8c033","154f0c08ad604a4cbd4bc24f70227d5f","81cd9d127b3c44a29671227868fef67b","8a298e38c820427f9d861cd35d89dca8"]},"executionInfo":{"status":"ok","timestamp":1720290054908,"user_tz":420,"elapsed":4500,"user":{"displayName":"Daniel Han-Chen","userId":"17402123517466114840"}},"outputId":"170624eb-a8f9-46e0-d9c1-d698c89dd88d"},"outputs":[{"output_type":"display_data","data":{"text/plain":["Downloading readme: 0%| | 0.00/11.6k [00:00, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"e8826f60adea4ee282fa333462a8f816"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["Downloading data: 0%| | 0.00/44.3M [00:00, ?B/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"c6719a7dea5d4930bde14849fba0e856"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["Generating train split: 0%| | 0/51760 [00:00, ? examples/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"c3dd5cc73ba544528a4bf9f88e1d4aa3"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["Map: 0%| | 0/51760 [00:00, ? examples/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"c089bbdff8b946198e1854b4f296b6e0"}},"metadata":{}}],"source":["alpaca_prompt = \"\"\"Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request.\n","\n","### Instruction:\n","{}\n","\n","### Input:\n","{}\n","\n","### Response:\n","{}\"\"\"\n","\n","EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN\n","def formatting_prompts_func(examples):\n"," instructions = examples[\"instruction\"]\n"," inputs = examples[\"input\"]\n"," outputs = examples[\"output\"]\n"," texts = []\n"," for instruction, input, output in zip(instructions, inputs, outputs):\n"," # Must add EOS_TOKEN, otherwise your generation will go on forever!\n"," text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN\n"," texts.append(text)\n"," return { \"text\" : texts, }\n","pass\n","\n","from datasets import load_dataset\n","dataset = load_dataset(\"yahma/alpaca-cleaned\", split = \"train\")\n","dataset = dataset.map(formatting_prompts_func, batched = True,)"]},{"cell_type":"markdown","source":["\n","### Train the model\n","Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We train for just 60 steps to speed things up, but for a full run you can set `num_train_epochs = 1` and disable `max_steps` (set it to `None`). We also support TRL's `DPOTrainer`!"],"metadata":{"id":"idAEIeSQ3xdS"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"95_Nn-89DhsL","colab":{"base_uri":"https://localhost:8080/","height":122,"referenced_widgets":["ae1c9079cc9646a2ae9afc9f324bc251","e89a480a4f5549dbb631f178e23b4dc4","dfe5ea9ac4f84d92b6a7c80f5919782a","8d7e0f823d38431c8778975b7bb50e66","802f7bf749cd43af9ee1d27c35837993","6d7971ac2f4b4729a97183b853401dc6","766edabcd75147dfbbaad8e68e136e46","ff4cecec726b4d84ae64de2689571a94","2ebdc0a1284c41fd86915e0961e7f147","7247c69c93f047a3b619ce2d04d269bb","17032787871d40598d5676ed49f466c8"]},"executionInfo":{"status":"ok","timestamp":1720290104659,"user_tz":420,"elapsed":49754,"user":{"displayName":"Daniel Han-Chen","userId":"17402123517466114840"}},"outputId":"99bf52b8-cc4c-4c5f-81b4-0b9463c8d6f8"},"outputs":[{"output_type":"stream","name":"stderr","text":["/usr/local/lib/python3.10/dist-packages/multiprocess/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.\n"," self.pid = os.fork()\n"]},{"output_type":"display_data","data":{"text/plain":["Map (num_proc=2): 0%| | 0/51760 [00:00, ? 
examples/s]"],"application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"ae1c9079cc9646a2ae9afc9f324bc251"}},"metadata":{}},{"output_type":"stream","name":"stderr","text":["max_steps is given, it will override any value given in num_train_epochs\n"]}],"source":["from trl import SFTTrainer\n","from transformers import TrainingArguments\n","from unsloth import is_bfloat16_supported\n","\n","trainer = SFTTrainer(\n"," model = model,\n"," tokenizer = tokenizer,\n"," train_dataset = dataset,\n"," dataset_text_field = \"text\",\n"," max_seq_length = max_seq_length,\n"," dataset_num_proc = 2,\n"," packing = False, # Can make training 5x faster for short sequences.\n"," args = TrainingArguments(\n"," per_device_train_batch_size = 2,\n"," gradient_accumulation_steps = 4,\n"," warmup_steps = 5,\n"," max_steps = 60,\n"," learning_rate = 2e-4,\n"," fp16 = not is_bfloat16_supported(),\n"," bf16 = is_bfloat16_supported(),\n"," logging_steps = 1,\n"," optim = \"adamw_8bit\",\n"," weight_decay = 0.01,\n"," lr_scheduler_type = \"linear\",\n"," seed = 3407,\n"," output_dir = \"outputs\",\n"," report_to = \"none\", # Use this for WandB etc\n"," ),\n",")"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"2ejIt2xSNKKp","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1720290104661,"user_tz":420,"elapsed":12,"user":{"displayName":"Daniel Han-Chen","userId":"17402123517466114840"}},"outputId":"d8ead8a5-c59c-44c0-f0cb-750dc7446838","cellView":"form"},"outputs":[{"output_type":"stream","name":"stdout","text":["GPU = Tesla T4. Max memory = 14.748 GB.\n","6.576 GB of memory reserved.\n"]}],"source":["#@title Show current memory stats\n","gpu_stats = torch.cuda.get_device_properties(0)\n","start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n","max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n","print(f\"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.\")\n","print(f\"{start_gpu_memory} GB of memory reserved.\")"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"yqxqAZ7KJ4oL","colab":{"base_uri":"https://localhost:8080/","height":1000},"outputId":"90a67466-dbf8-420f-949d-80d9098dee5d","executionInfo":{"status":"ok","timestamp":1720290810606,"user_tz":420,"elapsed":705951,"user":{"displayName":"Daniel Han-Chen","userId":"17402123517466114840"}}},"outputs":[{"output_type":"stream","name":"stderr","text":["==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1\n"," \\\\ /| Num examples = 51,760 | Num Epochs = 1\n","O^O/ \\_/ \\ Batch size per device = 2 | Gradient Accumulation steps = 4\n","\\ / Total batch size = 8 | Total steps = 60\n"," \"-____-\" Number of trainable parameters = 54,018,048\n"]},{"output_type":"display_data","data":{"text/plain":["Step | \n","Training Loss | \n","
---|---\n","
1 | 1.700600\n","
2 | 2.211500\n","
3 | 1.566800\n","
4 | 1.718900\n","
5 | 1.358100\n","
6 | 1.300900\n","
7 | 0.918900\n","
8 | 1.106300\n","
9 | 0.929300\n","
10 | 1.019100\n","
11 | 0.845400\n","
12 | 0.816100\n","
13 | 0.786700\n","
14 | 0.981800\n","
15 | 0.773800\n","
16 | 0.793300\n","
17 | 0.925400\n","
18 | 1.172100\n","
19 | 0.891100\n","
20 | 0.785500\n","
21 | 0.795200\n","
22 | 0.829000\n","
23 | 0.797400\n","
24 | 0.884400\n","
25 | 0.981700\n","
26 | 0.956000\n","
27 | 0.950900\n","
28 | 0.819500\n","
29 | 0.769700\n","
30 | 0.806200\n","
31 | 0.771100\n","
32 | 0.812600\n","
33 | 0.905800\n","
34 | 0.771800\n","
35 | 0.842200\n","
36 | 0.790200\n","
37 | 0.818000\n","
38 | 0.687600\n","
39 | 1.010300\n","
40 | 1.016700\n","
41 | 0.830200\n","
42 | 0.872900\n","
43 | 0.865800\n","
44 | 0.819300\n","
45 | 0.854800\n","
46 | 0.889700\n","
47 | 0.815900\n","
48 | 1.092300\n","
49 | 0.807500\n","
50 | 0.966100\n","
51 | 0.924000\n","
52 | 0.844700\n","
53 | 0.908700\n","
54 | 1.085800\n","
55 | 0.731100\n","
56 | 0.965500\n","
57 | 0.815700\n","
58 | 0.749700\n","
59 | 0.781100\n","
60 | 0.836000\n","
"]},"metadata":{}}],"source":["trainer_stats = trainer.train()"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"pCqnaKmlO1U9","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1720290810607,"user_tz":420,"elapsed":15,"user":{"displayName":"Daniel Han-Chen","userId":"17402123517466114840"}},"outputId":"14dc05ed-3554-496f-b5dc-a2f2ff6fb096","cellView":"form"},"outputs":[{"output_type":"stream","name":"stdout","text":["702.5893 seconds used for training.\n","11.71 minutes used for training.\n","Peak reserved memory = 9.383 GB.\n","Peak reserved memory for training = 2.807 GB.\n","Peak reserved memory % of max memory = 63.622 %.\n","Peak reserved memory for training % of max memory = 19.033 %.\n"]}],"source":["#@title Show final memory and time stats\n","used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n","used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n","used_percentage = round(used_memory /max_memory*100, 3)\n","lora_percentage = round(used_memory_for_lora/max_memory*100, 3)\n","print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n","print(f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\")\n","print(f\"Peak reserved memory = {used_memory} GB.\")\n","print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n","print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n","print(f\"Peak reserved memory for training % of max memory = {lora_percentage} %.\")"]},{"cell_type":"markdown","source":["\n","### Inference\n","Let's run the model! You can change the instruction and input - leave the output blank!"],"metadata":{"id":"ekOmTR1hSNcr"}},{"cell_type":"code","source":["# alpaca_prompt = Copied from above\n","FastLanguageModel.for_inference(model) # Enable native 2x faster inference\n","inputs = tokenizer(\n","[\n"," alpaca_prompt.format(\n"," \"Continue the fibonnaci sequence.\", # instruction\n"," \"1, 1, 2, 3, 5, 8\", # input\n"," \"\", # output - leave this blank for generation!\n"," )\n","], return_tensors = \"pt\").to(\"cuda\")\n","\n","outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)\n","tokenizer.batch_decode(outputs)"],"metadata":{"id":"kR3gIAX-SM2q","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1720290833251,"user_tz":420,"elapsed":22653,"user":{"displayName":"Daniel Han-Chen","userId":"17402123517466114840"}},"outputId":"ed794792-b09b-42dd-8c63-7a63f40c4bba"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["['