{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "nc0g2NLpUSGr" }, "source": [ "# Fine-tune SmolVLM2 on Video Captioning\n", "In this notebook we will fine-tune SmolVLM2-500M-Video-Instruct on Video Feedback dataset. It is ran on a Colab A100 for full fine-tuning, but you can squeeze it to L4 with QLoRA." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "WIhA1lQ7j0kw", "outputId": "928f2f4e-6cd8-452b-d621-605550fdd33c" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m163.5/163.5 kB\u001b[0m \u001b[31m5.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25h Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n" ] } ], "source": [ "!pip install -q accelerate datasets peft bitsandbytes tensorboard pyav num2words" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "FCYgmJtDRElR" }, "outputs": [], "source": [ "!pip install -q git+https://github.com/huggingface/transformers.git" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "XyJaqZZ3uYYl" }, "outputs": [], "source": [ "!pip install -q flash-attn --no-build-isolation" ] }, { "cell_type": "markdown", "metadata": { "id": "wAeMA0heVBjT" }, "source": [ "We will push out model to Hub so we need to authenticate ourselves." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 17, "referenced_widgets": [ "112da28d935543069e7a1a2abc22f9f4", "0d22c009aa584ca1a71e32336a7985e0", "ad17e30049cb4b5aa4046d94690f87d3", "e77d3520a2d64f9a840652669c9a0ba1", "1852745b0de44f4281cea0cbb3508459", "166c19ec6d9f4455a56a0f146d1c0abc", "f6362bc7b5b24dd592d35a76a1fbf26b", "e99fbdfc8a22408a8c728a36c8744b24", "0fee30c9bf2b4bdfad7a37261f92db64", "4cd8babc92cc4aeba74d2147f28dee7d", "a4fbf37fe0fe44cfbf72ca1e82af3467", "be50e04c5629463eb18d029d045f25b3", "5490c69c251144c4979e346c66ac1e53", "44d0e1db5f664b3fb7c146c216566776", "7af918a10ec745d7a3f4a883dbdc8b6a", "4156b6897089446984196606ef0d3461", "cf4b5a9cefe84fd9a4d120ab1da6f3f4", "484155e67e36453c9d1ebd2ea1768eca", "48bb89c434284b639f45b5929cf8d1a9", "0ead4ab9bb7648c69352094bfbcb8800" ] }, "id": "yKd5xtSGj7cm", "outputId": "a6e841d8-f2d6-44a8-d44d-c0c244d95f9b" }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "112da28d935543069e7a1a2abc22f9f4", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(HTML(value='
| Step | Training Loss |
|-----:|--------------:|
| 25 | 3.345600 |
| 50 | 0.709500 |
| 75 | 0.341000 |
| 100 | 0.272200 |
| 125 | 0.250600 |
| 150 | 0.290400 |
| 175 | 0.261100 |
| 200 | 0.258000 |
| 225 | 0.276500 |
| 250 | 0.265900 |
| 275 | 0.301500 |
| 300 | 0.277900 |
| 325 | 0.282800 |
| 350 | 0.264100 |
| 375 | 0.235500 |
| 400 | 0.251400 |
| 425 | 0.242500 |
| 450 | 0.281100 |
| 475 | 0.261000 |
| 500 | 0.231800 |
| 525 | 0.232200 |
| 550 | 0.268100 |
| 575 | 0.222400 |
| 600 | 0.246600 |
| 625 | 0.251700 |
| 650 | 0.257800 |
| 675 | 0.241000 |
| 700 | 0.229000 |
| 725 | 0.236600 |
| 750 | 0.220900 |
| 775 | 0.271400 |
| 800 | 0.259900 |
| 825 | 0.243900 |
| 850 | 0.236400 |
| 875 | 0.227200 |
| 900 | 0.227900 |
| 925 | 0.263300 |
| 950 | 0.255200 |
| 975 | 0.250000 |
| 1000 | 0.244400 |
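The run logged above is full fine-tuning on an A100. As noted in the introduction, the same training can presumably fit on an L4 with QLoRA; a minimal configuration sketch (the rank, alpha, dropout, and `target_modules` names below are illustrative assumptions, not the exact values used in this notebook):

```python
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype="bfloat16",
)

# Low-rank adapters trained on top of the quantized weights.
# target_modules are illustrative; inspect the model to pick the right ones.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

These would then be passed as `quantization_config=bnb_config` to `from_pretrained` and applied with `peft.get_peft_model(model, lora_config)` before handing the model to the `Trainer`.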
"
],
"text/plain": [
"
Copy a token from your Hugging Face\ntokens page and paste it below.
Immediately click login after copying\nyour token or it might be stored in plain text in this notebook file.
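As a quick sanity check on the run, the logged losses above can be summarized programmatically (values copied from the table; only a subset of steps is shown):

```python
# (step, training loss) pairs copied from the trainer log above (subset).
logs = [(25, 3.3456), (50, 0.7095), (100, 0.2722), (500, 0.2318), (1000, 0.2444)]

first_loss = logs[0][1]
final_loss = logs[-1][1]
reduction = 100 * (1 - final_loss / first_loss)
# prints: Loss fell from 3.3456 to 0.2444 (92.7% lower)
print(f"Loss fell from {first_loss:.4f} to {final_loss:.4f} ({reduction:.1f}% lower)")
```

The loss drops steeply in the first 100 steps and then plateaus around 0.22–0.28, which is consistent with the model having converged well before step 1000.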