Add files using upload-large-folder tool
- .gitattributes +2 -0
- a_mllm_notebooks/.ipynb_checkpoints/serve-checkpoint.sh +30 -0
- a_mllm_notebooks/langchain/image.jpg +3 -0
- a_mllm_notebooks/lmdeploy/api_server.ipynb +568 -0
- a_mllm_notebooks/lmdeploy/api_server.md +265 -0
- a_mllm_notebooks/lmdeploy/api_server_vl.ipynb +199 -0
- a_mllm_notebooks/lmdeploy/api_server_vl.md +155 -0
- a_mllm_notebooks/lmdeploy/download_md.ipynb +211 -0
- a_mllm_notebooks/lmdeploy/get_started_vl.ipynb +517 -0
- a_mllm_notebooks/lmdeploy/get_started_vl.md +204 -0
- a_mllm_notebooks/lmdeploy/internvl_25.ipynb +355 -0
- a_mllm_notebooks/lmdeploy/kv_quant.ipynb +114 -0
- a_mllm_notebooks/lmdeploy/kv_quant.md +82 -0
- a_mllm_notebooks/lmdeploy/links.txt +8 -0
- a_mllm_notebooks/lmdeploy/lmdeploy_deepseek_vl.ipynb +665 -0
- a_mllm_notebooks/lmdeploy/lmdeploy_info.ipynb +132 -0
- a_mllm_notebooks/lmdeploy/lmdeploy_serve.sh +47 -0
- a_mllm_notebooks/lmdeploy/long_context.ipynb +169 -0
- a_mllm_notebooks/lmdeploy/long_context.md +119 -0
- a_mllm_notebooks/lmdeploy/pipeline.ipynb +570 -0
- a_mllm_notebooks/lmdeploy/pipeline.md +205 -0
- a_mllm_notebooks/lmdeploy/proxy_server.ipynb +248 -0
- a_mllm_notebooks/lmdeploy/proxy_server.md +97 -0
- a_mllm_notebooks/lmdeploy/pytorch_new_model.ipynb +261 -0
- a_mllm_notebooks/lmdeploy/pytorch_new_model.md +181 -0
- a_mllm_notebooks/lmdeploy/tiger.jpeg +0 -0
- a_mllm_notebooks/lmdeploy/turbomind.ipynb +88 -0
- a_mllm_notebooks/lmdeploy/turbomind.md +68 -0
- a_mllm_notebooks/lmdeploy/w4a16.ipynb +174 -0
- a_mllm_notebooks/lmdeploy/w4a16.md +130 -0
- a_mllm_notebooks/lmdeploy/w8a8.ipynb +75 -0
- a_mllm_notebooks/lmdeploy/w8a8.md +55 -0
- a_mllm_notebooks/openai/.ipynb_checkpoints/infer-checkpoint.py +167 -0
- a_mllm_notebooks/openai/.ipynb_checkpoints/langchain_openai_api-checkpoint.ipynb +0 -0
- a_mllm_notebooks/openai/.ipynb_checkpoints/load_synth_pedes-checkpoint.ipynb +96 -0
- a_mllm_notebooks/openai/.ipynb_checkpoints/openai_api-checkpoint.ipynb +408 -0
- a_mllm_notebooks/openai/.ipynb_checkpoints/ping_server-checkpoint.ipynb +292 -0
- a_mllm_notebooks/openai/.ipynb_checkpoints/serve-checkpoint.sh +60 -0
- a_mllm_notebooks/openai/.ipynb_checkpoints/temp-checkpoint.sh +25 -0
- a_mllm_notebooks/openai/combine_chinese_output.ipynb +526 -0
- a_mllm_notebooks/openai/openai_api.ipynb +408 -0
- a_mllm_notebooks/tensorrt-llm/bert/.gitignore +2 -0
- a_mllm_notebooks/tensorrt-llm/bert/README.md +79 -0
- a_mllm_notebooks/tensorrt-llm/bert/base_benchmark/config.json +22 -0
- a_mllm_notebooks/tensorrt-llm/bert/base_with_attention_plugin_benchmark/config.json +22 -0
- a_mllm_notebooks/tensorrt-llm/bert/build.py +354 -0
- a_mllm_notebooks/tensorrt-llm/bert/large_benchmark/config.json +22 -0
- a_mllm_notebooks/tensorrt-llm/bert/large_with_attention_plugin_benchmark/config.json +22 -0
- a_mllm_notebooks/tensorrt-llm/bert/run.py +128 -0
- a_mllm_notebooks/tensorrt-llm/bert/run_remove_input_padding.py +153 -0
.gitattributes
CHANGED
@@ -452,3 +452,5 @@ recognize-anything/images/demo/.ipynb_checkpoints/demo4-checkpoint.jpg filter=lf
 recognize-anything/images/demo/.ipynb_checkpoints/demo2-checkpoint.jpg filter=lfs diff=lfs merge=lfs -text
 a_mllm_notebooks/vllm/cat.jpg filter=lfs diff=lfs merge=lfs -text
 a_mllm_notebooks/openai/image.jpg filter=lfs diff=lfs merge=lfs -text
+a_mllm_notebooks/langchain/image.jpg filter=lfs diff=lfs merge=lfs -text
+a_mllm_notebooks/vllm/.ipynb_checkpoints/cat-checkpoint.jpg filter=lfs diff=lfs merge=lfs -text
a_mllm_notebooks/.ipynb_checkpoints/serve-checkpoint.sh
ADDED
@@ -0,0 +1,30 @@
eval "$(conda shell.bash hook)"
conda activate lmdeploy

MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct-AWQ
# PORT_LIST=(2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031)

# PORT_LIST from 3063 to 3099
PORT_LIST=( $(seq 19500 1 19590) )
# PORT_LIST=(9898)


# PROXY_URL=0.0.0.0
# lmdeploy serve proxy --server-name $PROXY_URL --server-port 8080 --strategy \
#     min_observed_latency &
#     "min_expected_latency" \
#     &

for PORT in "${PORT_LIST[@]}"; do
    # get random device id from 0 to 3
    # RANDOM_DEVICE_ID=$((RANDOM % 3))
    RANDOM_DEVICE_ID=1
    # CUDA_VISIBLE_DEVICES=$RANDOM_DEVICE_ID \
    # CUDA_VISIBLE_DEVICES=0,1 \
    # CUDA_VISIBLE_DEVICES=2,3 \
    lmdeploy serve api_server $MODEL_NAME \
        --server-port $PORT \
        --backend turbomind \
        --dtype float16 --proxy-url http://0.0.0.0:8080 \
        --cache-max-entry-count 0.1 --tp 1 &
done
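Each replica started by the loop above registers itself with an `lmdeploy serve proxy` instance at `http://0.0.0.0:8080` (the commented-out lines show how such a proxy is launched). Below is a minimal sketch of querying the pool through that proxy, assuming the proxy is running and exposes the usual OpenAI-compatible `/v1` routes:

```python
# Minimal sketch: talk to the replicas started by serve-checkpoint.sh through the
# proxy they register with. Assumes `lmdeploy serve proxy --server-port 8080` is
# running (see the commented-out lines in the script) and serves /v1 routes.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="http://0.0.0.0:8080/v1")
model_name = client.models.list().data[0].id  # e.g. Qwen/Qwen2.5-1.5B-Instruct-AWQ
response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "Say this is a test!"}],
    temperature=0.8,
)
print(response.choices[0].message.content)
```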
a_mllm_notebooks/langchain/image.jpg
ADDED
Binary image stored with Git LFS (no text diff).
a_mllm_notebooks/lmdeploy/api_server.ipynb
ADDED
@@ -0,0 +1,568 @@
(Jupyter notebook; kernel "lmdeploy", Python 3.8.19, nbformat 4.5.)

# OpenAI Compatible Server

This article primarily discusses the deployment of a single LLM across multiple GPUs on a single node, providing a service that is compatible with the OpenAI interface, as well as the usage of the service API.
For the sake of convenience, we refer to this service as `api_server`. Regarding parallel services with multiple models, please refer to the guide about [Request Distribution Server](proxy_server.md).

In the following sections, we will first introduce methods for starting the service, so you can choose the appropriate one for your application scenario.

Next, we focus on the definition of the service's RESTful API, explore the various ways to interact with the interface, and demonstrate how to try the service through the Swagger UI or LMDeploy CLI tools.

Finally, we showcase how to integrate the service into a WebUI, providing you with a reference to easily set up a demo.

## Launch Service

Taking the [internlm2_5-7b-chat](https://huggingface.co/internlm/internlm2_5-7b-chat) model hosted on huggingface hub as an example, you can choose one of the following methods to start the service.

### Option 1: Launching with lmdeploy CLI

```shell
lmdeploy serve api_server internlm/internlm2_5-7b-chat --server-port 23333
```

The arguments of `api_server` can be viewed through the command `lmdeploy serve api_server -h`, for instance, `--tp` to set tensor parallelism, `--session-len` to specify the max length of the context window, `--cache-max-entry-count` to adjust the GPU memory ratio for the k/v cache, etc.

### Option 2: Deploying with docker

With the LMDeploy [official docker image](https://hub.docker.com/r/openmmlab/lmdeploy/tags), you can run the OpenAI compatible server as follows:

```shell
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 23333:23333 \
    --ipc=host \
    openmmlab/lmdeploy:latest \
    lmdeploy serve api_server internlm/internlm2_5-7b-chat
```

The parameters of `api_server` are the same as those mentioned in the "[option 1](#option-1-launching-with-lmdeploy-cli)" section.

### Option 3: Deploying to Kubernetes cluster

Connect to a running Kubernetes cluster and deploy the internlm2_5-7b-chat model service with the [kubectl](https://kubernetes.io/docs/reference/kubectl/) command-line tool (replace `<your token>` with your huggingface hub token):

```shell
sed 's/{{HUGGING_FACE_HUB_TOKEN}}/<your token>/' k8s/deployment.yaml | kubectl create -f - \
    && kubectl create -f k8s/service.yaml
```

In the example above, the model data is placed on the local disk of the node (hostPath). Consider replacing it with high-availability shared storage if multiple replicas are desired; the storage can be mounted into the container using a [PersistentVolume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/).

## RESTful API

LMDeploy's RESTful API is compatible with the following three OpenAI interfaces:

- /v1/chat/completions
- /v1/models
- /v1/completions

Additionally, LMDeploy also defines `/v1/chat/interactive` to support interactive inference. The feature of interactive inference is that there's no need to pass the user conversation history as required by `v1/chat/completions`, since the conversation history is cached on the server side. This method boasts excellent performance during multi-turn long context inference.

You can overview and try out the offered RESTful APIs at `http://0.0.0.0:23333`, as shown in the image below, after launching the service successfully.

Or, you can use LMDeploy's built-in CLI tool to verify the service correctness right from the console.

```shell
# restful_api_url is what is printed in api_server.py, e.g. http://localhost:23333
lmdeploy serve api_client ${api_server_url}
```

If you need to integrate the service into your own projects or products, we recommend the following approach:

### Integrate with `OpenAI`

Here is an example of interaction with the endpoint `v1/chat/completions` service via the openai package.
Before running it, please install the openai package with `pip install openai`.

```python
command = '''lmdeploy serve api_server \
OpenGVLab/InternVL2_5-26B-AWQ \
--server-port 23333 \
--model-format awq \
--backend turbomind \
--tp 4 \
--dtype float16 \
&
'''

import os
os.system(command)
```

Output: returns `0`; stderr reports "Fetching 32 files: 100%", a transformers warning that `InternLM2ForCausalLM` does not directly inherit from `GenerationMixin` (required from v4.50 onwards), and "Convert to turbomind format: 0%| | 0/48".

```python
# kill all the processes having lmdeploy in the name
# !ps aux|grep 'lmdeploy' | awk '{print $2}'| xargs kill -9
```

```python
!nvidia-smi
```

Output (Fri Dec 20 08:33:19 2024): four NVIDIA A100-PCIE-40GB GPUs, driver 535.183.01, CUDA 12.2; memory usage of 33714 MiB, 33934 MiB, 33934 MiB and 34604 MiB out of 40960 MiB respectively, 0% utilization, no processes listed.

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="http://0.0.0.0:23333/v1")
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": " provide three suggestions about time management"},
    ],
    temperature=0.8,
    top_p=0.8,
)
print(response)
```

Output: `GET /v1/models` and `POST /v1/chat/completions` return 200 OK; the `ChatCompletion` from model `OpenGVLab/InternVL2_5-26B-AWQ` lists three suggestions (prioritize tasks, set realistic deadlines, use time-management tools), 138 completion tokens, `finish_reason='stop'`.

If you want to use async functions, you may try the following example:

```python
import asyncio
from openai import AsyncOpenAI


async def main():
    client = AsyncOpenAI(api_key="YOUR_API_KEY", base_url="http://0.0.0.0:23333/v1")
    model_cards = await client.models.list()._get_page()
    response = await client.chat.completions.create(
        model=model_cards.data[0].id,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": " provide three suggestions about time management",
            },
        ],
        temperature=0.8,
        top_p=0.8,
    )
    print(response)


asyncio.run(main())
```

You can invoke other OpenAI interfaces using similar methods. For more detailed information, please refer to the [OpenAI API guide](https://platform.openai.com/docs/guides/text-generation).

### Integrate with lmdeploy `APIClient`

Below are some examples demonstrating how to access the service through `APIClient`.

If you want to use the `/v1/chat/completions` endpoint, you can try the following code:

```python
server_ip = "0.0.0.0"
server_port = 23333
```

```python
from lmdeploy.serve.openai.api_client import APIClient

api_client = APIClient(f"http://{server_ip}:{server_port}")
model_name = api_client.available_models[0]
messages = [{"role": "user", "content": "Say this is a test!"}]
for item in api_client.chat_completions_v1(model=model_name, messages=messages):
    print(item)
```

Output: the model replies "Hello! It looks like you're testing out the system. How can I assist you today? If you have any questions or need help with something specific, feel free to ask!" (37 completion tokens, `finish_reason='stop'`).

For the `/v1/completions` endpoint, you can try:

```python
from lmdeploy.serve.openai.api_client import APIClient

api_client = APIClient(f"http://{server_ip}:{server_port}")
model_name = api_client.available_models[0]
for item in api_client.completions_v1(model=model_name, prompt="hi"):
    print(item)
```

Output: the completion continues the prompt with ". I need help with a math problem. Find the smallest value of 2" and stops with `finish_reason='length'` after 17 completion tokens.

As for `/v1/chat/interactive`, the feature is disabled by default. Enable it by setting `interactive_mode = True`; otherwise, it falls back to the OpenAI compatible interfaces.

Keep in mind that `session_id` identifies a sequence, and all requests belonging to the same sequence must share the same `session_id`.
For instance, in a sequence with 10 rounds of chatting requests, the `session_id` in each request should be the same.

```python
from lmdeploy.serve.openai.api_client import APIClient

api_client = APIClient(f"http://{server_ip}:{server_port}")
messages = [
    "hi, what's your name?",
    "who developed you?",
    "Tell me more about your developers",
    "Summarize the information we've talked so far",
]
for message in messages:
    for item in api_client.chat_interactive_v1(
        prompt=message, session_id=1, interactive_mode=True, stream=False
    ):
        print(item)
```

Output: four `POST /v1/chat/interactive` calls return 200 OK; the model introduces itself, states it was developed by SenseTime, gives a detailed overview of SenseTime (founded in 2014, focused on computer vision, deep learning and natural language processing, with applications in healthcare, education, finance and entertainment), and finally summarizes the conversation; `history_tokens` grows from 0 to 522 across the turns and every turn ends with `finish_reason='stop'`.

### Tools

Refer to [api_server_tools](./api_server_tools.md).

### Integrate with Java/Golang/Rust

You can use [openapi-generator-cli](https://github.com/OpenAPITools/openapi-generator-cli) to convert `http://{server_ip}:{server_port}/openapi.json` into a Java/Rust/Golang client.
Here is an example:

```shell
$ docker run -it --rm -v ${PWD}:/local openapitools/openapi-generator-cli generate -i /local/openapi.json -g rust -o /local/rust

$ ls rust/*
rust/Cargo.toml rust/git_push.sh rust/README.md

rust/docs:
ChatCompletionRequest.md  EmbeddingsRequest.md  HttpValidationError.md  LocationInner.md  Prompt.md
DefaultApi.md  GenerateRequest.md  Input.md  Messages.md  ValidationError.md

rust/src:
apis  lib.rs  models
```

### Integrate with cURL

cURL is a tool for observing the output of the RESTful APIs.

- list served models `v1/models`

```python
# %%bash
!curl http://{server_ip}:{server_port}/v1/models
```

Output: `{"object":"list","data":[{"id":"OpenGVLab/InternVL2_5-26B-AWQ","object":"model","created":1734683385,"owned_by":"lmdeploy","root":"OpenGVLab/InternVL2_5-26B-AWQ","parent":null,"permission":[{"id":"modelperm-iFPz3naHoQtF4of9cmFLoL","object":"model_permission","created":1734683385,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":true,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}`

- chat `v1/chat/completions`

```python
%%bash
curl http://{server_ip}:{server_port}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "internlm-chat-7b",
    "messages": [{"role": "user", "content": "Hello! How are you?"}]
  }'
```

- text completions `v1/completions`

```shell
curl http://{server_ip}:{server_port}/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "llama",
  "prompt": "two steps to build a house:"
}'
```

- interactive chat `v1/chat/interactive`

```python
%%bash
curl http://{server_ip}:{server_port}/v1/chat/interactive \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello! How are you?",
    "session_id": 1,
    "interactive_mode": true
  }'
```

## Integrate with WebUI

```shell
# api_server_url is what is printed in api_server.py, e.g. http://localhost:23333
# server_ip and server_port here are for the gradio ui
# example: lmdeploy serve gradio http://localhost:23333 --server-name localhost --server-port 6006
lmdeploy serve gradio api_server_url --server-name ${gradio_ui_ip} --server-port ${gradio_ui_port}
```

## FAQ

1. If you get `"finish_reason":"length"`, it means the session is too long to be continued. The session length can be modified by passing `--session-len` to api_server.

2. If OOM appears on the server side, please reduce `cache_max_entry_count` of `backend_config` when launching the service.

3. If a request with the same `session_id` to `/v1/chat/interactive` returns an empty value and a negative `tokens`, please consider setting `interactive_mode=false` to restart the session.

4. The `/v1/chat/interactive` api disables engaging in multiple rounds of conversation by default. The input argument `prompt` consists of either a single string or an entire chat history.

5. Regarding stop words, we only support characters that encode into a single index. Furthermore, there may be multiple indexes that decode into results containing the stop word. In such cases, if the number of these indexes is too large, we will only use the index encoded by the tokenizer. If you want to use a stop symbol that encodes into multiple indexes, you may consider performing string matching on the streaming client side; once a successful match is found, you can then break out of the streaming loop.

6. To customize a chat template, please refer to [chat_template.md](../advance/chat_template.md).
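FAQ item 5 above recommends matching multi-token stop symbols on the client side while streaming. Below is a minimal sketch of that idea against the same server, assuming it is still listening on port 23333; the stop string and prompt are illustrative only, not part of the original notebook.

```python
# Hypothetical client-side stop-string matching while streaming, as suggested in
# FAQ item 5. The stop string and prompt are illustrative only.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="http://0.0.0.0:23333/v1")
model_name = client.models.list().data[0].id
stop_str = "###"  # multi-token stop symbol handled on the client side

buffer = ""
stream = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "List three uses of cURL, separated by ###"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    buffer += delta
    if stop_str in buffer:
        # Stop string found: truncate the text and break out of the streaming loop.
        buffer = buffer.split(stop_str, 1)[0]
        break
print(buffer)
```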
a_mllm_notebooks/lmdeploy/api_server.md
ADDED
@@ -0,0 +1,265 @@
# OpenAI Compatible Server

This article primarily discusses the deployment of a single LLM across multiple GPUs on a single node, providing a service that is compatible with the OpenAI interface, as well as the usage of the service API.
For the sake of convenience, we refer to this service as `api_server`. Regarding parallel services with multiple models, please refer to the guide about [Request Distribution Server](proxy_server.md).

In the following sections, we will first introduce methods for starting the service, so you can choose the appropriate one for your application scenario.

Next, we focus on the definition of the service's RESTful API, explore the various ways to interact with the interface, and demonstrate how to try the service through the Swagger UI or LMDeploy CLI tools.

Finally, we showcase how to integrate the service into a WebUI, providing you with a reference to easily set up a demo.

## Launch Service

Taking the [internlm2_5-7b-chat](https://huggingface.co/internlm/internlm2_5-7b-chat) model hosted on huggingface hub as an example, you can choose one of the following methods to start the service.

### Option 1: Launching with lmdeploy CLI

```shell
lmdeploy serve api_server internlm/internlm2_5-7b-chat --server-port 23333
```

The arguments of `api_server` can be viewed through the command `lmdeploy serve api_server -h`, for instance, `--tp` to set tensor parallelism, `--session-len` to specify the max length of the context window, `--cache-max-entry-count` to adjust the GPU memory ratio for the k/v cache, etc.

### Option 2: Deploying with docker

With the LMDeploy [official docker image](https://hub.docker.com/r/openmmlab/lmdeploy/tags), you can run the OpenAI compatible server as follows:

```shell
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 23333:23333 \
    --ipc=host \
    openmmlab/lmdeploy:latest \
    lmdeploy serve api_server internlm/internlm2_5-7b-chat
```

The parameters of `api_server` are the same as those mentioned in the "[option 1](#option-1-launching-with-lmdeploy-cli)" section.

### Option 3: Deploying to Kubernetes cluster

Connect to a running Kubernetes cluster and deploy the internlm2_5-7b-chat model service with the [kubectl](https://kubernetes.io/docs/reference/kubectl/) command-line tool (replace `<your token>` with your huggingface hub token):

```shell
sed 's/{{HUGGING_FACE_HUB_TOKEN}}/<your token>/' k8s/deployment.yaml | kubectl create -f - \
    && kubectl create -f k8s/service.yaml
```

In the example above, the model data is placed on the local disk of the node (hostPath). Consider replacing it with high-availability shared storage if multiple replicas are desired; the storage can be mounted into the container using a [PersistentVolume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/).

## RESTful API

LMDeploy's RESTful API is compatible with the following three OpenAI interfaces:

- /v1/chat/completions
- /v1/models
- /v1/completions

Additionally, LMDeploy also defines `/v1/chat/interactive` to support interactive inference. The feature of interactive inference is that there's no need to pass the user conversation history as required by `v1/chat/completions`, since the conversation history is cached on the server side. This method boasts excellent performance during multi-turn long context inference.

You can overview and try out the offered RESTful APIs at `http://0.0.0.0:23333`, as shown in the image below, after launching the service successfully.

Or, you can use LMDeploy's built-in CLI tool to verify the service correctness right from the console.

```shell
# restful_api_url is what is printed in api_server.py, e.g. http://localhost:23333
lmdeploy serve api_client ${api_server_url}
```

If you need to integrate the service into your own projects or products, we recommend the following approach:

### Integrate with `OpenAI`

Here is an example of interaction with the endpoint `v1/chat/completions` service via the openai package.
Before running it, please install the openai package with `pip install openai`.

```python
from openai import OpenAI
client = OpenAI(
    api_key='YOUR_API_KEY',
    base_url="http://0.0.0.0:23333/v1"
)
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": " provide three suggestions about time management"},
    ],
    temperature=0.8,
    top_p=0.8
)
print(response)
```

If you want to use async functions, you may try the following example:

```python
import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(api_key='YOUR_API_KEY',
                         base_url='http://0.0.0.0:23333/v1')
    model_cards = await client.models.list()._get_page()
    response = await client.chat.completions.create(
        model=model_cards.data[0].id,
        messages=[
            {
                'role': 'system',
                'content': 'You are a helpful assistant.'
            },
            {
                'role': 'user',
                'content': ' provide three suggestions about time management'
            },
        ],
        temperature=0.8,
        top_p=0.8)
    print(response)

asyncio.run(main())
```

You can invoke other OpenAI interfaces using similar methods. For more detailed information, please refer to the [OpenAI API guide](https://platform.openai.com/docs/guides/text-generation).

### Integrate with lmdeploy `APIClient`

Below are some examples demonstrating how to access the service through `APIClient`.

If you want to use the `/v1/chat/completions` endpoint, you can try the following code:

```python
from lmdeploy.serve.openai.api_client import APIClient
api_client = APIClient('http://{server_ip}:{server_port}')
model_name = api_client.available_models[0]
messages = [{"role": "user", "content": "Say this is a test!"}]
for item in api_client.chat_completions_v1(model=model_name, messages=messages):
    print(item)
```

For the `/v1/completions` endpoint, you can try:

```python
from lmdeploy.serve.openai.api_client import APIClient
api_client = APIClient('http://{server_ip}:{server_port}')
model_name = api_client.available_models[0]
for item in api_client.completions_v1(model=model_name, prompt='hi'):
    print(item)
```

As for `/v1/chat/interactive`, the feature is disabled by default. Enable it by setting `interactive_mode = True`; otherwise, it falls back to the OpenAI compatible interfaces.

Keep in mind that `session_id` identifies a sequence, and all requests belonging to the same sequence must share the same `session_id`.
For instance, in a sequence with 10 rounds of chatting requests, the `session_id` in each request should be the same.

```python
from lmdeploy.serve.openai.api_client import APIClient
api_client = APIClient(f'http://{server_ip}:{server_port}')
messages = [
    "hi, what's your name?",
    "who developed you?",
    "Tell me more about your developers",
    "Summarize the information we've talked so far"
]
for message in messages:
    for item in api_client.chat_interactive_v1(prompt=message,
                                               session_id=1,
                                               interactive_mode=True,
                                               stream=False):
        print(item)
```

### Tools

Refer to [api_server_tools](./api_server_tools.md).

### Integrate with Java/Golang/Rust

You can use [openapi-generator-cli](https://github.com/OpenAPITools/openapi-generator-cli) to convert `http://{server_ip}:{server_port}/openapi.json` into a Java/Rust/Golang client.
Here is an example:

```shell
$ docker run -it --rm -v ${PWD}:/local openapitools/openapi-generator-cli generate -i /local/openapi.json -g rust -o /local/rust

$ ls rust/*
rust/Cargo.toml rust/git_push.sh rust/README.md

rust/docs:
ChatCompletionRequest.md  EmbeddingsRequest.md  HttpValidationError.md  LocationInner.md  Prompt.md
DefaultApi.md  GenerateRequest.md  Input.md  Messages.md  ValidationError.md

rust/src:
apis  lib.rs  models
```

### Integrate with cURL

cURL is a tool for observing the output of the RESTful APIs.

- list served models `v1/models`

```bash
curl http://{server_ip}:{server_port}/v1/models
```

- chat `v1/chat/completions`

```bash
curl http://{server_ip}:{server_port}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "internlm-chat-7b",
    "messages": [{"role": "user", "content": "Hello! How are you?"}]
  }'
```

- text completions `v1/completions`

```shell
curl http://{server_ip}:{server_port}/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "llama",
  "prompt": "two steps to build a house:"
}'
```

- interactive chat `v1/chat/interactive`

```bash
curl http://{server_ip}:{server_port}/v1/chat/interactive \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello! How are you?",
    "session_id": 1,
    "interactive_mode": true
  }'
```

## Integrate with WebUI

```shell
# api_server_url is what is printed in api_server.py, e.g. http://localhost:23333
# server_ip and server_port here are for the gradio ui
# example: lmdeploy serve gradio http://localhost:23333 --server-name localhost --server-port 6006
lmdeploy serve gradio api_server_url --server-name ${gradio_ui_ip} --server-port ${gradio_ui_port}
```

## FAQ

1. If you get `"finish_reason":"length"`, it means the session is too long to be continued. The session length can be modified by passing `--session-len` to api_server.

2. If OOM appears on the server side, please reduce `cache_max_entry_count` of `backend_config` when launching the service.

3. If a request with the same `session_id` to `/v1/chat/interactive` returns an empty value and a negative `tokens`, please consider setting `interactive_mode=false` to restart the session.

4. The `/v1/chat/interactive` api disables engaging in multiple rounds of conversation by default. The input argument `prompt` consists of either a single string or an entire chat history.

5. Regarding stop words, we only support characters that encode into a single index. Furthermore, there may be multiple indexes that decode into results containing the stop word. In such cases, if the number of these indexes is too large, we will only use the index encoded by the tokenizer. If you want to use a stop symbol that encodes into multiple indexes, you may consider performing string matching on the streaming client side; once a successful match is found, you can then break out of the streaming loop.

6. To customize a chat template, please refer to [chat_template.md](../advance/chat_template.md).
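FAQ items 1 and 2 both come down to launch-time arguments. As a rough reference, here is a hedged sketch (mirroring the `os.system` pattern used in the notebook above) of relaunching `api_server` with a larger context window and a smaller k/v-cache share; the numeric values are examples only.

```python
# Illustrative only: launch api_server with an explicit context window and a
# smaller k/v-cache share, addressing FAQ items 1 and 2 (values are examples).
import os

os.system(
    "lmdeploy serve api_server internlm/internlm2_5-7b-chat "
    "--server-port 23333 "
    "--session-len 16384 "             # larger max context window (FAQ 1)
    "--cache-max-entry-count 0.4 &"    # smaller GPU mem ratio for k/v cache (FAQ 2)
)
```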
a_mllm_notebooks/lmdeploy/api_server_vl.ipynb
ADDED
@@ -0,0 +1,199 @@
# OpenAI Compatible Server

This article primarily discusses the deployment of a single large vision language model across multiple GPUs on a single node, providing a service that is compatible with the OpenAI interface, as well as the usage of the service API.
For the sake of convenience, we refer to this service as `api_server`. Regarding parallel services with multiple models, please refer to the guide about [Request Distribution Server](../llm/proxy_server.md).

In the following sections, we will first introduce two methods for starting the service, so you can choose the appropriate one for your application scenario.

Next, we focus on the definition of the service's RESTful API, explore the various ways to interact with the interface, and demonstrate how to try the service through the Swagger UI or LMDeploy CLI tools.

Finally, we showcase how to integrate the service into a WebUI, providing you with a reference to easily set up a demo.

## Launch Service

Taking the [llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) model hosted on huggingface hub as an example, you can choose one of the following methods to start the service.

### Option 1: Launching with lmdeploy CLI

```shell
lmdeploy serve api_server liuhaotian/llava-v1.6-vicuna-7b --server-port 23333
```

The arguments of `api_server` can be viewed through the command `lmdeploy serve api_server -h`, for instance, `--tp` to set tensor parallelism, `--session-len` to specify the max length of the context window, `--cache-max-entry-count` to adjust the GPU memory ratio for the k/v cache, etc.

### Option 2: Deploying with docker

With the LMDeploy [official docker image](https://hub.docker.com/r/openmmlab/lmdeploy/tags), you can run the OpenAI compatible server as follows:

```shell
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 23333:23333 \
    --ipc=host \
    openmmlab/lmdeploy:latest \
    lmdeploy serve api_server liuhaotian/llava-v1.6-vicuna-7b
```

The parameters of `api_server` are the same as those mentioned in the "[option 1](#option-1-launching-with-lmdeploy-cli)" section.

Each model may require specific dependencies not included in the Docker image. If you run into issues, you may need to install those yourself on a case-by-case basis. If in doubt, refer to the specific model's project for documentation.

For example, for Llava:

```
FROM openmmlab/lmdeploy:latest

RUN apt-get update && apt-get install -y python3 python3-pip git

WORKDIR /app

RUN pip3 install --upgrade pip
RUN pip3 install timm
RUN pip3 install git+https://github.com/haotian-liu/LLaVA.git --no-deps

COPY . .

CMD ["lmdeploy", "serve", "api_server", "liuhaotian/llava-v1.6-34b"]
```

## RESTful API

LMDeploy's RESTful API is compatible with the following three OpenAI interfaces:

- /v1/chat/completions
- /v1/models
- /v1/completions

The interface for image interaction is `/v1/chat/completions`, which is consistent with OpenAI.

You can overview and try out the offered RESTful APIs at `http://0.0.0.0:23333`, as shown in the image below, after launching the service successfully.

If you need to integrate the service into your own projects or products, we recommend the following approach:

### Integrate with `OpenAI`

Here is an example of interaction with the endpoint `v1/chat/completions` service via the openai package.
Before running it, please install the openai package with `pip install openai`.

```python
from openai import OpenAI

client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [{
            'type': 'text',
            'text': 'Describe the image please',
        }, {
            'type': 'image_url',
            'image_url': {
                'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg',
            },
        }],
    }],
    temperature=0.8,
    top_p=0.8)
print(response)
```

### Integrate with lmdeploy `APIClient`

Below are some examples demonstrating how to access the service through `APIClient`.

If you want to use the `/v1/chat/completions` endpoint, you can try the following code:

```python
from lmdeploy.serve.openai.api_client import APIClient

api_client = APIClient('http://0.0.0.0:23333')
model_name = api_client.available_models[0]
messages = [{
    'role': 'user',
    'content': [{
        'type': 'text',
        'text': 'Describe the image please',
    }, {
        'type': 'image_url',
        'image_url': {
            'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg',
        },
    }]
}]
for item in api_client.chat_completions_v1(model=model_name, messages=messages):
    print(item)
```

### Integrate with Java/Golang/Rust

You can use [openapi-generator-cli](https://github.com/OpenAPITools/openapi-generator-cli) to convert `http://{server_ip}:{server_port}/openapi.json` into a Java/Rust/Golang client.
Here is an example:
|
173 |
+
"\n",
|
174 |
+
"```shell\n",
|
175 |
+
"$ docker run -it --rm -v ${PWD}:/local openapitools/openapi-generator-cli generate -i /local/openapi.json -g rust -o /local/rust\n",
|
176 |
+
"\n",
|
177 |
+
"$ ls rust/*\n",
|
178 |
+
"rust/Cargo.toml rust/git_push.sh rust/README.md\n",
|
179 |
+
"\n",
|
180 |
+
"rust/docs:\n",
|
181 |
+
"ChatCompletionRequest.md EmbeddingsRequest.md HttpValidationError.md LocationInner.md Prompt.md\n",
|
182 |
+
"DefaultApi.md GenerateRequest.md Input.md Messages.md ValidationError.md\n",
|
183 |
+
"\n",
|
184 |
+
"rust/src:\n",
|
185 |
+
"apis lib.rs models\n",
|
186 |
+
"```"
|
187 |
+
]
|
188 |
+
}
|
189 |
+
],
|
190 |
+
"metadata": {
|
191 |
+
"jupytext": {
|
192 |
+
"cell_metadata_filter": "-all",
|
193 |
+
"main_language": "python",
|
194 |
+
"notebook_metadata_filter": "-all"
|
195 |
+
}
|
196 |
+
},
|
197 |
+
"nbformat": 4,
|
198 |
+
"nbformat_minor": 5
|
199 |
+
}
|
a_mllm_notebooks/lmdeploy/api_server_vl.md
ADDED
@@ -0,0 +1,155 @@
|
1 |
+
# OpenAI Compatible Server
|
2 |
+
|
3 |
+
This article primarily discusses the deployment of a single large vision language model across multiple GPUs on a single node, providing a service that is compatible with the OpenAI interface, as well as the usage of the service API.
|
4 |
+
For the sake of convenience, we refer to this service as `api_server`. Regarding parallel services with multiple models, please refer to the guide about [Request Distribution Server](../llm/proxy_server.md).
|
5 |
+
|
6 |
+
In the following sections, we will first introduce two methods for starting the service; you can choose the appropriate one based on your application scenario.
|
7 |
+
|
8 |
+
Next, we focus on the definition of the service's RESTful API, explore the various ways to interact with the interface, and demonstrate how to try the service through the Swagger UI or LMDeploy CLI tools.
|
9 |
+
|
10 |
+
Finally, we showcase how to integrate the service into a WebUI, providing you with a reference for easily setting up a demo.
|
11 |
+
|
12 |
+
## Launch Service
|
13 |
+
|
14 |
+
Taking the [llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) model hosted on the huggingface hub as an example, you can choose one of the following methods to start the service.
|
15 |
+
|
16 |
+
### Option 1: Launching with lmdeploy CLI
|
17 |
+
|
18 |
+
```shell
|
19 |
+
lmdeploy serve api_server liuhaotian/llava-v1.6-vicuna-7b --server-port 23333
|
20 |
+
```
|
21 |
+
|
22 |
+
The arguments of `api_server` can be viewed through the command `lmdeploy serve api_server -h`, for instance, `--tp` to set the tensor parallelism, `--session-len` to specify the max length of the context window, and `--cache-max-entry-count` to adjust the GPU memory ratio for the k/v cache, etc.
|
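For illustration, several of these options can be combined in a single launch command. The values below are only examples, not tuned recommendations:

```shell
# example only: 2-way tensor parallelism, an 8k context window,
# and 50% of free GPU memory reserved for the k/v cache
lmdeploy serve api_server liuhaotian/llava-v1.6-vicuna-7b \
    --server-port 23333 \
    --tp 2 \
    --session-len 8192 \
    --cache-max-entry-count 0.5
```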
23 |
+
|
24 |
+
### Option 2: Deploying with docker
|
25 |
+
|
26 |
+
With LMDeploy [official docker image](https://hub.docker.com/r/openmmlab/lmdeploy/tags), you can run OpenAI compatible server as follows:
|
27 |
+
|
28 |
+
```shell
|
29 |
+
docker run --runtime nvidia --gpus all \
|
30 |
+
-v ~/.cache/huggingface:/root/.cache/huggingface \
|
31 |
+
--env "HUGGING_FACE_HUB_TOKEN=<secret>" \
|
32 |
+
-p 23333:23333 \
|
33 |
+
--ipc=host \
|
34 |
+
openmmlab/lmdeploy:latest \
|
35 |
+
lmdeploy serve api_server liuhaotian/llava-v1.6-vicuna-7b
|
36 |
+
```
|
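Once the container is up, a quick sanity check is to query the `/v1/models` endpoint (described in the RESTful API section below); a model listing in the response confirms the server is reachable:

```shell
curl http://localhost:23333/v1/models
```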
37 |
+
|
38 |
+
The parameters of `api_server` are the same as those mentioned in the "[option 1](#option-1-launching-with-lmdeploy-cli)" section.
|
39 |
+
|
40 |
+
Each model may require specific dependencies not included in the Docker image. If you run into issues, you may need to install those yourself
|
41 |
+
on a case-by-case basis. If in doubt, refer to the specific model's project for documentation.
|
42 |
+
|
43 |
+
For example, for Llava:
|
44 |
+
|
45 |
+
```dockerfile
|
46 |
+
FROM openmmlab/lmdeploy:latest
|
47 |
+
|
48 |
+
RUN apt-get update && apt-get install -y python3 python3-pip git
|
49 |
+
|
50 |
+
WORKDIR /app
|
51 |
+
|
52 |
+
RUN pip3 install --upgrade pip
|
53 |
+
RUN pip3 install timm
|
54 |
+
RUN pip3 install git+https://github.com/haotian-liu/LLaVA.git --no-deps
|
55 |
+
|
56 |
+
COPY . .
|
57 |
+
|
58 |
+
CMD ["lmdeploy", "serve", "api_server", "liuhaotian/llava-v1.6-34b"]
|
59 |
+
```
|
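The Dockerfile above is built and run like any other image; the tag name used here is only an example:

```shell
docker build -t lmdeploy-llava .
docker run --runtime nvidia --gpus all -p 23333:23333 --ipc=host lmdeploy-llava
```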
60 |
+
|
61 |
+
## RESTful API
|
62 |
+
|
63 |
+
LMDeploy's RESTful API is compatible with the following three OpenAI interfaces:
|
64 |
+
|
65 |
+
- /v1/chat/completions
|
66 |
+
- /v1/models
|
67 |
+
- /v1/completions
|
68 |
+
|
69 |
+
The interface for image interaction is `/v1/chat/completions`, which is consistent with OpenAI.
|
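Since the endpoint follows the OpenAI schema, a plain HTTP request also works. Below is a minimal curl sketch; replace the model name with whatever `/v1/models` returns on your deployment:

```shell
curl http://0.0.0.0:23333/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llava-v1.6-vicuna-7b",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe the image please"},
        {"type": "image_url",
         "image_url": {"url": "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg"}}
      ]
    }]
  }'
```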
70 |
+
|
71 |
+
After launching the service successfully, you can overview and try out the offered RESTful APIs at `http://0.0.0.0:23333`, as shown in the image below.
|
72 |
+
|
73 |
+

|
74 |
+
|
75 |
+
If you need to integrate the service into your own projects or products, we recommend the following approach:
|
76 |
+
|
77 |
+
### Integrate with `OpenAI`
|
78 |
+
|
79 |
+
Here is an example of interacting with the `v1/chat/completions` endpoint via the openai package.
|
80 |
+
Before running it, please install the openai package with `pip install openai`.
|
81 |
+
|
82 |
+
```python
|
83 |
+
from openai import OpenAI
|
84 |
+
|
85 |
+
client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
|
86 |
+
model_name = client.models.list().data[0].id
|
87 |
+
response = client.chat.completions.create(
|
88 |
+
model=model_name,
|
89 |
+
messages=[{
|
90 |
+
'role':
|
91 |
+
'user',
|
92 |
+
'content': [{
|
93 |
+
'type': 'text',
|
94 |
+
'text': 'Describe the image please',
|
95 |
+
}, {
|
96 |
+
'type': 'image_url',
|
97 |
+
'image_url': {
|
98 |
+
'url':
|
99 |
+
'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg',
|
100 |
+
},
|
101 |
+
}],
|
102 |
+
}],
|
103 |
+
temperature=0.8,
|
104 |
+
top_p=0.8)
|
105 |
+
print(response)
|
106 |
+
```
|
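If you want tokens as they are generated rather than a single final response, the same client call can be streamed. This is a minimal sketch reusing the `client` and `model_name` from the example above:

```python
# stream the reply incrementally instead of waiting for the full completion
stream = client.chat.completions.create(
    model=model_name,
    messages=[{'role': 'user', 'content': 'Describe a tiger in one sentence'}],
    temperature=0.8,
    top_p=0.8,
    stream=True,
)
for chunk in stream:
    # each chunk carries a small piece of the answer; guard against empty chunks
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')
print()
```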
107 |
+
|
108 |
+
### Integrate with lmdeploy `APIClient`
|
109 |
+
|
110 |
+
Below are some examples demonstrating how to access the service through `APIClient`.
|
111 |
+
|
112 |
+
If you want to use the `/v1/chat/completions` endpoint, you can try the following code:
|
113 |
+
|
114 |
+
```python
|
115 |
+
from lmdeploy.serve.openai.api_client import APIClient
|
116 |
+
|
117 |
+
api_client = APIClient(f'http://0.0.0.0:23333')
|
118 |
+
model_name = api_client.available_models[0]
|
119 |
+
messages = [{
|
120 |
+
'role':
|
121 |
+
'user',
|
122 |
+
'content': [{
|
123 |
+
'type': 'text',
|
124 |
+
'text': 'Describe the image please',
|
125 |
+
}, {
|
126 |
+
'type': 'image_url',
|
127 |
+
'image_url': {
|
128 |
+
'url':
|
129 |
+
'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg',
|
130 |
+
},
|
131 |
+
}]
|
132 |
+
}]
|
133 |
+
for item in api_client.chat_completions_v1(model=model_name,
|
134 |
+
messages=messages):
|
135 |
+
print(item)
|
136 |
+
```
|
137 |
+
|
138 |
+
### Integrate with Java/Golang/Rust
|
139 |
+
|
140 |
+
You may use [openapi-generator-cli](https://github.com/OpenAPITools/openapi-generator-cli) to convert `http://{server_ip}:{server_port}/openapi.json` into a Java/Rust/Golang client.
|
141 |
+
Here is an example:
|
142 |
+
|
143 |
+
```shell
|
144 |
+
$ docker run -it --rm -v ${PWD}:/local openapitools/openapi-generator-cli generate -i /local/openapi.json -g rust -o /local/rust
|
145 |
+
|
146 |
+
$ ls rust/*
|
147 |
+
rust/Cargo.toml rust/git_push.sh rust/README.md
|
148 |
+
|
149 |
+
rust/docs:
|
150 |
+
ChatCompletionRequest.md EmbeddingsRequest.md HttpValidationError.md LocationInner.md Prompt.md
|
151 |
+
DefaultApi.md GenerateRequest.md Input.md Messages.md ValidationError.md
|
152 |
+
|
153 |
+
rust/src:
|
154 |
+
apis lib.rs models
|
155 |
+
```
|
a_mllm_notebooks/lmdeploy/download_md.ipynb
ADDED
@@ -0,0 +1,211 @@
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "code",
|
5 |
+
"execution_count": 17,
|
6 |
+
"metadata": {},
|
7 |
+
"outputs": [],
|
8 |
+
"source": [
|
9 |
+
"# https://github.com/InternLM/lmdeploy/blob/main/docs/en/llm/pipeline.md\n",
|
10 |
+
"\n",
|
11 |
+
"# download then convert to jupyter notebook\n",
|
12 |
+
"\n",
|
13 |
+
"import os\n",
|
14 |
+
"import sys\n",
|
15 |
+
"import json\n",
|
16 |
+
"import requests\n",
|
17 |
+
"# import jupyter_text\n",
|
18 |
+
"\n",
|
19 |
+
"\n",
|
20 |
+
"def download_markdown_and_save(url, filename):\n",
|
21 |
+
" # remove existing file\n",
|
22 |
+
" if os.path.exists(filename):\n",
|
23 |
+
" os.remove(filename)\n",
|
24 |
+
" \n",
|
25 |
+
" import wget \n",
|
26 |
+
" # preprocess url to downloadable url\n",
|
27 |
+
" url = url.replace(\"github.com\", \"raw.githubusercontent.com\")\n",
|
28 |
+
" url = url.replace(\"blob/\", \"\")\n",
|
29 |
+
" print(f\"Downloading {url}\")\n",
|
30 |
+
" wget.download(url, filename)\n",
|
31 |
+
" print(f\"Downloaded {filename}\")\n",
|
32 |
+
" \n",
|
33 |
+
" \n",
|
34 |
+
" \n",
|
35 |
+
"# !jupytext --to notebook your_markdown_file.md\n",
|
36 |
+
"\n",
|
37 |
+
"def convert_markdown_to_jupyter_notebook(filename):\n",
|
38 |
+
" os.system(f\"jupytext --to notebook {filename}\")\n",
|
39 |
+
" print(f\"Converted {filename} to jupyter notebook.\")\n",
|
40 |
+
" \n",
|
41 |
+
" \n",
|
42 |
+
"def markdown2jupyter(url, filename):\n",
|
43 |
+
" download_markdown_and_save(url, filename)\n",
|
44 |
+
" convert_markdown_to_jupyter_notebook(filename)\n",
|
45 |
+
"\n",
|
46 |
+
"\n",
|
47 |
+
"# def main():\n",
|
48 |
+
"# url = \"https://raw.githubusercontent.com/InternLM/lmdeploy/main/docs/en/llm/pipeline.md\"\n",
|
49 |
+
"# filename = \"pipeline.md\"\n",
|
50 |
+
"# download_markdown_and_save(url, filename)\n",
|
51 |
+
"# convert_markdown_to_jupyter_notebook(filename)\n",
|
52 |
+
" \n",
|
53 |
+
" \n",
|
54 |
+
"# if __name__ == \"__main__\":\n",
|
55 |
+
"# main()\n",
|
56 |
+
" "
|
57 |
+
]
|
58 |
+
},
|
59 |
+
{
|
60 |
+
"cell_type": "code",
|
61 |
+
"execution_count": 20,
|
62 |
+
"metadata": {},
|
63 |
+
"outputs": [
|
64 |
+
{
|
65 |
+
"name": "stdout",
|
66 |
+
"output_type": "stream",
|
67 |
+
"text": [
|
68 |
+
"Downloading https://raw.githubusercontent.com/InternLM/lmdeploy/main/docs/en/get_started/get_started.md\n",
|
69 |
+
"Downloaded get_started_vl.md\n",
|
70 |
+
"[jupytext] Reading get_started_vl.md in format md\n",
|
71 |
+
"[jupytext] Writing get_started_vl.ipynb\n",
|
72 |
+
"Converted get_started_vl.md to jupyter notebook.\n"
|
73 |
+
]
|
74 |
+
}
|
75 |
+
],
|
76 |
+
"source": [
|
77 |
+
"markdown2jupyter(\n",
|
78 |
+
" 'https://github.com/InternLM/lmdeploy/blob/main/docs/en/get_started/get_started.md',\n",
|
79 |
+
" 'get_started_vl.md'\n",
|
80 |
+
")"
|
81 |
+
]
|
82 |
+
},
|
83 |
+
{
|
84 |
+
"cell_type": "code",
|
85 |
+
"execution_count": 30,
|
86 |
+
"metadata": {},
|
87 |
+
"outputs": [
|
88 |
+
{
|
89 |
+
"name": "stdout",
|
90 |
+
"output_type": "stream",
|
91 |
+
"text": [
|
92 |
+
"Overwriting links.txt\n"
|
93 |
+
]
|
94 |
+
}
|
95 |
+
],
|
96 |
+
"source": [
|
97 |
+
"%%writefile links.txt\n",
|
98 |
+
"'https://github.com/InternLM/lmdeploy/blob/main/docs/en/quantization/kv_quant.md'\n",
|
99 |
+
"'https://github.com/InternLM/lmdeploy/blob/main/docs/en/advance/pytorch_new_model.md'\n",
|
100 |
+
"'https://github.com/InternLM/lmdeploy/blob/main/docs/en/inference/turbomind.md'\n",
|
101 |
+
"'https://github.com/InternLM/lmdeploy/blob/main/docs/en/multi_modal/api_server_vl.md'\n",
|
102 |
+
"'https://github.com/InternLM/lmdeploy/blob/main/docs/en/quantization/w4a16.md'\n",
|
103 |
+
"'https://github.com/InternLM/lmdeploy/blob/main/docs/en/quantization/w8a8.md'\n",
|
104 |
+
"'https://github.com/InternLM/lmdeploy/blob/main/docs/en/llm/proxy_server.md'\n",
|
105 |
+
"'https://github.com/InternLM/lmdeploy/blob/main/docs/en/advance/long_context.md'"
|
106 |
+
]
|
107 |
+
},
|
108 |
+
{
|
109 |
+
"cell_type": "code",
|
110 |
+
"execution_count": 31,
|
111 |
+
"metadata": {},
|
112 |
+
"outputs": [],
|
113 |
+
"source": [
|
114 |
+
"list_url = []\n",
|
115 |
+
"with open('links.txt') as f:\n",
|
116 |
+
" list_url = f.readlines()\n",
|
117 |
+
"for i in range(len(list_url)):\n",
|
118 |
+
" list_url[i] = eval(list_url[i])"
|
119 |
+
]
|
120 |
+
},
|
121 |
+
{
|
122 |
+
"cell_type": "code",
|
123 |
+
"execution_count": 35,
|
124 |
+
"metadata": {},
|
125 |
+
"outputs": [
|
126 |
+
{
|
127 |
+
"name": "stdout",
|
128 |
+
"output_type": "stream",
|
129 |
+
"text": [
|
130 |
+
"Downloading https://raw.githubusercontent.com/InternLM/lmdeploy/main/docs/en/quantization/kv_quant.md\n",
|
131 |
+
"Downloaded kv_quant.md\n",
|
132 |
+
"[jupytext] Reading kv_quant.md in format md\n",
|
133 |
+
"[jupytext] Writing kv_quant.ipynb (destination file replaced [use --update to preserve cell outputs and ids])\n",
|
134 |
+
"Converted kv_quant.md to jupyter notebook.\n",
|
135 |
+
"Downloading https://raw.githubusercontent.com/InternLM/lmdeploy/main/docs/en/advance/pytorch_new_model.md\n",
|
136 |
+
"Downloaded pytorch_new_model.md\n",
|
137 |
+
"[jupytext] Reading pytorch_new_model.md in format md\n",
|
138 |
+
"[jupytext] Writing pytorch_new_model.ipynb (destination file replaced [use --update to preserve cell outputs and ids])\n",
|
139 |
+
"Converted pytorch_new_model.md to jupyter notebook.\n",
|
140 |
+
"Downloading https://raw.githubusercontent.com/InternLM/lmdeploy/main/docs/en/inference/turbomind.md\n",
|
141 |
+
"Downloaded turbomind.md\n",
|
142 |
+
"[jupytext] Reading turbomind.md in format md\n",
|
143 |
+
"[jupytext] Writing turbomind.ipynb (destination file replaced [use --update to preserve cell outputs and ids])\n",
|
144 |
+
"Converted turbomind.md to jupyter notebook.\n",
|
145 |
+
"Downloading https://raw.githubusercontent.com/InternLM/lmdeploy/main/docs/en/multi_modal/api_server_vl.md\n",
|
146 |
+
"Downloaded api_server_vl.md\n",
|
147 |
+
"[jupytext] Reading api_server_vl.md in format md\n",
|
148 |
+
"[jupytext] Writing api_server_vl.ipynb (destination file replaced [use --update to preserve cell outputs and ids])\n",
|
149 |
+
"Converted api_server_vl.md to jupyter notebook.\n",
|
150 |
+
"Downloading https://raw.githubusercontent.com/InternLM/lmdeploy/main/docs/en/quantization/w4a16.md\n",
|
151 |
+
"Downloaded w4a16.md\n",
|
152 |
+
"[jupytext] Reading w4a16.md in format md\n",
|
153 |
+
"[jupytext] Writing w4a16.ipynb (destination file replaced [use --update to preserve cell outputs and ids])\n",
|
154 |
+
"Converted w4a16.md to jupyter notebook.\n",
|
155 |
+
"Downloading https://raw.githubusercontent.com/InternLM/lmdeploy/main/docs/en/quantization/w8a8.md\n",
|
156 |
+
"Downloaded w8a8.md\n",
|
157 |
+
"[jupytext] Reading w8a8.md in format md\n",
|
158 |
+
"[jupytext] Writing w8a8.ipynb (destination file replaced [use --update to preserve cell outputs and ids])\n",
|
159 |
+
"Converted w8a8.md to jupyter notebook.\n",
|
160 |
+
"Downloading https://raw.githubusercontent.com/InternLM/lmdeploy/main/docs/en/llm/proxy_server.md\n",
|
161 |
+
"Downloaded proxy_server.md\n",
|
162 |
+
"[jupytext] Reading proxy_server.md in format md\n",
|
163 |
+
"[jupytext] Writing proxy_server.ipynb (destination file replaced [use --update to preserve cell outputs and ids])\n",
|
164 |
+
"Converted proxy_server.md to jupyter notebook.\n",
|
165 |
+
"Downloading https://raw.githubusercontent.com/InternLM/lmdeploy/main/docs/en/advance/long_context.md\n",
|
166 |
+
"Downloaded long_context.md\n",
|
167 |
+
"[jupytext] Reading long_context.md in format md\n",
|
168 |
+
"[jupytext] Writing long_context.ipynb (destination file replaced [use --update to preserve cell outputs and ids])\n",
|
169 |
+
"Converted long_context.md to jupyter notebook.\n"
|
170 |
+
]
|
171 |
+
}
|
172 |
+
],
|
173 |
+
"source": [
|
174 |
+
"for i in range(len(list_url)):\n",
|
175 |
+
" url = list_url[i]\n",
|
176 |
+
" name = url.split('/')[-1]\n",
|
177 |
+
" markdown2jupyter(url, name)\n",
|
178 |
+
" \n",
|
179 |
+
"# delete all file{i}.md"
|
180 |
+
]
|
181 |
+
},
|
182 |
+
{
|
183 |
+
"cell_type": "code",
|
184 |
+
"execution_count": null,
|
185 |
+
"metadata": {},
|
186 |
+
"outputs": [],
|
187 |
+
"source": []
|
188 |
+
}
|
189 |
+
],
|
190 |
+
"metadata": {
|
191 |
+
"kernelspec": {
|
192 |
+
"display_name": "base",
|
193 |
+
"language": "python",
|
194 |
+
"name": "python3"
|
195 |
+
},
|
196 |
+
"language_info": {
|
197 |
+
"codemirror_mode": {
|
198 |
+
"name": "ipython",
|
199 |
+
"version": 3
|
200 |
+
},
|
201 |
+
"file_extension": ".py",
|
202 |
+
"mimetype": "text/x-python",
|
203 |
+
"name": "python",
|
204 |
+
"nbconvert_exporter": "python",
|
205 |
+
"pygments_lexer": "ipython3",
|
206 |
+
"version": "3.8.18"
|
207 |
+
}
|
208 |
+
},
|
209 |
+
"nbformat": 4,
|
210 |
+
"nbformat_minor": 2
|
211 |
+
}
|
a_mllm_notebooks/lmdeploy/get_started_vl.ipynb
ADDED
@@ -0,0 +1,517 @@
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"id": "a210e718",
|
6 |
+
"metadata": {},
|
7 |
+
"source": [
|
8 |
+
"# Quick Start\n",
|
9 |
+
"\n",
|
10 |
+
"This tutorial shows the usage of LMDeploy on CUDA platform:\n",
|
11 |
+
"\n",
|
12 |
+
"- Offline inference of LLM model and VLM model\n",
|
13 |
+
"- Serve a LLM or VLM model by the OpenAI compatible server\n",
|
14 |
+
"- Console CLI to interactively chat with LLM model\n",
|
15 |
+
"\n",
|
16 |
+
"Before reading further, please ensure that you have installed lmdeploy as outlined in the [installation guide](installation.md)\n",
|
17 |
+
"\n",
|
18 |
+
"## Offline batch inference\n",
|
19 |
+
"\n",
|
20 |
+
"### LLM inference"
|
21 |
+
]
|
22 |
+
},
|
23 |
+
{
|
24 |
+
"cell_type": "code",
|
25 |
+
"execution_count": null,
|
26 |
+
"id": "1e86fd28",
|
27 |
+
"metadata": {},
|
28 |
+
"outputs": [],
|
29 |
+
"source": [
|
30 |
+
"from lmdeploy import pipeline\n",
|
31 |
+
"\n",
|
32 |
+
"pipe = pipeline(\"internlm/internlm2_5-7b-chat\")\n",
|
33 |
+
"response = pipe([\"Hi, pls intro yourself\", \"Shanghai is\"])\n",
|
34 |
+
"print(response)"
|
35 |
+
]
|
36 |
+
},
|
37 |
+
{
|
38 |
+
"cell_type": "markdown",
|
39 |
+
"id": "b3c14b37",
|
40 |
+
"metadata": {},
|
41 |
+
"source": [
|
42 |
+
"When constructing the `pipeline`, if an inference engine is not designated between the TurboMind Engine and the PyTorch Engine, LMDeploy will automatically assign one based on [their respective capabilities](../supported_models/supported_models.md), with the TurboMind Engine taking precedence by default.\n",
|
43 |
+
"\n",
|
44 |
+
"However, you have the option to manually select an engine. For instance,"
|
45 |
+
]
|
46 |
+
},
|
47 |
+
{
|
48 |
+
"cell_type": "code",
|
49 |
+
"execution_count": null,
|
50 |
+
"id": "2b71c8bb",
|
51 |
+
"metadata": {},
|
52 |
+
"outputs": [],
|
53 |
+
"source": [
|
54 |
+
"from lmdeploy import pipeline, TurbomindEngineConfig\n",
|
55 |
+
"\n",
|
56 |
+
"pipe = pipeline(\n",
|
57 |
+
" \"internlm/internlm2_5-7b-chat\",\n",
|
58 |
+
" backend_config=TurbomindEngineConfig(\n",
|
59 |
+
" max_batch_size=32,\n",
|
60 |
+
" enable_prefix_caching=True,\n",
|
61 |
+
" cache_max_entry_count=0.8,\n",
|
62 |
+
" session_len=8192,\n",
|
63 |
+
" ),\n",
|
64 |
+
")"
|
65 |
+
]
|
66 |
+
},
|
67 |
+
{
|
68 |
+
"cell_type": "markdown",
|
69 |
+
"id": "c34d729a",
|
70 |
+
"metadata": {},
|
71 |
+
"source": [
|
72 |
+
"or,"
|
73 |
+
]
|
74 |
+
},
|
75 |
+
{
|
76 |
+
"cell_type": "code",
|
77 |
+
"execution_count": null,
|
78 |
+
"id": "4878141f",
|
79 |
+
"metadata": {},
|
80 |
+
"outputs": [],
|
81 |
+
"source": [
|
82 |
+
"from lmdeploy import pipeline, PytorchEngineConfig\n",
|
83 |
+
"\n",
|
84 |
+
"pipe = pipeline(\n",
|
85 |
+
" \"internlm/internlm2_5-7b-chat\",\n",
|
86 |
+
" backend_config=PytorchEngineConfig(\n",
|
87 |
+
" max_batch_size=32,\n",
|
88 |
+
" enable_prefix_caching=True,\n",
|
89 |
+
" cache_max_entry_count=0.8,\n",
|
90 |
+
" session_len=8192,\n",
|
91 |
+
" ),\n",
|
92 |
+
")"
|
93 |
+
]
|
94 |
+
},
|
95 |
+
{
|
96 |
+
"cell_type": "markdown",
|
97 |
+
"id": "ca986d53",
|
98 |
+
"metadata": {},
|
99 |
+
"source": [
|
100 |
+
"```{note}\n",
|
101 |
+
"The parameter \"cache_max_entry_count\" significantly influences the GPU memory usage.\n",
|
102 |
+
"It means the proportion of FREE GPU memory occupied by the K/V cache after the model weights are loaded.\n",
|
103 |
+
"\n",
|
104 |
+
"The default value is 0.8. The K/V cache memory is allocated once and reused repeatedly, which is why it is observed that the built pipeline and the \"api_server\" mentioned later in the next consumes a substantial amount of GPU memory.\n",
|
105 |
+
"\n",
|
106 |
+
"If you encounter an Out-of-Memory(OOM) error, you may need to consider lowering the value of \"cache_max_entry_count\".\n",
|
107 |
+
"```\n",
|
108 |
+
"\n",
|
109 |
+
"When use the callable `pipe()` to perform token generation with given prompts, you can set the sampling parameters via `GenerationConfig` as below:"
|
110 |
+
]
|
111 |
+
},
|
112 |
+
{
|
113 |
+
"cell_type": "code",
|
114 |
+
"execution_count": null,
|
115 |
+
"id": "bd007ca1",
|
116 |
+
"metadata": {},
|
117 |
+
"outputs": [],
|
118 |
+
"source": [
|
119 |
+
"from lmdeploy import GenerationConfig, pipeline\n",
|
120 |
+
"\n",
|
121 |
+
"pipe = pipeline(\"internlm/internlm2_5-7b-chat\")\n",
|
122 |
+
"prompts = [\"Hi, pls intro yourself\", \"Shanghai is\"]\n",
|
123 |
+
"response = pipe(\n",
|
124 |
+
" prompts,\n",
|
125 |
+
" gen_config=GenerationConfig(\n",
|
126 |
+
" max_new_tokens=1024, top_p=0.8, top_k=40, temperature=0.6\n",
|
127 |
+
" ),\n",
|
128 |
+
")"
|
129 |
+
]
|
130 |
+
},
|
131 |
+
{
|
132 |
+
"cell_type": "markdown",
|
133 |
+
"id": "c4b9ce5d",
|
134 |
+
"metadata": {},
|
135 |
+
"source": [
|
136 |
+
"In the `GenerationConfig`, `top_k=1` or `temperature=0.0` indicates greedy search.\n",
|
137 |
+
"\n",
|
138 |
+
"For more information about pipeline, please read the [detailed tutorial](../llm/pipeline.md)\n",
|
139 |
+
"\n",
|
140 |
+
"### VLM inference\n",
|
141 |
+
"\n",
|
142 |
+
"The usage of VLM inference pipeline is akin to that of LLMs, with the additional capability of processing image data with the pipeline.\n",
|
143 |
+
"For example, you can utilize the following code snippet to perform the inference with an InternVL model:"
|
144 |
+
]
|
145 |
+
},
|
146 |
+
{
|
147 |
+
"cell_type": "code",
|
148 |
+
"execution_count": null,
|
149 |
+
"id": "926fad07",
|
150 |
+
"metadata": {},
|
151 |
+
"outputs": [],
|
152 |
+
"source": [
|
153 |
+
"from lmdeploy import pipeline\n",
|
154 |
+
"from lmdeploy.vl import load_image\n",
|
155 |
+
"\n",
|
156 |
+
"pipe = pipeline(\"OpenGVLab/InternVL2-8B\")\n",
|
157 |
+
"\n",
|
158 |
+
"image = load_image(\n",
|
159 |
+
" \"https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg\"\n",
|
160 |
+
")\n",
|
161 |
+
"response = pipe((\"describe this image\", image))\n",
|
162 |
+
"print(response)"
|
163 |
+
]
|
164 |
+
},
|
165 |
+
{
|
166 |
+
"cell_type": "markdown",
|
167 |
+
"id": "b3f0a6a0",
|
168 |
+
"metadata": {},
|
169 |
+
"source": [
|
170 |
+
"In VLM pipeline, the default image processing batch size is 1. This can be adjusted by `VisionConfig`. For instance, you might set it like this:"
|
171 |
+
]
|
172 |
+
},
|
173 |
+
{
|
174 |
+
"cell_type": "code",
|
175 |
+
"execution_count": null,
|
176 |
+
"id": "0fcd88e9",
|
177 |
+
"metadata": {},
|
178 |
+
"outputs": [
|
179 |
+
{
|
180 |
+
"name": "stderr",
|
181 |
+
"output_type": "stream",
|
182 |
+
"text": [
|
183 |
+
"/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
|
184 |
+
" from .autonotebook import tqdm as notebook_tqdm\n"
|
185 |
+
]
|
186 |
+
},
|
187 |
+
{
|
188 |
+
"ename": "",
|
189 |
+
"evalue": "",
|
190 |
+
"output_type": "error",
|
191 |
+
"traceback": [
|
192 |
+
"\u001b[1;31mThe Kernel crashed while executing code in the current cell or a previous cell. \n",
|
193 |
+
"\u001b[1;31mPlease review the code in the cell(s) to identify a possible cause of the failure. \n",
|
194 |
+
"\u001b[1;31mClick <a href='https://aka.ms/vscodeJupyterKernelCrash'>here</a> for more info. \n",
|
195 |
+
"\u001b[1;31mView Jupyter <a href='command:jupyter.viewOutput'>log</a> for further details."
|
196 |
+
]
|
197 |
+
}
|
198 |
+
],
|
199 |
+
"source": [
|
200 |
+
"# %pip install nest_asyncio\n",
|
201 |
+
"import nest_asyncio\n",
|
202 |
+
"nest_asyncio.apply()\n",
|
203 |
+
"from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig\n",
|
204 |
+
"# backend_config = TurbomindEngineConfig(tp=4, cache_max_entry_count=0.2)"
|
205 |
+
]
|
206 |
+
},
|
207 |
+
{
|
208 |
+
"cell_type": "code",
|
209 |
+
"execution_count": null,
|
210 |
+
"id": "b12e46c5",
|
211 |
+
"metadata": {},
|
212 |
+
"outputs": [
|
213 |
+
{
|
214 |
+
"name": "stderr",
|
215 |
+
"output_type": "stream",
|
216 |
+
"text": [
|
217 |
+
"Fetching 32 files: 100%|█████████████████████████████████████| 32/32 [00:00<00:00, 27296.67it/s]\n",
|
218 |
+
"InternLM2ForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.\n",
|
219 |
+
" - If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes\n",
|
220 |
+
" - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).\n",
|
221 |
+
" - If you are not the owner of the model architecture class, please contact the model code owner to update it.\n"
|
222 |
+
]
|
223 |
+
},
|
224 |
+
{
|
225 |
+
"name": "stdout",
|
226 |
+
"output_type": "stream",
|
227 |
+
"text": [
|
228 |
+
"2024-12-20 09:07:32,076 - lmdeploy - \u001b[33mWARNING\u001b[0m - tokenizer.py:243 - The current version of `transformers` is transformers==4.46.3, which is lower than the required version transformers==4.47.0. Please upgrade to the required version.\n",
|
229 |
+
"2024-12-20 09:07:34,912 - lmdeploy - \u001b[33mWARNING\u001b[0m - turbomind.py:231 - get 2985 model params\n"
|
230 |
+
]
|
231 |
+
},
|
232 |
+
{
|
233 |
+
"name": "stderr",
|
234 |
+
"output_type": "stream",
|
235 |
+
"text": [
|
236 |
+
"[TM][WARNING] [LlamaTritonModel] `max_context_token_num` is not set, default to 32768.\n",
|
237 |
+
"[TM][WARNING] pad vocab size from 92553 to 92556\n",
|
238 |
+
"[TM][WARNING] pad embed size from 92556 to 92556\n",
|
239 |
+
"[TM][WARNING] pad vocab size from 92553 to 92556\n",
|
240 |
+
"[TM][WARNING] pad embed size from 92556 to 92556\n",
|
241 |
+
"[TM][WARNING] pad vocab size from 92553 to 92556\n",
|
242 |
+
"[TM][WARNING] pad embed size from 92556 to 92556\n",
|
243 |
+
"[TM][WARNING] pad vocab size from 92553 to 92556\n",
|
244 |
+
"[TM][WARNING] pad embed size from 92556 to 92556\n",
|
245 |
+
" \r"
|
246 |
+
]
|
247 |
+
},
|
248 |
+
{
|
249 |
+
"name": "stdout",
|
250 |
+
"output_type": "stream",
|
251 |
+
"text": [
|
252 |
+
"[WARNING] gemm_config.in is not found; using default GEMM algo\n",
|
253 |
+
"[WARNING] gemm_config.in is not found; using default GEMM algo\n",
|
254 |
+
"[WARNING] gemm_config.in is not found; using default GEMM algo\n",
|
255 |
+
"[WARNING] gemm_config.in is not found; using default GEMM algo\n",
|
256 |
+
"2024-12-20 09:08:10,327 - lmdeploy - \u001b[33mWARNING\u001b[0m - async_engine.py:505 - GenerationConfig: GenerationConfig(n=1, max_new_tokens=256, do_sample=False, top_p=0.8, top_k=40, min_p=0.0, temperature=0.8, repetition_penalty=1.0, ignore_eos=False, random_seed=None, stop_words=None, bad_words=None, stop_token_ids=[92542, 92540], bad_token_ids=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None, response_format=None, logits_processors=None)\n",
|
257 |
+
"2024-12-20 09:08:10,328 - lmdeploy - \u001b[33mWARNING\u001b[0m - async_engine.py:506 - Since v0.6.0, lmdeploy add `do_sample` in GenerationConfig. It defaults to False, meaning greedy decoding. Please set `do_sample=True` if sampling decoding is needed\n"
|
258 |
+
]
|
259 |
+
}
|
260 |
+
],
|
261 |
+
"source": [
|
262 |
+
"from lmdeploy import pipeline, VisionConfig\n",
|
263 |
+
"from lmdeploy.vl import load_image\n",
|
264 |
+
"gen_config = GenerationConfig(top_p=0.8, top_k=40, temperature=0.8, max_new_tokens=256)\n",
|
265 |
+
"\n",
|
266 |
+
"pipe = pipeline(\n",
|
267 |
+
" \"OpenGVLab/InternVL2_5-26B-AWQ\", vision_config=VisionConfig(max_batch_size=1),\n",
|
268 |
+
" backend_config=TurbomindEngineConfig(tp=4, cache_max_entry_count=0.4),\n",
|
269 |
+
")\n",
|
270 |
+
"\n",
|
271 |
+
"image = load_image(\n",
|
272 |
+
" \"https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg\"\n",
|
273 |
+
")\n",
|
274 |
+
"response = pipe((\"describe this image\", image), gen_config=gen_config)\n",
|
275 |
+
"print(response)"
|
276 |
+
]
|
277 |
+
},
|
278 |
+
{
|
279 |
+
"cell_type": "markdown",
|
280 |
+
"id": "8b83a357",
|
281 |
+
"metadata": {},
|
282 |
+
"source": [
|
283 |
+
"However, the larger the image batch size, the greater risk of an OOM error, because the LLM component within the VLM model pre-allocates a massive amount of memory in advance.\n",
|
284 |
+
"\n",
|
285 |
+
"We encourage you to manually choose between the TurboMind Engine and the PyTorch Engine based on their respective capabilities, as detailed in [the supported-models matrix](../supported_models/supported_models.md).\n",
|
286 |
+
"Additionally, follow the instructions in [LLM Inference](#llm-inference) section to reduce the values of memory-related parameters\n",
|
287 |
+
"\n",
|
288 |
+
"## Serving\n",
|
289 |
+
"\n",
|
290 |
+
"As demonstrated in the previous [offline batch inference](#offline-batch-inference) section, this part presents the respective serving methods for LLMs and VLMs.\n",
|
291 |
+
"\n",
|
292 |
+
"### Serve a LLM model\n",
|
293 |
+
"\n",
|
294 |
+
"```shell\n",
|
295 |
+
"lmdeploy serve api_server internlm/internlm2_5-7b-chat\n",
|
296 |
+
"```\n",
|
297 |
+
"\n",
|
298 |
+
"This command will launch an OpenAI-compatible server on the localhost at port `23333`. You can specify a different server port by using the `--server-port` option.\n",
|
299 |
+
"For more options, consult the help documentation by running `lmdeploy serve api_server --help`. Most of these options align with the engine configuration.\n",
|
300 |
+
"\n",
|
301 |
+
"To access the service, you can utilize the official OpenAI Python package `pip install openai`. Below is an example demonstrating how to use the entrypoint `v1/chat/completions`"
|
302 |
+
]
|
303 |
+
},
|
304 |
+
{
|
305 |
+
"cell_type": "code",
|
306 |
+
"execution_count": null,
|
307 |
+
"id": "3e625411",
|
308 |
+
"metadata": {},
|
309 |
+
"outputs": [],
|
310 |
+
"source": [
|
311 |
+
"from openai import OpenAI\n",
|
312 |
+
"\n",
|
313 |
+
"client = OpenAI(api_key=\"YOUR_API_KEY\", base_url=\"http://0.0.0.0:23333/v1\")\n",
|
314 |
+
"model_name = client.models.list().data[0].id\n",
|
315 |
+
"response = client.chat.completions.create(\n",
|
316 |
+
" model=model_name,\n",
|
317 |
+
" messages=[\n",
|
318 |
+
" {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
|
319 |
+
" {\"role\": \"user\", \"content\": \" provide three suggestions about time management\"},\n",
|
320 |
+
" ],\n",
|
321 |
+
" temperature=0.8,\n",
|
322 |
+
" top_p=0.8,\n",
|
323 |
+
")\n",
|
324 |
+
"print(response)"
|
325 |
+
]
|
326 |
+
},
|
327 |
+
{
|
328 |
+
"cell_type": "markdown",
|
329 |
+
"id": "24be9e23",
|
330 |
+
"metadata": {},
|
331 |
+
"source": [
|
332 |
+
"We encourage you to refer to the detailed guide for more comprehensive information about [serving with Docker](../llm/api_server.md), [function calls](../llm/api_server_tools.md) and other topics\n",
|
333 |
+
"\n",
|
334 |
+
"### Serve a VLM model\n",
|
335 |
+
"\n",
|
336 |
+
"```shell\n",
|
337 |
+
"lmdeploy serve api_server OpenGVLab/InternVL2-8B\n",
|
338 |
+
"```\n",
|
339 |
+
"\n",
|
340 |
+
"```{note}\n",
|
341 |
+
"LMDeploy reuses the vision component from upstream VLM repositories. Each upstream VLM model may have different dependencies.\n",
|
342 |
+
"Consequently, LMDeploy has decided not to include the dependencies of the upstream VLM repositories in its own dependency list.\n",
|
343 |
+
"If you encounter an \"ImportError\" when using LMDeploy for inference with VLM models, please install the relevant dependencies yourself.\n",
|
344 |
+
"```\n",
|
345 |
+
"\n",
|
346 |
+
"After the service is launched successfully, you can access the VLM service in a manner similar to how you would access the `gptv4` service by modifying the `api_key` and `base_url` parameters:"
|
347 |
+
]
|
348 |
+
},
|
349 |
+
{
|
350 |
+
"cell_type": "code",
|
351 |
+
"execution_count": 5,
|
352 |
+
"id": "02236cc9",
|
353 |
+
"metadata": {},
|
354 |
+
"outputs": [
|
355 |
+
{
|
356 |
+
"name": "stdout",
|
357 |
+
"output_type": "stream",
|
358 |
+
"text": [
|
359 |
+
"--2024-12-20 08:55:15-- https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg\n",
|
360 |
+
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.108.133, ...\n",
|
361 |
+
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.\n",
|
362 |
+
"HTTP request sent, awaiting response... 200 OK\n",
|
363 |
+
"Length: 13929 (14K) [image/jpeg]\n",
|
364 |
+
"Saving to: ‘tiger.jpeg’\n",
|
365 |
+
"\n",
|
366 |
+
"tiger.jpeg 100%[===================>] 13.60K --.-KB/s in 0.003s \n",
|
367 |
+
"\n",
|
368 |
+
"2024-12-20 08:55:16 (3.97 MB/s) - ‘tiger.jpeg’ saved [13929/13929]\n",
|
369 |
+
"\n"
|
370 |
+
]
|
371 |
+
}
|
372 |
+
],
|
373 |
+
"source": [
|
374 |
+
"# download \"https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg\" to local\n",
|
375 |
+
"!wget https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg"
|
376 |
+
]
|
377 |
+
},
|
378 |
+
{
|
379 |
+
"cell_type": "code",
|
380 |
+
"execution_count": 6,
|
381 |
+
"id": "df43b1ea",
|
382 |
+
"metadata": {},
|
383 |
+
"outputs": [
|
384 |
+
{
|
385 |
+
"ename": "APIConnectionError",
|
386 |
+
"evalue": "Connection error.",
|
387 |
+
"output_type": "error",
|
388 |
+
"traceback": [
|
389 |
+
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
390 |
+
"\u001b[0;31mConnectError\u001b[0m Traceback (most recent call last)",
|
391 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/httpx/_transports/default.py:72\u001b[0m, in \u001b[0;36mmap_httpcore_exceptions\u001b[0;34m()\u001b[0m\n\u001b[1;32m 71\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m---> 72\u001b[0m \u001b[38;5;28;01myield\u001b[39;00m\n\u001b[1;32m 73\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m exc:\n",
|
392 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/httpx/_transports/default.py:236\u001b[0m, in \u001b[0;36mHTTPTransport.handle_request\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 235\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m map_httpcore_exceptions():\n\u001b[0;32m--> 236\u001b[0m resp \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_pool\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mhandle_request\u001b[49m\u001b[43m(\u001b[49m\u001b[43mreq\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 238\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(resp\u001b[38;5;241m.\u001b[39mstream, typing\u001b[38;5;241m.\u001b[39mIterable)\n",
|
393 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/httpcore/_sync/connection_pool.py:216\u001b[0m, in \u001b[0;36mConnectionPool.handle_request\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 215\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_close_connections(closing)\n\u001b[0;32m--> 216\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m exc \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m 218\u001b[0m \u001b[38;5;66;03m# Return the response. Note that in this case we still have to manage\u001b[39;00m\n\u001b[1;32m 219\u001b[0m \u001b[38;5;66;03m# the point at which the response is closed.\u001b[39;00m\n",
|
394 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/httpcore/_sync/connection_pool.py:196\u001b[0m, in \u001b[0;36mConnectionPool.handle_request\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 194\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 195\u001b[0m \u001b[38;5;66;03m# Send the request on the assigned connection.\u001b[39;00m\n\u001b[0;32m--> 196\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[43mconnection\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mhandle_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 197\u001b[0m \u001b[43m \u001b[49m\u001b[43mpool_request\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrequest\u001b[49m\n\u001b[1;32m 198\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 199\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m ConnectionNotAvailable:\n\u001b[1;32m 200\u001b[0m \u001b[38;5;66;03m# In some cases a connection may initially be available to\u001b[39;00m\n\u001b[1;32m 201\u001b[0m \u001b[38;5;66;03m# handle a request, but then become unavailable.\u001b[39;00m\n\u001b[1;32m 202\u001b[0m \u001b[38;5;66;03m#\u001b[39;00m\n\u001b[1;32m 203\u001b[0m \u001b[38;5;66;03m# In this case we clear the connection and try again.\u001b[39;00m\n",
|
395 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/httpcore/_sync/connection.py:99\u001b[0m, in \u001b[0;36mHTTPConnection.handle_request\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 98\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_connect_failed \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mTrue\u001b[39;00m\n\u001b[0;32m---> 99\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m exc\n\u001b[1;32m 101\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_connection\u001b[38;5;241m.\u001b[39mhandle_request(request)\n",
|
396 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/httpcore/_sync/connection.py:76\u001b[0m, in \u001b[0;36mHTTPConnection.handle_request\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 75\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_connection \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m---> 76\u001b[0m stream \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_connect\u001b[49m\u001b[43m(\u001b[49m\u001b[43mrequest\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 78\u001b[0m ssl_object \u001b[38;5;241m=\u001b[39m stream\u001b[38;5;241m.\u001b[39mget_extra_info(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mssl_object\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n",
|
397 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/httpcore/_sync/connection.py:122\u001b[0m, in \u001b[0;36mHTTPConnection._connect\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 121\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m Trace(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mconnect_tcp\u001b[39m\u001b[38;5;124m\"\u001b[39m, logger, request, kwargs) \u001b[38;5;28;01mas\u001b[39;00m trace:\n\u001b[0;32m--> 122\u001b[0m stream \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_network_backend\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mconnect_tcp\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 123\u001b[0m trace\u001b[38;5;241m.\u001b[39mreturn_value \u001b[38;5;241m=\u001b[39m stream\n",
|
398 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/httpcore/_backends/sync.py:213\u001b[0m, in \u001b[0;36mSyncBackend.connect_tcp\u001b[0;34m(self, host, port, timeout, local_address, socket_options)\u001b[0m\n\u001b[1;32m 212\u001b[0m sock\u001b[38;5;241m.\u001b[39msetsockopt(\u001b[38;5;241m*\u001b[39moption) \u001b[38;5;66;03m# pragma: no cover\u001b[39;00m\n\u001b[0;32m--> 213\u001b[0m sock\u001b[38;5;241m.\u001b[39msetsockopt(socket\u001b[38;5;241m.\u001b[39mIPPROTO_TCP, socket\u001b[38;5;241m.\u001b[39mTCP_NODELAY, \u001b[38;5;241m1\u001b[39m)\n\u001b[1;32m 214\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m SyncStream(sock)\n",
|
399 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/contextlib.py:131\u001b[0m, in \u001b[0;36m_GeneratorContextManager.__exit__\u001b[0;34m(self, type, value, traceback)\u001b[0m\n\u001b[1;32m 130\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 131\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mgen\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mthrow\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mtype\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mvalue\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtraceback\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 132\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mStopIteration\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m exc:\n\u001b[1;32m 133\u001b[0m \u001b[38;5;66;03m# Suppress StopIteration *unless* it's the same exception that\u001b[39;00m\n\u001b[1;32m 134\u001b[0m \u001b[38;5;66;03m# was passed to throw(). This prevents a StopIteration\u001b[39;00m\n\u001b[1;32m 135\u001b[0m \u001b[38;5;66;03m# raised inside the \"with\" statement from being suppressed.\u001b[39;00m\n",
|
400 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/httpcore/_exceptions.py:14\u001b[0m, in \u001b[0;36mmap_exceptions\u001b[0;34m(map)\u001b[0m\n\u001b[1;32m 13\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(exc, from_exc):\n\u001b[0;32m---> 14\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m to_exc(exc) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mexc\u001b[39;00m\n\u001b[1;32m 15\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m\n",
|
401 |
+
"\u001b[0;31mConnectError\u001b[0m: [Errno 111] Connection refused",
|
402 |
+
"\nThe above exception was the direct cause of the following exception:\n",
|
403 |
+
"\u001b[0;31mConnectError\u001b[0m Traceback (most recent call last)",
|
404 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/openai/_base_client.py:981\u001b[0m, in \u001b[0;36mSyncAPIClient._request\u001b[0;34m(self, cast_to, options, retries_taken, stream, stream_cls)\u001b[0m\n\u001b[1;32m 980\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 981\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_client\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msend\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 982\u001b[0m \u001b[43m \u001b[49m\u001b[43mrequest\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 983\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;129;43;01mor\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_should_stream_response_body\u001b[49m\u001b[43m(\u001b[49m\u001b[43mrequest\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrequest\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 984\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 985\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 986\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m httpx\u001b[38;5;241m.\u001b[39mTimeoutException \u001b[38;5;28;01mas\u001b[39;00m err:\n",
|
405 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/httpx/_client.py:926\u001b[0m, in \u001b[0;36mClient.send\u001b[0;34m(self, request, stream, auth, follow_redirects)\u001b[0m\n\u001b[1;32m 924\u001b[0m auth \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_build_request_auth(request, auth)\n\u001b[0;32m--> 926\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_send_handling_auth\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 927\u001b[0m \u001b[43m \u001b[49m\u001b[43mrequest\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 928\u001b[0m \u001b[43m \u001b[49m\u001b[43mauth\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mauth\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 929\u001b[0m \u001b[43m \u001b[49m\u001b[43mfollow_redirects\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mfollow_redirects\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 930\u001b[0m \u001b[43m \u001b[49m\u001b[43mhistory\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m[\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 931\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 932\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n",
|
406 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/httpx/_client.py:954\u001b[0m, in \u001b[0;36mClient._send_handling_auth\u001b[0;34m(self, request, auth, follow_redirects, history)\u001b[0m\n\u001b[1;32m 953\u001b[0m \u001b[38;5;28;01mwhile\u001b[39;00m \u001b[38;5;28;01mTrue\u001b[39;00m:\n\u001b[0;32m--> 954\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_send_handling_redirects\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 955\u001b[0m \u001b[43m \u001b[49m\u001b[43mrequest\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 956\u001b[0m \u001b[43m \u001b[49m\u001b[43mfollow_redirects\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mfollow_redirects\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 957\u001b[0m \u001b[43m \u001b[49m\u001b[43mhistory\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mhistory\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 958\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 959\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n",
|
407 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/httpx/_client.py:991\u001b[0m, in \u001b[0;36mClient._send_handling_redirects\u001b[0;34m(self, request, follow_redirects, history)\u001b[0m\n\u001b[1;32m 989\u001b[0m hook(request)\n\u001b[0;32m--> 991\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_send_single_request\u001b[49m\u001b[43m(\u001b[49m\u001b[43mrequest\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 992\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n",
|
408 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/httpx/_client.py:1027\u001b[0m, in \u001b[0;36mClient._send_single_request\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 1026\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m request_context(request\u001b[38;5;241m=\u001b[39mrequest):\n\u001b[0;32m-> 1027\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[43mtransport\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mhandle_request\u001b[49m\u001b[43m(\u001b[49m\u001b[43mrequest\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1029\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(response\u001b[38;5;241m.\u001b[39mstream, SyncByteStream)\n",
|
409 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/httpx/_transports/default.py:236\u001b[0m, in \u001b[0;36mHTTPTransport.handle_request\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 235\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m map_httpcore_exceptions():\n\u001b[0;32m--> 236\u001b[0m resp \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_pool\u001b[38;5;241m.\u001b[39mhandle_request(req)\n\u001b[1;32m 238\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(resp\u001b[38;5;241m.\u001b[39mstream, typing\u001b[38;5;241m.\u001b[39mIterable)\n",
|
410 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/contextlib.py:131\u001b[0m, in \u001b[0;36m_GeneratorContextManager.__exit__\u001b[0;34m(self, type, value, traceback)\u001b[0m\n\u001b[1;32m 130\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 131\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mgen\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mthrow\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mtype\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mvalue\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtraceback\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 132\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mStopIteration\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m exc:\n\u001b[1;32m 133\u001b[0m \u001b[38;5;66;03m# Suppress StopIteration *unless* it's the same exception that\u001b[39;00m\n\u001b[1;32m 134\u001b[0m \u001b[38;5;66;03m# was passed to throw(). This prevents a StopIteration\u001b[39;00m\n\u001b[1;32m 135\u001b[0m \u001b[38;5;66;03m# raised inside the \"with\" statement from being suppressed.\u001b[39;00m\n",
|
411 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/httpx/_transports/default.py:89\u001b[0m, in \u001b[0;36mmap_httpcore_exceptions\u001b[0;34m()\u001b[0m\n\u001b[1;32m 88\u001b[0m message \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mstr\u001b[39m(exc)\n\u001b[0;32m---> 89\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m mapped_exc(message) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mexc\u001b[39;00m\n",
|
412 |
+
"\u001b[0;31mConnectError\u001b[0m: [Errno 111] Connection refused",
|
413 |
+
"\nThe above exception was the direct cause of the following exception:\n",
|
414 |
+
"\u001b[0;31mAPIConnectionError\u001b[0m Traceback (most recent call last)",
|
415 |
+
"Cell \u001b[0;32mIn[6], line 4\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mopenai\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m OpenAI\n\u001b[1;32m 3\u001b[0m client \u001b[38;5;241m=\u001b[39m OpenAI(api_key\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mYOUR_API_KEY\u001b[39m\u001b[38;5;124m\"\u001b[39m, base_url\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mhttp://0.0.0.0:23333/v1\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m----> 4\u001b[0m model_name \u001b[38;5;241m=\u001b[39m \u001b[43mclient\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mmodels\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlist\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241m.\u001b[39mdata[\u001b[38;5;241m0\u001b[39m]\u001b[38;5;241m.\u001b[39mid\n\u001b[1;32m 5\u001b[0m response \u001b[38;5;241m=\u001b[39m client\u001b[38;5;241m.\u001b[39mchat\u001b[38;5;241m.\u001b[39mcompletions\u001b[38;5;241m.\u001b[39mcreate(\n\u001b[1;32m 6\u001b[0m model\u001b[38;5;241m=\u001b[39mmodel_name,\n\u001b[1;32m 7\u001b[0m messages\u001b[38;5;241m=\u001b[39m[\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 25\u001b[0m top_p\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m0.8\u001b[39m,\n\u001b[1;32m 26\u001b[0m )\n\u001b[1;32m 27\u001b[0m \u001b[38;5;28mprint\u001b[39m(response)\n",
|
416 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/openai/resources/models.py:91\u001b[0m, in \u001b[0;36mModels.list\u001b[0;34m(self, extra_headers, extra_query, extra_body, timeout)\u001b[0m\n\u001b[1;32m 77\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mlist\u001b[39m(\n\u001b[1;32m 78\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 79\u001b[0m \u001b[38;5;241m*\u001b[39m,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 85\u001b[0m timeout: \u001b[38;5;28mfloat\u001b[39m \u001b[38;5;241m|\u001b[39m httpx\u001b[38;5;241m.\u001b[39mTimeout \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m|\u001b[39m NotGiven \u001b[38;5;241m=\u001b[39m NOT_GIVEN,\n\u001b[1;32m 86\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m SyncPage[Model]:\n\u001b[1;32m 87\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 88\u001b[0m \u001b[38;5;124;03m Lists the currently available models, and provides basic information about each\u001b[39;00m\n\u001b[1;32m 89\u001b[0m \u001b[38;5;124;03m one such as the owner and availability.\u001b[39;00m\n\u001b[1;32m 90\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[0;32m---> 91\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_get_api_list\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 92\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43m/models\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 93\u001b[0m \u001b[43m \u001b[49m\u001b[43mpage\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mSyncPage\u001b[49m\u001b[43m[\u001b[49m\u001b[43mModel\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 94\u001b[0m \u001b[43m \u001b[49m\u001b[43moptions\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmake_request_options\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 95\u001b[0m \u001b[43m \u001b[49m\u001b[43mextra_headers\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mextra_headers\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mextra_query\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mextra_query\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mextra_body\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mextra_body\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtimeout\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mtimeout\u001b[49m\n\u001b[1;32m 96\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 97\u001b[0m \u001b[43m \u001b[49m\u001b[43mmodel\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mModel\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 98\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n",
|
417 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/openai/_base_client.py:1317\u001b[0m, in \u001b[0;36mSyncAPIClient.get_api_list\u001b[0;34m(self, path, model, page, body, options, method)\u001b[0m\n\u001b[1;32m 1306\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mget_api_list\u001b[39m(\n\u001b[1;32m 1307\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 1308\u001b[0m path: \u001b[38;5;28mstr\u001b[39m,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1314\u001b[0m method: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mget\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 1315\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m SyncPageT:\n\u001b[1;32m 1316\u001b[0m opts \u001b[38;5;241m=\u001b[39m FinalRequestOptions\u001b[38;5;241m.\u001b[39mconstruct(method\u001b[38;5;241m=\u001b[39mmethod, url\u001b[38;5;241m=\u001b[39mpath, json_data\u001b[38;5;241m=\u001b[39mbody, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39moptions)\n\u001b[0;32m-> 1317\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_request_api_list\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mpage\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mopts\u001b[49m\u001b[43m)\u001b[49m\n",
|
418 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/openai/_base_client.py:1168\u001b[0m, in \u001b[0;36mSyncAPIClient._request_api_list\u001b[0;34m(self, model, page, options)\u001b[0m\n\u001b[1;32m 1164\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m resp\n\u001b[1;32m 1166\u001b[0m options\u001b[38;5;241m.\u001b[39mpost_parser \u001b[38;5;241m=\u001b[39m _parser\n\u001b[0;32m-> 1168\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrequest\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpage\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43moptions\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstream\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mFalse\u001b[39;49;00m\u001b[43m)\u001b[49m\n",
|
419 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/openai/_base_client.py:945\u001b[0m, in \u001b[0;36mSyncAPIClient.request\u001b[0;34m(self, cast_to, options, remaining_retries, stream, stream_cls)\u001b[0m\n\u001b[1;32m 942\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 943\u001b[0m retries_taken \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m\n\u001b[0;32m--> 945\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 946\u001b[0m \u001b[43m \u001b[49m\u001b[43mcast_to\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcast_to\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 947\u001b[0m \u001b[43m \u001b[49m\u001b[43moptions\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43moptions\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 948\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 949\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream_cls\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream_cls\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 950\u001b[0m \u001b[43m \u001b[49m\u001b[43mretries_taken\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mretries_taken\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 951\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n",
|
420 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/openai/_base_client.py:1005\u001b[0m, in \u001b[0;36mSyncAPIClient._request\u001b[0;34m(self, cast_to, options, retries_taken, stream, stream_cls)\u001b[0m\n\u001b[1;32m 1002\u001b[0m log\u001b[38;5;241m.\u001b[39mdebug(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mEncountered Exception\u001b[39m\u001b[38;5;124m\"\u001b[39m, exc_info\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[1;32m 1004\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m remaining_retries \u001b[38;5;241m>\u001b[39m \u001b[38;5;241m0\u001b[39m:\n\u001b[0;32m-> 1005\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_retry_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 1006\u001b[0m \u001b[43m \u001b[49m\u001b[43minput_options\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1007\u001b[0m \u001b[43m \u001b[49m\u001b[43mcast_to\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1008\u001b[0m \u001b[43m \u001b[49m\u001b[43mretries_taken\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mretries_taken\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1009\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1010\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream_cls\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream_cls\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1011\u001b[0m \u001b[43m \u001b[49m\u001b[43mresponse_headers\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mNone\u001b[39;49;00m\u001b[43m,\u001b[49m\n\u001b[1;32m 1012\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1014\u001b[0m log\u001b[38;5;241m.\u001b[39mdebug(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mRaising connection error\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 1015\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m APIConnectionError(request\u001b[38;5;241m=\u001b[39mrequest) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01merr\u001b[39;00m\n",
|
421 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/openai/_base_client.py:1083\u001b[0m, in \u001b[0;36mSyncAPIClient._retry_request\u001b[0;34m(self, options, cast_to, retries_taken, response_headers, stream, stream_cls)\u001b[0m\n\u001b[1;32m 1079\u001b[0m \u001b[38;5;66;03m# In a synchronous context we are blocking the entire thread. Up to the library user to run the client in a\u001b[39;00m\n\u001b[1;32m 1080\u001b[0m \u001b[38;5;66;03m# different thread if necessary.\u001b[39;00m\n\u001b[1;32m 1081\u001b[0m time\u001b[38;5;241m.\u001b[39msleep(timeout)\n\u001b[0;32m-> 1083\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 1084\u001b[0m \u001b[43m \u001b[49m\u001b[43moptions\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43moptions\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1085\u001b[0m \u001b[43m \u001b[49m\u001b[43mcast_to\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcast_to\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1086\u001b[0m \u001b[43m \u001b[49m\u001b[43mretries_taken\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mretries_taken\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1087\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1088\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream_cls\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream_cls\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1089\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n",
|
422 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/openai/_base_client.py:1005\u001b[0m, in \u001b[0;36mSyncAPIClient._request\u001b[0;34m(self, cast_to, options, retries_taken, stream, stream_cls)\u001b[0m\n\u001b[1;32m 1002\u001b[0m log\u001b[38;5;241m.\u001b[39mdebug(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mEncountered Exception\u001b[39m\u001b[38;5;124m\"\u001b[39m, exc_info\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[1;32m 1004\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m remaining_retries \u001b[38;5;241m>\u001b[39m \u001b[38;5;241m0\u001b[39m:\n\u001b[0;32m-> 1005\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_retry_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 1006\u001b[0m \u001b[43m \u001b[49m\u001b[43minput_options\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1007\u001b[0m \u001b[43m \u001b[49m\u001b[43mcast_to\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1008\u001b[0m \u001b[43m \u001b[49m\u001b[43mretries_taken\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mretries_taken\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1009\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1010\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream_cls\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream_cls\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1011\u001b[0m \u001b[43m \u001b[49m\u001b[43mresponse_headers\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mNone\u001b[39;49;00m\u001b[43m,\u001b[49m\n\u001b[1;32m 1012\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1014\u001b[0m log\u001b[38;5;241m.\u001b[39mdebug(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mRaising connection error\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 1015\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m APIConnectionError(request\u001b[38;5;241m=\u001b[39mrequest) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01merr\u001b[39;00m\n",
|
423 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/openai/_base_client.py:1083\u001b[0m, in \u001b[0;36mSyncAPIClient._retry_request\u001b[0;34m(self, options, cast_to, retries_taken, response_headers, stream, stream_cls)\u001b[0m\n\u001b[1;32m 1079\u001b[0m \u001b[38;5;66;03m# In a synchronous context we are blocking the entire thread. Up to the library user to run the client in a\u001b[39;00m\n\u001b[1;32m 1080\u001b[0m \u001b[38;5;66;03m# different thread if necessary.\u001b[39;00m\n\u001b[1;32m 1081\u001b[0m time\u001b[38;5;241m.\u001b[39msleep(timeout)\n\u001b[0;32m-> 1083\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 1084\u001b[0m \u001b[43m \u001b[49m\u001b[43moptions\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43moptions\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1085\u001b[0m \u001b[43m \u001b[49m\u001b[43mcast_to\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcast_to\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1086\u001b[0m \u001b[43m \u001b[49m\u001b[43mretries_taken\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mretries_taken\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1087\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1088\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream_cls\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream_cls\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1089\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n",
|
424 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/openai/_base_client.py:1015\u001b[0m, in \u001b[0;36mSyncAPIClient._request\u001b[0;34m(self, cast_to, options, retries_taken, stream, stream_cls)\u001b[0m\n\u001b[1;32m 1005\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_retry_request(\n\u001b[1;32m 1006\u001b[0m input_options,\n\u001b[1;32m 1007\u001b[0m cast_to,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1011\u001b[0m response_headers\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 1012\u001b[0m )\n\u001b[1;32m 1014\u001b[0m log\u001b[38;5;241m.\u001b[39mdebug(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mRaising connection error\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m-> 1015\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m APIConnectionError(request\u001b[38;5;241m=\u001b[39mrequest) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01merr\u001b[39;00m\n\u001b[1;32m 1017\u001b[0m log\u001b[38;5;241m.\u001b[39mdebug(\n\u001b[1;32m 1018\u001b[0m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mHTTP Response: \u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m \u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m \u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m%i\u001b[39;00m\u001b[38;5;124m \u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m \u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m'\u001b[39m,\n\u001b[1;32m 1019\u001b[0m request\u001b[38;5;241m.\u001b[39mmethod,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1023\u001b[0m response\u001b[38;5;241m.\u001b[39mheaders,\n\u001b[1;32m 1024\u001b[0m )\n\u001b[1;32m 1025\u001b[0m log\u001b[38;5;241m.\u001b[39mdebug(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrequest_id: \u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m\"\u001b[39m, response\u001b[38;5;241m.\u001b[39mheaders\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mx-request-id\u001b[39m\u001b[38;5;124m\"\u001b[39m))\n",
|
425 |
+
"\u001b[0;31mAPIConnectionError\u001b[0m: Connection error."
|
426 |
+
]
|
427 |
+
},
|
428 |
+
{
|
429 |
+
"ename": "",
|
430 |
+
"evalue": "",
|
431 |
+
"output_type": "error",
|
432 |
+
"traceback": [
|
433 |
+
"\u001b[1;31mThe Kernel crashed while executing code in the current cell or a previous cell. \n",
|
434 |
+
"\u001b[1;31mPlease review the code in the cell(s) to identify a possible cause of the failure. \n",
|
435 |
+
"\u001b[1;31mClick <a href='https://aka.ms/vscodeJupyterKernelCrash'>here</a> for more info. \n",
|
436 |
+
"\u001b[1;31mView Jupyter <a href='command:jupyter.viewOutput'>log</a> for further details."
|
437 |
+
]
|
438 |
+
}
|
439 |
+
],
|
440 |
+
"source": [
|
441 |
+
"from openai import OpenAI\n",
|
442 |
+
"\n",
|
443 |
+
"client = OpenAI(api_key=\"YOUR_API_KEY\", base_url=\"http://0.0.0.0:23333/v1\")\n",
|
444 |
+
"model_name = client.models.list().data[0].id\n",
|
445 |
+
"response = client.chat.completions.create(\n",
|
446 |
+
" model=model_name,\n",
|
447 |
+
" messages=[\n",
|
448 |
+
" {\n",
|
449 |
+
" \"role\": \"user\",\n",
|
450 |
+
" \"content\": [\n",
|
451 |
+
" {\n",
|
452 |
+
" \"type\": \"text\",\n",
|
453 |
+
" \"text\": \"Describe the image please\",\n",
|
454 |
+
" },\n",
|
455 |
+
" {\n",
|
456 |
+
" \"type\": \"image_url\",\n",
|
457 |
+
" \"image_url\": {\n",
|
458 |
+
" \"url\": \"https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg\",\n",
|
459 |
+
" },\n",
|
460 |
+
" },\n",
|
461 |
+
" ],\n",
|
462 |
+
" }\n",
|
463 |
+
" ],\n",
|
464 |
+
" temperature=0.8,\n",
|
465 |
+
" top_p=0.8,\n",
|
466 |
+
")\n",
|
467 |
+
"print(response)"
|
468 |
+
]
|
469 |
+
},
|
470 |
+
{
|
471 |
+
"cell_type": "markdown",
|
472 |
+
"id": "545bbd85",
|
473 |
+
"metadata": {},
|
474 |
+
"source": [
|
475 |
+
"## Inference with Command line Interface\n",
|
476 |
+
"\n",
|
477 |
+
"LMDeploy offers a very convenient CLI tool for users to chat with the LLM model locally. For example:\n",
|
478 |
+
"\n",
|
479 |
+
"```shell\n",
|
480 |
+
"lmdeploy chat internlm/internlm2_5-7b-chat --backend turbomind\n",
|
481 |
+
"```\n",
|
482 |
+
"\n",
|
483 |
+
"It is designed to assist users in checking and verifying whether LMDeploy supports their model, whether the chat template is applied correctly, and whether the inference results are delivered smoothly.\n",
|
484 |
+
"\n",
|
485 |
+
"Another tool, `lmdeploy check_env`, aims to gather the essential environment information. It is crucial when reporting an issue to us, as it helps us diagnose and resolve the problem more effectively.\n",
|
486 |
+
"\n",
|
487 |
+
"If you have any doubt about their usage, you can try using the `--help` option to obtain detailed information."
|
488 |
+
]
|
489 |
+
}
|
490 |
+
],
|
491 |
+
"metadata": {
|
492 |
+
"jupytext": {
|
493 |
+
"cell_metadata_filter": "-all",
|
494 |
+
"main_language": "python",
|
495 |
+
"notebook_metadata_filter": "-all"
|
496 |
+
},
|
497 |
+
"kernelspec": {
|
498 |
+
"display_name": "lmdeploy",
|
499 |
+
"language": "python",
|
500 |
+
"name": "python3"
|
501 |
+
},
|
502 |
+
"language_info": {
|
503 |
+
"codemirror_mode": {
|
504 |
+
"name": "ipython",
|
505 |
+
"version": 3
|
506 |
+
},
|
507 |
+
"file_extension": ".py",
|
508 |
+
"mimetype": "text/x-python",
|
509 |
+
"name": "python",
|
510 |
+
"nbconvert_exporter": "python",
|
511 |
+
"pygments_lexer": "ipython3",
|
512 |
+
"version": "3.8.19"
|
513 |
+
}
|
514 |
+
},
|
515 |
+
"nbformat": 4,
|
516 |
+
"nbformat_minor": 5
|
517 |
+
}
|
a_mllm_notebooks/lmdeploy/get_started_vl.md
ADDED
@@ -0,0 +1,204 @@
|
1 |
+
# Quick Start
|
2 |
+
|
3 |
+
This tutorial shows the usage of LMDeploy on CUDA platform:
|
4 |
+
|
5 |
+
- Offline inference of LLM and VLM models
|
6 |
+
- Serve an LLM or VLM model via an OpenAI-compatible server
|
7 |
+
- Console CLI to interactively chat with an LLM model
|
8 |
+
|
9 |
+
Before reading further, please ensure that you have installed lmdeploy as outlined in the [installation guide](installation.md).
|
10 |
+
|
11 |
+
## Offline batch inference
|
12 |
+
|
13 |
+
### LLM inference
|
14 |
+
|
15 |
+
```python
|
16 |
+
from lmdeploy import pipeline
|
17 |
+
pipe = pipeline('internlm/internlm2_5-7b-chat')
|
18 |
+
response = pipe(['Hi, pls intro yourself', 'Shanghai is'])
|
19 |
+
print(response)
|
20 |
+
```
|
21 |
+
|
22 |
+
When constructing the `pipeline`, if an inference engine is not designated between the TurboMind Engine and the PyTorch Engine, LMDeploy will automatically assign one based on [their respective capabilities](../supported_models/supported_models.md), with the TurboMind Engine taking precedence by default.
|
23 |
+
|
24 |
+
However, you have the option to manually select an engine. For instance,
|
25 |
+
|
26 |
+
```python
|
27 |
+
from lmdeploy import pipeline, TurbomindEngineConfig
|
28 |
+
pipe = pipeline('internlm/internlm2_5-7b-chat',
|
29 |
+
backend_config=TurbomindEngineConfig(
|
30 |
+
max_batch_size=32,
|
31 |
+
enable_prefix_caching=True,
|
32 |
+
cache_max_entry_count=0.8,
|
33 |
+
session_len=8192,
|
34 |
+
))
|
35 |
+
```
|
36 |
+
|
37 |
+
or,
|
38 |
+
|
39 |
+
```python
|
40 |
+
from lmdeploy import pipeline, PytorchEngineConfig
|
41 |
+
pipe = pipeline('internlm/internlm2_5-7b-chat',
|
42 |
+
backend_config=PytorchEngineConfig(
|
43 |
+
max_batch_size=32,
|
44 |
+
enable_prefix_caching=True,
|
45 |
+
cache_max_entry_count=0.8,
|
46 |
+
session_len=8192,
|
47 |
+
))
|
48 |
+
```
|
49 |
+
|
50 |
+
```{note}
|
51 |
+
The parameter "cache_max_entry_count" significantly influences the GPU memory usage.
|
52 |
+
It means the proportion of FREE GPU memory occupied by the K/V cache after the model weights are loaded.
|
53 |
+
|
54 |
+
The default value is 0.8. The K/V cache memory is allocated once and reused repeatedly, which is why the built pipeline and the "api_server" mentioned in the next section consume a substantial amount of GPU memory.
|
55 |
+
|
56 |
+
If you encounter an Out-of-Memory (OOM) error, consider lowering the value of "cache_max_entry_count", as sketched right after this note.
|
57 |
+
```
|
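As a rough illustration (the exact value depends on your GPU and model, so treat the number below as a placeholder), lowering `cache_max_entry_count` when building the pipeline might look like this:

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# Illustrative value: reserve only 40% of the free GPU memory for the K/V cache,
# leaving more headroom for activations; tune this number for your own setup.
pipe = pipeline('internlm/internlm2_5-7b-chat',
                backend_config=TurbomindEngineConfig(cache_max_entry_count=0.4))
```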
58 |
+
|
59 |
+
When using the callable `pipe()` to perform token generation with given prompts, you can set the sampling parameters via `GenerationConfig` as below:
|
60 |
+
|
61 |
+
```python
|
62 |
+
from lmdeploy import GenerationConfig, pipeline
|
63 |
+
|
64 |
+
pipe = pipeline('internlm/internlm2_5-7b-chat')
|
65 |
+
prompts = ['Hi, pls intro yourself', 'Shanghai is']
|
66 |
+
response = pipe(prompts,
|
67 |
+
gen_config=GenerationConfig(
|
68 |
+
max_new_tokens=1024,
|
69 |
+
top_p=0.8,
|
70 |
+
top_k=40,
|
71 |
+
temperature=0.6
|
72 |
+
))
|
73 |
+
```
|
74 |
+
|
75 |
+
In the `GenerationConfig`, `top_k=1` or `temperature=0.0` indicates greedy search.
|
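For instance, a minimal sketch of greedy (deterministic) decoding could look like this:

```python
from lmdeploy import GenerationConfig, pipeline

pipe = pipeline('internlm/internlm2_5-7b-chat')
# top_k=1 keeps only the single most likely token at each step, i.e. greedy search.
greedy_config = GenerationConfig(max_new_tokens=256, top_k=1)
response = pipe(['Shanghai is'], gen_config=greedy_config)
print(response)
```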
76 |
+
|
77 |
+
For more information about pipeline, please read the [detailed tutorial](../llm/pipeline.md)
|
78 |
+
|
79 |
+
### VLM inference
|
80 |
+
|
81 |
+
The usage of the VLM inference pipeline is akin to that of LLMs, with the additional capability of processing image data.
|
82 |
+
For example, you can use the following code snippet to perform inference with an InternVL model:
|
83 |
+
|
84 |
+
```python
|
85 |
+
from lmdeploy import pipeline
|
86 |
+
from lmdeploy.vl import load_image
|
87 |
+
|
88 |
+
pipe = pipeline('OpenGVLab/InternVL2-8B')
|
89 |
+
|
90 |
+
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
|
91 |
+
response = pipe(('describe this image', image))
|
92 |
+
print(response)
|
93 |
+
```
|
94 |
+
|
95 |
+
In the VLM pipeline, the default image-processing batch size is 1. This can be adjusted via `VisionConfig`. For instance, you might set it like this:
|
96 |
+
|
97 |
+
```python
|
98 |
+
from lmdeploy import pipeline, VisionConfig
|
99 |
+
from lmdeploy.vl import load_image
|
100 |
+
|
101 |
+
pipe = pipeline('OpenGVLab/InternVL2-8B',
|
102 |
+
vision_config=VisionConfig(
|
103 |
+
max_batch_size=8
|
104 |
+
))
|
105 |
+
|
106 |
+
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
|
107 |
+
response = pipe(('describe this image', image))
|
108 |
+
print(response)
|
109 |
+
```
|
110 |
+
|
111 |
+
However, the larger the image batch size, the greater the risk of an OOM error, because the LLM component within the VLM model pre-allocates a large amount of memory in advance.
|
112 |
+
|
113 |
+
We encourage you to manually choose between the TurboMind Engine and the PyTorch Engine based on their respective capabilities, as detailed in [the supported-models matrix](../supported_models/supported_models.md).
|
114 |
+
Additionally, follow the instructions in the [LLM Inference](#llm-inference) section to reduce the values of memory-related parameters; a combined sketch is given below.
|
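A minimal, memory-conscious sketch that combines both knobs (the concrete values here are illustrative, not recommendations):

```python
from lmdeploy import pipeline, TurbomindEngineConfig, VisionConfig
from lmdeploy.vl import load_image

# Illustrative settings: a small vision batch size plus a reduced K/V cache
# fraction and session length to keep GPU memory usage down.
pipe = pipeline('OpenGVLab/InternVL2-8B',
                backend_config=TurbomindEngineConfig(cache_max_entry_count=0.5,
                                                     session_len=4096),
                vision_config=VisionConfig(max_batch_size=1))

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
print(pipe(('describe this image', image)))
```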
115 |
+
|
116 |
+
## Serving
|
117 |
+
|
118 |
+
Following on from the [offline batch inference](#offline-batch-inference) section above, this part presents the respective serving methods for LLMs and VLMs.
|
119 |
+
|
120 |
+
### Serve an LLM model
|
121 |
+
|
122 |
+
```shell
|
123 |
+
lmdeploy serve api_server internlm/internlm2_5-7b-chat
|
124 |
+
```
|
125 |
+
|
126 |
+
This command will launch an OpenAI-compatible server on localhost at port `23333`. You can specify a different server port by using the `--server-port` option.
|
127 |
+
For more options, consult the help documentation by running `lmdeploy serve api_server --help`. Most of these options align with the engine configuration.
|
128 |
+
|
129 |
+
To access the service, you can use the official OpenAI Python package (`pip install openai`). Below is an example demonstrating how to use the `v1/chat/completions` endpoint:
|
130 |
+
|
131 |
+
```python
|
132 |
+
from openai import OpenAI
|
133 |
+
client = OpenAI(
|
134 |
+
api_key='YOUR_API_KEY',
|
135 |
+
base_url="http://0.0.0.0:23333/v1"
|
136 |
+
)
|
137 |
+
model_name = client.models.list().data[0].id
|
138 |
+
response = client.chat.completions.create(
|
139 |
+
model=model_name,
|
140 |
+
messages=[
|
141 |
+
{"role": "system", "content": "You are a helpful assistant."},
|
142 |
+
{"role": "user", "content": " provide three suggestions about time management"},
|
143 |
+
],
|
144 |
+
temperature=0.8,
|
145 |
+
top_p=0.8
|
146 |
+
)
|
147 |
+
print(response)
|
148 |
+
```
|
149 |
+
|
150 |
+
We encourage you to refer to the detailed guides for more comprehensive information about [serving with Docker](../llm/api_server.md), [function calls](../llm/api_server_tools.md), and other topics.
|
151 |
+
|
152 |
+
### Serve a VLM model
|
153 |
+
|
154 |
+
```shell
|
155 |
+
lmdeploy serve api_server OpenGVLab/InternVL2-8B
|
156 |
+
```
|
157 |
+
|
158 |
+
```{note}
|
159 |
+
LMDeploy reuses the vision component from upstream VLM repositories. Each upstream VLM model may have different dependencies.
|
160 |
+
Consequently, LMDeploy has decided not to include the dependencies of the upstream VLM repositories in its own dependency list.
|
161 |
+
If you encounter an "ImportError" when using LMDeploy for inference with VLM models, please install the relevant dependencies yourself.
|
162 |
+
```
|
163 |
+
|
164 |
+
After the service is launched successfully, you can access the VLM service in a manner similar to the GPT-4V service, by modifying the `api_key` and `base_url` parameters:
|
165 |
+
|
166 |
+
```python
|
167 |
+
from openai import OpenAI
|
168 |
+
|
169 |
+
client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
|
170 |
+
model_name = client.models.list().data[0].id
|
171 |
+
response = client.chat.completions.create(
|
172 |
+
model=model_name,
|
173 |
+
messages=[{
|
174 |
+
'role':
|
175 |
+
'user',
|
176 |
+
'content': [{
|
177 |
+
'type': 'text',
|
178 |
+
'text': 'Describe the image please',
|
179 |
+
}, {
|
180 |
+
'type': 'image_url',
|
181 |
+
'image_url': {
|
182 |
+
'url':
|
183 |
+
'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg',
|
184 |
+
},
|
185 |
+
}],
|
186 |
+
}],
|
187 |
+
temperature=0.8,
|
188 |
+
top_p=0.8)
|
189 |
+
print(response)
|
190 |
+
```
|
191 |
+
|
192 |
+
## Inference with Command line Interface
|
193 |
+
|
194 |
+
LMDeploy offers a very convenient CLI tool for users to chat with the LLM model locally. For example:
|
195 |
+
|
196 |
+
```shell
|
197 |
+
lmdeploy chat internlm/internlm2_5-7b-chat --backend turbomind
|
198 |
+
```
|
199 |
+
|
200 |
+
It is designed to assist users in checking and verifying whether LMDeploy supports their model, whether the chat template is applied correctly, and whether the inference results are delivered smoothly.
|
201 |
+
|
202 |
+
Another tool, `lmdeploy check_env`, aims to gather the essential environment information. It is crucial when reporting an issue to us, as it helps us diagnose and resolve the problem more effectively.
|
203 |
+
|
204 |
+
If you have any doubt about their usage, you can try using the `--help` option to obtain detailed information.
|
a_mllm_notebooks/lmdeploy/internvl_25.ipynb
ADDED
@@ -0,0 +1,355 @@
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "code",
|
5 |
+
"execution_count": 1,
|
6 |
+
"metadata": {},
|
7 |
+
"outputs": [
|
8 |
+
{
|
9 |
+
"name": "stdout",
|
10 |
+
"output_type": "stream",
|
11 |
+
"text": [
|
12 |
+
"/dscilab_dungvo/workspace/huggingface_cache\n",
|
13 |
+
"models--AIDC-AI--Ovis1.6-Gemma2-27B\n",
|
14 |
+
"models--FoundationVision--groma-7b-pretrain\n",
|
15 |
+
"models--MBZUAI--GLaMM-FullScope\n",
|
16 |
+
"models--OpenGVLab--InternVL2_5-26B-AWQ\n",
|
17 |
+
"models--OpenGVLab--InternVL2_5-38B-AWQ\n",
|
18 |
+
"models--OpenGVLab--InternVL2_5-78B-AWQ\n",
|
19 |
+
"models--Qwen--Qwen2-VL-2B-Instruct\n",
|
20 |
+
"models--Qwen--Qwen2-VL-72B-Instruct-AWQ\n",
|
21 |
+
"models--Qwen--Qwen2-VL-7B-Instruct\n",
|
22 |
+
"models--Qwen--Qwen2.5-7B-Instruct\n",
|
23 |
+
"models--meta-llama--Llama-3.2-90B-Vision-Instruct\n",
|
24 |
+
"models--opengvlab--internvl2_5-26B-AWQ\n",
|
25 |
+
"models--opengvlab--internvl2_5-38B-AWQ\n",
|
26 |
+
"models--vinai--phobert-base-v2\n",
|
27 |
+
"version.txt\n"
|
28 |
+
]
|
29 |
+
}
|
30 |
+
],
|
31 |
+
"source": [
|
32 |
+
"!echo $HF_HOME\n",
|
33 |
+
"!ls $HF_HOME/hub"
|
34 |
+
]
|
35 |
+
},
|
36 |
+
{
|
37 |
+
"cell_type": "markdown",
|
38 |
+
"metadata": {},
|
39 |
+
"source": []
|
40 |
+
},
|
41 |
+
{
|
42 |
+
"cell_type": "code",
|
43 |
+
"execution_count": 4,
|
44 |
+
"metadata": {},
|
45 |
+
"outputs": [
|
46 |
+
{
|
47 |
+
"name": "stdout",
|
48 |
+
"output_type": "stream",
|
49 |
+
"text": [
|
50 |
+
"The supported chat template names are:\n",
|
51 |
+
"baichuan2\n",
|
52 |
+
"base\n",
|
53 |
+
"chatglm\n",
|
54 |
+
"chatglm3\n",
|
55 |
+
"codegeex4\n",
|
56 |
+
"codellama\n",
|
57 |
+
"cogvlm\n",
|
58 |
+
"cogvlm2\n",
|
59 |
+
"dbrx\n",
|
60 |
+
"deepseek\n",
|
61 |
+
"deepseek-coder\n",
|
62 |
+
"deepseek-vl\n",
|
63 |
+
"falcon\n",
|
64 |
+
"gemma\n",
|
65 |
+
"glm4\n",
|
66 |
+
"internlm\n",
|
67 |
+
"internlm-xcomposer2\n",
|
68 |
+
"internlm-xcomposer2d5\n",
|
69 |
+
"internlm2\n",
|
70 |
+
"internvl-internlm2\n",
|
71 |
+
"internvl-phi3\n",
|
72 |
+
"internvl-zh\n",
|
73 |
+
"internvl-zh-hermes2\n",
|
74 |
+
"internvl2-internlm2\n",
|
75 |
+
"internvl2-phi3\n",
|
76 |
+
"internvl2_5\n",
|
77 |
+
"llama\n",
|
78 |
+
"llama2\n",
|
79 |
+
"llama3\n",
|
80 |
+
"llama3_1\n",
|
81 |
+
"llama3_2\n",
|
82 |
+
"llava-chatml\n",
|
83 |
+
"llava-v1\n",
|
84 |
+
"mini-gemini-vicuna\n",
|
85 |
+
"minicpm3\n",
|
86 |
+
"minicpmv-2d6\n",
|
87 |
+
"mistral\n",
|
88 |
+
"mixtral\n",
|
89 |
+
"molmo\n",
|
90 |
+
"phi-3\n",
|
91 |
+
"puyu\n",
|
92 |
+
"qwen\n",
|
93 |
+
"qwen2d5\n",
|
94 |
+
"solar\n",
|
95 |
+
"ultracm\n",
|
96 |
+
"ultralm\n",
|
97 |
+
"vicuna\n",
|
98 |
+
"wizardlm\n",
|
99 |
+
"yi\n",
|
100 |
+
"yi-vl\n"
|
101 |
+
]
|
102 |
+
}
|
103 |
+
],
|
104 |
+
"source": [
|
105 |
+
"!lmdeploy list"
|
106 |
+
]
|
107 |
+
},
|
108 |
+
{
|
109 |
+
"cell_type": "code",
|
110 |
+
"execution_count": 5,
|
111 |
+
"metadata": {},
|
112 |
+
"outputs": [
|
113 |
+
{
|
114 |
+
"name": "stdout",
|
115 |
+
"output_type": "stream",
|
116 |
+
"text": [
|
117 |
+
"usage: lmdeploy lite [-h] {auto_awq,auto_gptq,calibrate,smooth_quant} ...\n",
|
118 |
+
"\n",
|
119 |
+
"Compressing and accelerating LLMs with lmdeploy.lite module\n",
|
120 |
+
"\n",
|
121 |
+
"optional arguments:\n",
|
122 |
+
" -h, --help show this help message and exit\n",
|
123 |
+
"\n",
|
124 |
+
"Commands:\n",
|
125 |
+
" This group has the following commands:\n",
|
126 |
+
"\n",
|
127 |
+
" {auto_awq,auto_gptq,calibrate,smooth_quant}\n",
|
128 |
+
" auto_awq Perform weight quantization using AWQ algorithm.\n",
|
129 |
+
" auto_gptq Perform weight quantization using GPTQ algorithm.\n",
|
130 |
+
" calibrate Perform calibration on a given dataset.\n",
|
131 |
+
" smooth_quant Perform w8a8 quantization using SmoothQuant.\n"
|
132 |
+
]
|
133 |
+
}
|
134 |
+
],
|
135 |
+
"source": [
|
136 |
+
"!lmdeploy lite --help"
|
137 |
+
]
|
138 |
+
},
|
139 |
+
{
|
140 |
+
"cell_type": "code",
|
141 |
+
"execution_count": 8,
|
142 |
+
"metadata": {},
|
143 |
+
"outputs": [
|
144 |
+
{
|
145 |
+
"name": "stdout",
|
146 |
+
"output_type": "stream",
|
147 |
+
"text": [
|
148 |
+
"models--opengvlab--internvl2_5-26B-AWQ\n",
|
149 |
+
"/dscilab_dungvo/workspace/huggingface_cache/hub/models--opengvlab--internvl2_5-26B-AWQ\n"
|
150 |
+
]
|
151 |
+
}
|
152 |
+
],
|
153 |
+
"source": [
|
154 |
+
"model_name = \"OpenGVLab/InternVL2_5-26B-AWQ\"\n",
|
155 |
+
"\n",
|
156 |
+
"def convertname2path(name):\n",
|
157 |
+
" name = \"models/\" + name\n",
|
158 |
+
" name = name.lower()\n",
|
159 |
+
" name = name.replace(\"b-\", \"B-\")\n",
|
160 |
+
" name = name.replace(\"-awq\", \"-AWQ\")\n",
|
161 |
+
" name = name.replace(\"/\", \"--\")\n",
|
162 |
+
" import os\n",
|
163 |
+
" HF_HOME = os.environ.get(\"HF_HOME\")\n",
|
164 |
+
" print(name)\n",
|
165 |
+
" return f\"{HF_HOME}/hub/{name}\"\n",
|
166 |
+
"\n",
|
167 |
+
"model_path = convertname2path(model_name)\n",
|
168 |
+
"print(model_path)"
|
169 |
+
]
|
170 |
+
},
|
171 |
+
{
|
172 |
+
"cell_type": "code",
|
173 |
+
"execution_count": 9,
|
174 |
+
"metadata": {},
|
175 |
+
"outputs": [
|
176 |
+
{
|
177 |
+
"name": "stderr",
|
178 |
+
"output_type": "stream",
|
179 |
+
"text": [
|
180 |
+
"The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.\n"
|
181 |
+
]
|
182 |
+
},
|
183 |
+
{
|
184 |
+
"name": "stdout",
|
185 |
+
"output_type": "stream",
|
186 |
+
"text": [
|
187 |
+
"models--opengvlab--internvl2_5-26B-AWQ\n"
|
188 |
+
]
|
189 |
+
},
|
190 |
+
{
|
191 |
+
"ename": "RuntimeError",
|
192 |
+
"evalue": "Could not find model architecture from config: {'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': None, 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': None, 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': None, 'pad_token_id': None, 'eos_token_id': None, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': '', '_attn_implementation_autoset': False, 'transformers_version': '4.46.3', 'model_type': ''}",
|
193 |
+
"output_type": "error",
|
194 |
+
"traceback": [
|
195 |
+
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
196 |
+
"\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)",
|
197 |
+
"Cell \u001b[0;32mIn[9], line 6\u001b[0m\n\u001b[1;32m 3\u001b[0m model_path \u001b[38;5;241m=\u001b[39m convertname2path(model_name)\n\u001b[1;32m 4\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m pipeline(model_path)\n\u001b[0;32m----> 6\u001b[0m pipe \u001b[38;5;241m=\u001b[39m \u001b[43mget_pipe\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel_name\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 7\u001b[0m \u001b[38;5;66;03m# response = pipe(['Hi, pls intro yourself', 'Shanghai is'])\u001b[39;00m\n\u001b[1;32m 8\u001b[0m \u001b[38;5;66;03m# print(response)\u001b[39;00m\n",
|
198 |
+
"Cell \u001b[0;32mIn[9], line 4\u001b[0m, in \u001b[0;36mget_pipe\u001b[0;34m(model_name)\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mget_pipe\u001b[39m(model_name):\n\u001b[1;32m 3\u001b[0m model_path \u001b[38;5;241m=\u001b[39m convertname2path(model_name)\n\u001b[0;32m----> 4\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mpipeline\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel_path\u001b[49m\u001b[43m)\u001b[49m\n",
|
199 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/api.py:72\u001b[0m, in \u001b[0;36mpipeline\u001b[0;34m(model_path, backend_config, chat_template_config, log_level, max_log_len, **kwargs)\u001b[0m\n\u001b[1;32m 68\u001b[0m revision \u001b[38;5;241m=\u001b[39m backend_config\u001b[38;5;241m.\u001b[39mrevision \\\n\u001b[1;32m 69\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m backend_config \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m 70\u001b[0m model_path \u001b[38;5;241m=\u001b[39m get_model(model_path, download_dir, revision)\n\u001b[0;32m---> 72\u001b[0m task, pipeline_class \u001b[38;5;241m=\u001b[39m \u001b[43mget_task\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel_path\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 73\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m task \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mvlm\u001b[39m\u001b[38;5;124m'\u001b[39m:\n\u001b[1;32m 74\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m backend_config \u001b[38;5;129;01mand\u001b[39;00m backend_config\u001b[38;5;241m.\u001b[39menable_prefix_caching:\n",
|
200 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/archs.py:145\u001b[0m, in \u001b[0;36mget_task\u001b[0;34m(model_path)\u001b[0m\n\u001b[1;32m 142\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m os\u001b[38;5;241m.\u001b[39mpath\u001b[38;5;241m.\u001b[39mexists(os\u001b[38;5;241m.\u001b[39mpath\u001b[38;5;241m.\u001b[39mjoin(model_path, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mtriton_models\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mweights\u001b[39m\u001b[38;5;124m'\u001b[39m)):\n\u001b[1;32m 143\u001b[0m \u001b[38;5;66;03m# workspace model\u001b[39;00m\n\u001b[1;32m 144\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mllm\u001b[39m\u001b[38;5;124m'\u001b[39m, AsyncEngine\n\u001b[0;32m--> 145\u001b[0m _, config \u001b[38;5;241m=\u001b[39m \u001b[43mget_model_arch\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel_path\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 146\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m check_vl_llm(config\u001b[38;5;241m.\u001b[39mto_dict()):\n\u001b[1;32m 147\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlmdeploy\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mserve\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mvl_async_engine\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m VLAsyncEngine\n",
|
201 |
+
"File \u001b[0;32m/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/archs.py:193\u001b[0m, in \u001b[0;36mget_model_arch\u001b[0;34m(model_path)\u001b[0m\n\u001b[1;32m 191\u001b[0m arch \u001b[38;5;241m=\u001b[39m _cfg[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mauto_map\u001b[39m\u001b[38;5;124m'\u001b[39m][\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mAutoModelForCausalLM\u001b[39m\u001b[38;5;124m'\u001b[39m]\u001b[38;5;241m.\u001b[39msplit(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m.\u001b[39m\u001b[38;5;124m'\u001b[39m)[\u001b[38;5;241m-\u001b[39m\u001b[38;5;241m1\u001b[39m]\n\u001b[1;32m 192\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m--> 193\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mRuntimeError\u001b[39;00m(\n\u001b[1;32m 194\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mCould not find model architecture from config: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m_cfg\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m'\u001b[39m)\n\u001b[1;32m 195\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m arch, cfg\n",
|
202 |
+
"\u001b[0;31mRuntimeError\u001b[0m: Could not find model architecture from config: {'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': None, 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': None, 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': None, 'pad_token_id': None, 'eos_token_id': None, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': '', '_attn_implementation_autoset': False, 'transformers_version': '4.46.3', 'model_type': ''}"
|
203 |
+
]
|
204 |
+
}
|
205 |
+
],
|
206 |
+
"source": [
|
207 |
+
"from lmdeploy import pipeline\n",
|
208 |
+
"def get_pipe(model_name):\n",
|
209 |
+
" model_path = convertname2path(model_name)\n",
|
210 |
+
" return pipeline(model_path)\n",
|
211 |
+
"\n",
|
212 |
+
"pipe = get_pipe(model_name)\n",
|
213 |
+
"# response = pipe(['Hi, pls intro yourself', 'Shanghai is'])\n",
|
214 |
+
"# print(response)\n"
|
215 |
+
]
|
216 |
+
},
|
217 |
+
{
|
218 |
+
"cell_type": "code",
|
219 |
+
"execution_count": 2,
|
220 |
+
"metadata": {},
|
221 |
+
"outputs": [
|
222 |
+
{
|
223 |
+
"name": "stdout",
|
224 |
+
"output_type": "stream",
|
225 |
+
"text": [
|
226 |
+
"\u001b[0;31mSignature:\u001b[0m\n",
|
227 |
+
"\u001b[0mpipeline\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\u001b[0m\n",
|
228 |
+
"\u001b[0;34m\u001b[0m \u001b[0mmodel_path\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
229 |
+
"\u001b[0;34m\u001b[0m \u001b[0mbackend_config\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mUnion\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mlmdeploy\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmessages\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mTurbomindEngineConfig\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlmdeploy\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmessages\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mPytorchEngineConfig\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mNoneType\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
230 |
+
"\u001b[0;34m\u001b[0m \u001b[0mchat_template_config\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mUnion\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mlmdeploy\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmodel\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mChatTemplateConfig\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mNoneType\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
231 |
+
"\u001b[0;34m\u001b[0m \u001b[0mlog_level\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'WARNING'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
232 |
+
"\u001b[0;34m\u001b[0m \u001b[0mmax_log_len\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
233 |
+
"\u001b[0;34m\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
234 |
+
"\u001b[0;34m\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
235 |
+
"\u001b[0;31mSource:\u001b[0m \n",
|
236 |
+
"\u001b[0;32mdef\u001b[0m \u001b[0mpipeline\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodel_path\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
237 |
+
"\u001b[0;34m\u001b[0m \u001b[0mbackend_config\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mOptional\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mUnion\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mTurbomindEngineConfig\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
238 |
+
"\u001b[0;34m\u001b[0m \u001b[0mPytorchEngineConfig\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
239 |
+
"\u001b[0;34m\u001b[0m \u001b[0mchat_template_config\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mOptional\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mChatTemplateConfig\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
240 |
+
"\u001b[0;34m\u001b[0m \u001b[0mlog_level\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'WARNING'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
241 |
+
"\u001b[0;34m\u001b[0m \u001b[0mmax_log_len\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
242 |
+
"\u001b[0;34m\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n",
|
243 |
+
"\u001b[0;34m\u001b[0m \u001b[0;34m\"\"\"\u001b[0m\n",
|
244 |
+
"\u001b[0;34m Args:\u001b[0m\n",
|
245 |
+
"\u001b[0;34m model_path (str): the path of a model.\u001b[0m\n",
|
246 |
+
"\u001b[0;34m It could be one of the following options:\u001b[0m\n",
|
247 |
+
"\u001b[0;34m - i) A local directory path of a turbomind model which is\u001b[0m\n",
|
248 |
+
"\u001b[0;34m converted by `lmdeploy convert` command or download from\u001b[0m\n",
|
249 |
+
"\u001b[0;34m ii) and iii).\u001b[0m\n",
|
250 |
+
"\u001b[0;34m - ii) The model_id of a lmdeploy-quantized model hosted\u001b[0m\n",
|
251 |
+
"\u001b[0;34m inside a model repo on huggingface.co, such as\u001b[0m\n",
|
252 |
+
"\u001b[0;34m \"InternLM/internlm-chat-20b-4bit\",\u001b[0m\n",
|
253 |
+
"\u001b[0;34m \"lmdeploy/llama2-chat-70b-4bit\", etc.\u001b[0m\n",
|
254 |
+
"\u001b[0;34m - iii) The model_id of a model hosted inside a model repo\u001b[0m\n",
|
255 |
+
"\u001b[0;34m on huggingface.co, such as \"internlm/internlm-chat-7b\",\u001b[0m\n",
|
256 |
+
"\u001b[0;34m \"Qwen/Qwen-7B-Chat \", \"baichuan-inc/Baichuan2-7B-Chat\"\u001b[0m\n",
|
257 |
+
"\u001b[0;34m and so on.\u001b[0m\n",
|
258 |
+
"\u001b[0;34m backend_config (TurbomindEngineConfig | PytorchEngineConfig): backend\u001b[0m\n",
|
259 |
+
"\u001b[0;34m config instance. Default to None.\u001b[0m\n",
|
260 |
+
"\u001b[0;34m chat_template_config (ChatTemplateConfig): chat template configuration.\u001b[0m\n",
|
261 |
+
"\u001b[0;34m Default to None.\u001b[0m\n",
|
262 |
+
"\u001b[0;34m log_level(str): set log level whose value among [CRITICAL, ERROR,\u001b[0m\n",
|
263 |
+
"\u001b[0;34m WARNING, INFO, DEBUG]\u001b[0m\n",
|
264 |
+
"\u001b[0;34m max_log_len(int): Max number of prompt characters or prompt tokens\u001b[0m\n",
|
265 |
+
"\u001b[0;34m being printed in log\u001b[0m\n",
|
266 |
+
"\u001b[0;34m\u001b[0m\n",
|
267 |
+
"\u001b[0;34m Examples:\u001b[0m\n",
|
268 |
+
"\u001b[0;34m >>> # LLM\u001b[0m\n",
|
269 |
+
"\u001b[0;34m >>> import lmdeploy\u001b[0m\n",
|
270 |
+
"\u001b[0;34m >>> pipe = lmdeploy.pipeline('internlm/internlm-chat-7b')\u001b[0m\n",
|
271 |
+
"\u001b[0;34m >>> response = pipe(['hi','say this is a test'])\u001b[0m\n",
|
272 |
+
"\u001b[0;34m >>> print(response)\u001b[0m\n",
|
273 |
+
"\u001b[0;34m >>>\u001b[0m\n",
|
274 |
+
"\u001b[0;34m >>> # VLM\u001b[0m\n",
|
275 |
+
"\u001b[0;34m >>> from lmdeploy.vl import load_image\u001b[0m\n",
|
276 |
+
"\u001b[0;34m >>> from lmdeploy import pipeline, TurbomindEngineConfig, ChatTemplateConfig\u001b[0m\n",
|
277 |
+
"\u001b[0;34m >>> pipe = pipeline('liuhaotian/llava-v1.5-7b',\u001b[0m\n",
|
278 |
+
"\u001b[0;34m ... backend_config=TurbomindEngineConfig(session_len=8192),\u001b[0m\n",
|
279 |
+
"\u001b[0;34m ... chat_template_config=ChatTemplateConfig(model_name='vicuna'))\u001b[0m\n",
|
280 |
+
"\u001b[0;34m >>> im = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg')\u001b[0m\n",
|
281 |
+
"\u001b[0;34m >>> response = pipe([('describe this image', [im])])\u001b[0m\n",
|
282 |
+
"\u001b[0;34m >>> print(response)\u001b[0m\n",
|
283 |
+
"\u001b[0;34m \"\"\"\u001b[0m \u001b[0;31m# noqa E501\u001b[0m\u001b[0;34m\u001b[0m\n",
|
284 |
+
"\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgetenv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'TM_LOG_LEVEL'\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n",
|
285 |
+
"\u001b[0;34m\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0menviron\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'TM_LOG_LEVEL'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlog_level\u001b[0m\u001b[0;34m\u001b[0m\n",
|
286 |
+
"\u001b[0;34m\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mlmdeploy\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mutils\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mget_logger\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mget_model\u001b[0m\u001b[0;34m\u001b[0m\n",
|
287 |
+
"\u001b[0;34m\u001b[0m \u001b[0mlogger\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_logger\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'lmdeploy'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n",
|
288 |
+
"\u001b[0;34m\u001b[0m \u001b[0mlogger\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msetLevel\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlog_level\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n",
|
289 |
+
"\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n",
|
290 |
+
"\u001b[0;34m\u001b[0m \u001b[0;31m# model_path is not local path.\u001b[0m\u001b[0;34m\u001b[0m\n",
|
291 |
+
"\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodel_path\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n",
|
292 |
+
"\u001b[0;34m\u001b[0m \u001b[0mdownload_dir\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mbackend_config\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdownload_dir\u001b[0m \\\n",
|
293 |
+
" \u001b[0;32mif\u001b[0m \u001b[0mbackend_config\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\n",
|
294 |
+
"\u001b[0;34m\u001b[0m \u001b[0mrevision\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mbackend_config\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrevision\u001b[0m \\\n",
|
295 |
+
" \u001b[0;32mif\u001b[0m \u001b[0mbackend_config\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\n",
|
296 |
+
"\u001b[0;34m\u001b[0m \u001b[0mmodel_path\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_model\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodel_path\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdownload_dir\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrevision\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n",
|
297 |
+
"\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n",
|
298 |
+
"\u001b[0;34m\u001b[0m \u001b[0mtask\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpipeline_class\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_task\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodel_path\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n",
|
299 |
+
"\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mtask\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m'vlm'\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n",
|
300 |
+
"\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mbackend_config\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mbackend_config\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0menable_prefix_caching\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n",
|
301 |
+
"\u001b[0;34m\u001b[0m \u001b[0mbackend_config\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0menable_prefix_caching\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m\u001b[0m\n",
|
302 |
+
"\u001b[0;34m\u001b[0m \u001b[0mlogger\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwarning\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'VLM does not support prefix caching.'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n",
|
303 |
+
"\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n",
|
304 |
+
"\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mbackend_config\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mPytorchEngineConfig\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n",
|
305 |
+
"\u001b[0;34m\u001b[0m \u001b[0;31m# set auto backend mode\u001b[0m\u001b[0;34m\u001b[0m\n",
|
306 |
+
"\u001b[0;34m\u001b[0m \u001b[0mbackend_config\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mautoget_backend_config\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodel_path\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbackend_config\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n",
|
307 |
+
"\u001b[0;34m\u001b[0m \u001b[0mbackend\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'pytorch'\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\u001b[0m\n",
|
308 |
+
"\u001b[0;34m\u001b[0m \u001b[0mbackend_config\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0mPytorchEngineConfig\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0;34m'turbomind'\u001b[0m\u001b[0;34m\u001b[0m\n",
|
309 |
+
"\u001b[0;34m\u001b[0m \u001b[0mlogger\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minfo\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf'Using {backend} engine'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n",
|
310 |
+
"\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n",
|
311 |
+
"\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mpipeline_class\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodel_path\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
312 |
+
"\u001b[0;34m\u001b[0m \u001b[0mbackend\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mbackend\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
313 |
+
"\u001b[0;34m\u001b[0m \u001b[0mbackend_config\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mbackend_config\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
314 |
+
"\u001b[0;34m\u001b[0m \u001b[0mchat_template_config\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mchat_template_config\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
315 |
+
"\u001b[0;34m\u001b[0m \u001b[0mmax_log_len\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmax_log_len\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
316 |
+
"\u001b[0;34m\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
317 |
+
"\u001b[0;31mFile:\u001b[0m /dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/api.py\n",
|
318 |
+
"\u001b[0;31mType:\u001b[0m function"
|
319 |
+
]
|
320 |
+
}
|
321 |
+
],
|
322 |
+
"source": [
|
323 |
+
"pipeline??"
|
324 |
+
]
|
325 |
+
},
|
326 |
+
{
|
327 |
+
"cell_type": "code",
|
328 |
+
"execution_count": null,
|
329 |
+
"metadata": {},
|
330 |
+
"outputs": [],
|
331 |
+
"source": []
|
332 |
+
}
|
333 |
+
],
|
334 |
+
"metadata": {
|
335 |
+
"kernelspec": {
|
336 |
+
"display_name": "lmdeploy",
|
337 |
+
"language": "python",
|
338 |
+
"name": "python3"
|
339 |
+
},
|
340 |
+
"language_info": {
|
341 |
+
"codemirror_mode": {
|
342 |
+
"name": "ipython",
|
343 |
+
"version": 3
|
344 |
+
},
|
345 |
+
"file_extension": ".py",
|
346 |
+
"mimetype": "text/x-python",
|
347 |
+
"name": "python",
|
348 |
+
"nbconvert_exporter": "python",
|
349 |
+
"pygments_lexer": "ipython3",
|
350 |
+
"version": "3.8.19"
|
351 |
+
}
|
352 |
+
},
|
353 |
+
"nbformat": 4,
|
354 |
+
"nbformat_minor": 2
|
355 |
+
}
|
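The `pipeline??` source shown above picks the engine from the type of `backend_config`: a `PytorchEngineConfig` selects the pytorch engine, while any other config (or none) is resolved by `autoget_backend_config` and normally lands on turbomind. The snippet below is a minimal sketch (not part of the notebook) of selecting either engine explicitly; the model name and the `tp`/`session_len` values are illustrative assumptions.

```python
from lmdeploy import pipeline, PytorchEngineConfig, TurbomindEngineConfig

# use_pytorch=True routes to the pytorch engine; otherwise the config is
# resolved by autoget_backend_config and typically lands on turbomind.
use_pytorch = False
backend_config = (PytorchEngineConfig(tp=1) if use_pytorch
                  else TurbomindEngineConfig(session_len=4096))

pipe = pipeline('internlm/internlm-chat-7b',
                backend_config=backend_config,
                log_level='INFO')
print(pipe(['hi', 'say this is a test']))
```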
a_mllm_notebooks/lmdeploy/kv_quant.ipynb
ADDED
@@ -0,0 +1,114 @@
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"id": "7fece453",
|
6 |
+
"metadata": {},
|
7 |
+
"source": [
|
8 |
+
"# INT4/INT8 KV Cache\n",
|
9 |
+
"\n",
|
10 |
+
"Since v0.4.0, LMDeploy has supported **online** key-value (kv) cache quantization with int4 and int8 numerical precision, utilizing an asymmetric quantization method that is applied on a per-head, per-token basis. The original kv offline quantization method has been removed.\n",
|
11 |
+
"\n",
|
12 |
+
"Intuitively, quantization is beneficial for increasing the number of kv block. Compared to fp16, the number of kv block for int4/int8 kv can be increased by 4 times and 2 times respectively. This means that under the same memory conditions, the system can support a significantly increased number of concurrent operations after kv quantization, thereby ultimately enhancing throughput.\n",
|
13 |
+
"\n",
|
14 |
+
"However, quantization typically brings in some loss of model accuracy. We have used OpenCompass to evaluate the accuracy of several models after applying int4/int8 quantization. int8 kv keeps the accuracy while int4 kv has slight loss. The detailed results are presented in the [Evaluation](#evaluation) section. You can refer to the information and choose wisely based on your requirements.\n",
|
15 |
+
"\n",
|
16 |
+
"LMDeploy inference with quantized kv supports the following NVIDIA GPU models:\n",
|
17 |
+
"\n",
|
18 |
+
"- Volta architecture (sm70): V100\n",
|
19 |
+
"- Turing architecture (sm75): 20 series, T4\n",
|
20 |
+
"- Ampere architecture (sm80, sm86): 30 series, A10, A16, A30, A100\n",
|
21 |
+
"- Ada Lovelace architecture (sm89): 40 series\n",
|
22 |
+
"- Hopper architecture (sm90): H100, H200\n",
|
23 |
+
"\n",
|
24 |
+
"In summary, LMDeploy kv quantization has the following advantages:\n",
|
25 |
+
"\n",
|
26 |
+
"1. data-free online quantization\n",
|
27 |
+
"2. Supports all nvidia GPU models with Volta architecture (sm70) and above\n",
|
28 |
+
"3. KV int8 quantization has almost lossless accuracy, and KV int4 quantization accuracy is within an acceptable range\n",
|
29 |
+
"4. Efficient inference, with int8/int4 kv quantization applied to llama2-7b, RPS is improved by round 30% and 40% respectively compared to fp16\n",
|
30 |
+
"\n",
|
31 |
+
"In the next section, we will take `internlm2-chat-7b` model as an example, introducing the usage of kv quantization and inference of lmdeploy. But before that, please ensure that lmdeploy is installed.\n",
|
32 |
+
"\n",
|
33 |
+
"```shell\n",
|
34 |
+
"pip install lmdeploy\n",
|
35 |
+
"```\n",
|
36 |
+
"\n",
|
37 |
+
"## Usage\n",
|
38 |
+
"\n",
|
39 |
+
"Applying kv quantization and inference via LMDeploy is quite straightforward. Simply set the `quant_policy` parameter.\n",
|
40 |
+
"\n",
|
41 |
+
"**LMDeploy specifies that `quant_policy=4` stands for 4-bit kv, whereas `quant_policy=8` indicates 8-bit kv.**\n",
|
42 |
+
"\n",
|
43 |
+
"### Offline inference"
|
44 |
+
]
|
45 |
+
},
|
46 |
+
{
|
47 |
+
"cell_type": "code",
|
48 |
+
"execution_count": null,
|
49 |
+
"id": "fae395aa",
|
50 |
+
"metadata": {},
|
51 |
+
"outputs": [],
|
52 |
+
"source": [
|
53 |
+
"from lmdeploy import pipeline, TurbomindEngineConfig\n",
|
54 |
+
"engine_config = TurbomindEngineConfig(quant_policy=8)\n",
|
55 |
+
"pipe = pipeline(\"internlm/internlm2_5-7b-chat\", backend_config=engine_config)\n",
|
56 |
+
"response = pipe([\"Hi, pls intro yourself\", \"Shanghai is\"])\n",
|
57 |
+
"print(response)"
|
58 |
+
]
|
59 |
+
},
|
60 |
+
{
|
61 |
+
"cell_type": "markdown",
|
62 |
+
"id": "b1e29acd",
|
63 |
+
"metadata": {},
|
64 |
+
"source": [
|
65 |
+
"### Serving\n",
|
66 |
+
"\n",
|
67 |
+
"```shell\n",
|
68 |
+
"lmdeploy serve api_server internlm/internlm2_5-7b-chat --quant-policy 8\n",
|
69 |
+
"```\n",
|
70 |
+
"\n",
|
71 |
+
"## Evaluation\n",
|
72 |
+
"\n",
|
73 |
+
"We apply kv quantization of LMDeploy to several LLM models and utilize OpenCompass to evaluate the inference accuracy. The results are shown in the table below:\n",
|
74 |
+
"\n",
|
75 |
+
"| - | - | - | llama2-7b-chat | - | - | internlm2-chat-7b | - | - | internlm2.5-chat-7b | - | - | qwen1.5-7b-chat | - | - |\n",
|
76 |
+
"| ----------- | ------- | ------------- | -------------- | ------- | ------- | ----------------- | ------- | ------- | ------------------- | ------- | ------- | --------------- | ------- | ------- |\n",
|
77 |
+
"| dataset | version | metric | kv fp16 | kv int8 | kv int4 | kv fp16 | kv int8 | kv int4 | kv fp16 | kv int8 | kv int4 | fp16 | kv int8 | kv int4 |\n",
|
78 |
+
"| ceval | - | naive_average | 28.42 | 27.96 | 27.58 | 60.45 | 60.88 | 60.28 | 78.06 | 77.87 | 77.05 | 70.56 | 70.49 | 68.62 |\n",
|
79 |
+
"| mmlu | - | naive_average | 35.64 | 35.58 | 34.79 | 63.91 | 64 | 62.36 | 72.30 | 72.27 | 71.17 | 61.48 | 61.56 | 60.65 |\n",
|
80 |
+
"| triviaqa | 2121ce | score | 56.09 | 56.13 | 53.71 | 58.73 | 58.7 | 58.18 | 65.09 | 64.87 | 63.28 | 44.62 | 44.77 | 44.04 |\n",
|
81 |
+
"| gsm8k | 1d7fe4 | accuracy | 28.2 | 28.05 | 27.37 | 70.13 | 69.75 | 66.87 | 85.67 | 85.44 | 83.78 | 54.97 | 56.41 | 54.74 |\n",
|
82 |
+
"| race-middle | 9a54b6 | accuracy | 41.57 | 41.78 | 41.23 | 88.93 | 88.93 | 88.93 | 92.76 | 92.83 | 92.55 | 87.33 | 87.26 | 86.28 |\n",
|
83 |
+
"| race-high | 9a54b6 | accuracy | 39.65 | 39.77 | 40.77 | 85.33 | 85.31 | 84.62 | 90.51 | 90.42 | 90.42 | 82.53 | 82.59 | 82.02 |\n",
|
84 |
+
"\n",
|
85 |
+
"For detailed evaluation methods, please refer to [this](../benchmark/evaluate_with_opencompass.md) guide. Remember to pass `quant_policy` to the inference engine in the config file.\n",
|
86 |
+
"\n",
|
87 |
+
"## Performance\n",
|
88 |
+
"\n",
|
89 |
+
"| model | kv type | test settings | RPS | v.s. kv fp16 |\n",
|
90 |
+
"| ----------------- | ------- | ---------------------------------------- | ----- | ------------ |\n",
|
91 |
+
"| llama2-chat-7b | fp16 | tp1 / ratio 0.8 / bs 256 / prompts 10000 | 14.98 | 1.0 |\n",
|
92 |
+
"| - | int8 | tp1 / ratio 0.8 / bs 256 / prompts 10000 | 19.01 | 1.27 |\n",
|
93 |
+
"| - | int4 | tp1 / ratio 0.8 / bs 256 / prompts 10000 | 20.81 | 1.39 |\n",
|
94 |
+
"| llama2-chat-13b | fp16 | tp1 / ratio 0.9 / bs 128 / prompts 10000 | 8.55 | 1.0 |\n",
|
95 |
+
"| - | int8 | tp1 / ratio 0.9 / bs 256 / prompts 10000 | 10.96 | 1.28 |\n",
|
96 |
+
"| - | int4 | tp1 / ratio 0.9 / bs 256 / prompts 10000 | 11.91 | 1.39 |\n",
|
97 |
+
"| internlm2-chat-7b | fp16 | tp1 / ratio 0.8 / bs 256 / prompts 10000 | 24.13 | 1.0 |\n",
|
98 |
+
"| - | int8 | tp1 / ratio 0.8 / bs 256 / prompts 10000 | 25.28 | 1.05 |\n",
|
99 |
+
"| - | int4 | tp1 / ratio 0.8 / bs 256 / prompts 10000 | 25.80 | 1.07 |\n",
|
100 |
+
"\n",
|
101 |
+
"The performance data is obtained by `benchmark/profile_throughput.py`"
|
102 |
+
]
|
103 |
+
}
|
104 |
+
],
|
105 |
+
"metadata": {
|
106 |
+
"jupytext": {
|
107 |
+
"cell_metadata_filter": "-all",
|
108 |
+
"main_language": "python",
|
109 |
+
"notebook_metadata_filter": "-all"
|
110 |
+
}
|
111 |
+
},
|
112 |
+
"nbformat": 4,
|
113 |
+
"nbformat_minor": 5
|
114 |
+
}
|
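Not part of the notebook above: the guide states that `quant_policy=4` enables int4 kv cache but only demonstrates the int8 case, so here is a minimal hedged sketch of the int4 variant combined with a `GenerationConfig`; the sampling values are illustrative assumptions.

```python
from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig

# quant_policy=4 enables int4 kv cache; 8 enables int8 as in the notebook.
engine_config = TurbomindEngineConfig(quant_policy=4)
pipe = pipeline("internlm/internlm2_5-7b-chat", backend_config=engine_config)

# Illustrative sampling settings; adjust to your use case.
gen_config = GenerationConfig(max_new_tokens=256, temperature=0.7, top_p=0.9)
response = pipe(["Hi, pls intro yourself", "Shanghai is"], gen_config=gen_config)
print(response)
```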
a_mllm_notebooks/lmdeploy/kv_quant.md
ADDED
@@ -0,0 +1,82 @@
|
1 |
+
# INT4/INT8 KV Cache
|
2 |
+
|
3 |
+
Since v0.4.0, LMDeploy has supported **online** key-value (kv) cache quantization with int4 and int8 numerical precision, utilizing an asymmetric quantization method that is applied on a per-head, per-token basis. The original kv offline quantization method has been removed.
|
4 |
+
|
5 |
+
Intuitively, quantization is beneficial for increasing the number of kv blocks. Compared to fp16, the number of kv blocks for int4/int8 kv can be increased by 4 times and 2 times respectively. This means that under the same memory conditions, the system can support a significantly increased number of concurrent operations after kv quantization, thereby ultimately enhancing throughput.
|
6 |
+
|
7 |
+
However, quantization typically brings in some loss of model accuracy. We have used OpenCompass to evaluate the accuracy of several models after applying int4/int8 quantization. int8 kv keeps the accuracy while int4 kv has slight loss. The detailed results are presented in the [Evaluation](#evaluation) section. You can refer to the information and choose wisely based on your requirements.
|
8 |
+
|
9 |
+
LMDeploy inference with quantized kv supports the following NVIDIA GPU models:
|
10 |
+
|
11 |
+
- Volta architecture (sm70): V100
|
12 |
+
- Turing architecture (sm75): 20 series, T4
|
13 |
+
- Ampere architecture (sm80, sm86): 30 series, A10, A16, A30, A100
|
14 |
+
- Ada Lovelace architecture (sm89): 40 series
|
15 |
+
- Hopper architecture (sm90): H100, H200
|
16 |
+
|
17 |
+
In summary, LMDeploy kv quantization has the following advantages:
|
18 |
+
|
19 |
+
1. data-free online quantization
|
20 |
+
2. Supports all NVIDIA GPU models with Volta architecture (sm70) and above
|
21 |
+
3. KV int8 quantization has almost lossless accuracy, and KV int4 quantization accuracy is within an acceptable range
|
22 |
+
4. Efficient inference, with int8/int4 kv quantization applied to llama2-7b, RPS is improved by around 30% and 40% respectively compared to fp16
|
23 |
+
|
24 |
+
In the next section, we will take the `internlm2-chat-7b` model as an example to introduce the usage of kv quantization and inference with LMDeploy. But before that, please ensure that lmdeploy is installed.
|
25 |
+
|
26 |
+
```shell
|
27 |
+
pip install lmdeploy
|
28 |
+
```
|
29 |
+
|
30 |
+
## Usage
|
31 |
+
|
32 |
+
Applying kv quantization and inference via LMDeploy is quite straightforward. Simply set the `quant_policy` parameter.
|
33 |
+
|
34 |
+
**LMDeploy specifies that `quant_policy=4` stands for 4-bit kv, whereas `quant_policy=8` indicates 8-bit kv.**
|
35 |
+
|
36 |
+
### Offline inference
|
37 |
+
|
38 |
+
```python
|
39 |
+
from lmdeploy import pipeline, TurbomindEngineConfig
|
40 |
+
engine_config = TurbomindEngineConfig(quant_policy=8)
|
41 |
+
pipe = pipeline("internlm/internlm2_5-7b-chat", backend_config=engine_config)
|
42 |
+
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
|
43 |
+
print(response)
|
44 |
+
```
|
45 |
+
|
46 |
+
### Serving
|
47 |
+
|
48 |
+
```shell
|
49 |
+
lmdeploy serve api_server internlm/internlm2_5-7b-chat --quant-policy 8
|
50 |
+
```
|
51 |
+
|
52 |
+
## Evaluation
|
53 |
+
|
54 |
+
We apply kv quantization of LMDeploy to several LLM models and utilize OpenCompass to evaluate the inference accuracy. The results are shown in the table below:
|
55 |
+
|
56 |
+
| - | - | - | llama2-7b-chat | - | - | internlm2-chat-7b | - | - | internlm2.5-chat-7b | - | - | qwen1.5-7b-chat | - | - |
|
57 |
+
| ----------- | ------- | ------------- | -------------- | ------- | ------- | ----------------- | ------- | ------- | ------------------- | ------- | ------- | --------------- | ------- | ------- |
|
58 |
+
| dataset | version | metric | kv fp16 | kv int8 | kv int4 | kv fp16 | kv int8 | kv int4 | kv fp16 | kv int8 | kv int4 | fp16 | kv int8 | kv int4 |
|
59 |
+
| ceval | - | naive_average | 28.42 | 27.96 | 27.58 | 60.45 | 60.88 | 60.28 | 78.06 | 77.87 | 77.05 | 70.56 | 70.49 | 68.62 |
|
60 |
+
| mmlu | - | naive_average | 35.64 | 35.58 | 34.79 | 63.91 | 64 | 62.36 | 72.30 | 72.27 | 71.17 | 61.48 | 61.56 | 60.65 |
|
61 |
+
| triviaqa | 2121ce | score | 56.09 | 56.13 | 53.71 | 58.73 | 58.7 | 58.18 | 65.09 | 64.87 | 63.28 | 44.62 | 44.77 | 44.04 |
|
62 |
+
| gsm8k | 1d7fe4 | accuracy | 28.2 | 28.05 | 27.37 | 70.13 | 69.75 | 66.87 | 85.67 | 85.44 | 83.78 | 54.97 | 56.41 | 54.74 |
|
63 |
+
| race-middle | 9a54b6 | accuracy | 41.57 | 41.78 | 41.23 | 88.93 | 88.93 | 88.93 | 92.76 | 92.83 | 92.55 | 87.33 | 87.26 | 86.28 |
|
64 |
+
| race-high | 9a54b6 | accuracy | 39.65 | 39.77 | 40.77 | 85.33 | 85.31 | 84.62 | 90.51 | 90.42 | 90.42 | 82.53 | 82.59 | 82.02 |
|
65 |
+
|
66 |
+
For detailed evaluation methods, please refer to [this](../benchmark/evaluate_with_opencompass.md) guide. Remember to pass `quant_policy` to the inference engine in the config file.
|
67 |
+
|
68 |
+
## Performance
|
69 |
+
|
70 |
+
| model | kv type | test settings | RPS | v.s. kv fp16 |
|
71 |
+
| ----------------- | ------- | ---------------------------------------- | ----- | ------------ |
|
72 |
+
| llama2-chat-7b | fp16 | tp1 / ratio 0.8 / bs 256 / prompts 10000 | 14.98 | 1.0 |
|
73 |
+
| - | int8 | tp1 / ratio 0.8 / bs 256 / prompts 10000 | 19.01 | 1.27 |
|
74 |
+
| - | int4 | tp1 / ratio 0.8 / bs 256 / prompts 10000 | 20.81 | 1.39 |
|
75 |
+
| llama2-chat-13b | fp16 | tp1 / ratio 0.9 / bs 128 / prompts 10000 | 8.55 | 1.0 |
|
76 |
+
| - | int8 | tp1 / ratio 0.9 / bs 256 / prompts 10000 | 10.96 | 1.28 |
|
77 |
+
| - | int4 | tp1 / ratio 0.9 / bs 256 / prompts 10000 | 11.91 | 1.39 |
|
78 |
+
| internlm2-chat-7b | fp16 | tp1 / ratio 0.8 / bs 256 / prompts 10000 | 24.13 | 1.0 |
|
79 |
+
| - | int8 | tp1 / ratio 0.8 / bs 256 / prompts 10000 | 25.28 | 1.05 |
|
80 |
+
| - | int4 | tp1 / ratio 0.8 / bs 256 / prompts 10000 | 25.80 | 1.07 |
|
81 |
+
|
82 |
+
The performance data is obtained with `benchmark/profile_throughput.py`.
|
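Not part of the original guide: once the `api_server` started above is running, it exposes an OpenAI-compatible endpoint (by default on port 23333), so the quantized-kv deployment can be sanity-checked with the `openai` client. This is a hedged sketch; adjust the host, port, and sampling settings to your deployment.

```python
from openai import OpenAI

# Assumes the server launched with `lmdeploy serve api_server ... --quant-policy 8`
# is listening on the default port 23333; change base_url for your setup.
client = OpenAI(api_key="none", base_url="http://0.0.0.0:23333/v1")
model_name = client.models.list().data[0].id

resp = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "Hi, pls intro yourself"}],
    temperature=0.8,
)
print(resp.choices[0].message.content)
```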
a_mllm_notebooks/lmdeploy/links.txt
ADDED
@@ -0,0 +1,8 @@
|
1 |
+
'https://github.com/InternLM/lmdeploy/blob/main/docs/en/quantization/kv_quant.md'
|
2 |
+
'https://github.com/InternLM/lmdeploy/blob/main/docs/en/advance/pytorch_new_model.md'
|
3 |
+
'https://github.com/InternLM/lmdeploy/blob/main/docs/en/inference/turbomind.md'
|
4 |
+
'https://github.com/InternLM/lmdeploy/blob/main/docs/en/multi_modal/api_server_vl.md'
|
5 |
+
'https://github.com/InternLM/lmdeploy/blob/main/docs/en/quantization/w4a16.md'
|
6 |
+
'https://github.com/InternLM/lmdeploy/blob/main/docs/en/quantization/w8a8.md'
|
7 |
+
'https://github.com/InternLM/lmdeploy/blob/main/docs/en/llm/proxy_server.md'
|
8 |
+
'https://github.com/InternLM/lmdeploy/blob/main/docs/en/advance/long_context.md'
|
a_mllm_notebooks/lmdeploy/lmdeploy_deepseek_vl.ipynb
ADDED
@@ -0,0 +1,665 @@
|
1 |
+
{
|
2 |
+
"nbformat": 4,
|
3 |
+
"nbformat_minor": 0,
|
4 |
+
"metadata": {
|
5 |
+
"colab": {
|
6 |
+
"provenance": [],
|
7 |
+
"gpuType": "T4"
|
8 |
+
},
|
9 |
+
"kernelspec": {
|
10 |
+
"name": "python3",
|
11 |
+
"display_name": "Python 3"
|
12 |
+
},
|
13 |
+
"language_info": {
|
14 |
+
"name": "python"
|
15 |
+
},
|
16 |
+
"accelerator": "GPU",
|
17 |
+
"widgets": {
|
18 |
+
"application/vnd.jupyter.widget-state+json": {
|
19 |
+
"998fbdaa144d466b8973bda101228f84": {
|
20 |
+
"model_module": "@jupyter-widgets/controls",
|
21 |
+
"model_name": "HBoxModel",
|
22 |
+
"model_module_version": "1.5.0",
|
23 |
+
"state": {
|
24 |
+
"_dom_classes": [],
|
25 |
+
"_model_module": "@jupyter-widgets/controls",
|
26 |
+
"_model_module_version": "1.5.0",
|
27 |
+
"_model_name": "HBoxModel",
|
28 |
+
"_view_count": null,
|
29 |
+
"_view_module": "@jupyter-widgets/controls",
|
30 |
+
"_view_module_version": "1.5.0",
|
31 |
+
"_view_name": "HBoxView",
|
32 |
+
"box_style": "",
|
33 |
+
"children": [
|
34 |
+
"IPY_MODEL_3628d06a3bcb451aa7866b52dd553dc4",
|
35 |
+
"IPY_MODEL_4690b670bfae4dc0b81c08c774bfbd9a",
|
36 |
+
"IPY_MODEL_d2430af0eaa4457491a294e252104c11"
|
37 |
+
],
|
38 |
+
"layout": "IPY_MODEL_88e413f539ac4bfa95d2954178a8df00"
|
39 |
+
}
|
40 |
+
},
|
41 |
+
"3628d06a3bcb451aa7866b52dd553dc4": {
|
42 |
+
"model_module": "@jupyter-widgets/controls",
|
43 |
+
"model_name": "HTMLModel",
|
44 |
+
"model_module_version": "1.5.0",
|
45 |
+
"state": {
|
46 |
+
"_dom_classes": [],
|
47 |
+
"_model_module": "@jupyter-widgets/controls",
|
48 |
+
"_model_module_version": "1.5.0",
|
49 |
+
"_model_name": "HTMLModel",
|
50 |
+
"_view_count": null,
|
51 |
+
"_view_module": "@jupyter-widgets/controls",
|
52 |
+
"_view_module_version": "1.5.0",
|
53 |
+
"_view_name": "HTMLView",
|
54 |
+
"description": "",
|
55 |
+
"description_tooltip": null,
|
56 |
+
"layout": "IPY_MODEL_79d4ad1a55e64291b67f7a2ed2e82bfc",
|
57 |
+
"placeholder": "",
|
58 |
+
"style": "IPY_MODEL_0d32734804454f3fa1511a8be9facd5b",
|
59 |
+
"value": "Fetching 9 files: 100%"
|
60 |
+
}
|
61 |
+
},
|
62 |
+
"4690b670bfae4dc0b81c08c774bfbd9a": {
|
63 |
+
"model_module": "@jupyter-widgets/controls",
|
64 |
+
"model_name": "FloatProgressModel",
|
65 |
+
"model_module_version": "1.5.0",
|
66 |
+
"state": {
|
67 |
+
"_dom_classes": [],
|
68 |
+
"_model_module": "@jupyter-widgets/controls",
|
69 |
+
"_model_module_version": "1.5.0",
|
70 |
+
"_model_name": "FloatProgressModel",
|
71 |
+
"_view_count": null,
|
72 |
+
"_view_module": "@jupyter-widgets/controls",
|
73 |
+
"_view_module_version": "1.5.0",
|
74 |
+
"_view_name": "ProgressView",
|
75 |
+
"bar_style": "success",
|
76 |
+
"description": "",
|
77 |
+
"description_tooltip": null,
|
78 |
+
"layout": "IPY_MODEL_5250b56c84e34862ac892d395730218f",
|
79 |
+
"max": 9,
|
80 |
+
"min": 0,
|
81 |
+
"orientation": "horizontal",
|
82 |
+
"style": "IPY_MODEL_5bf6228dfb8f4cc1a542590545f68338",
|
83 |
+
"value": 9
|
84 |
+
}
|
85 |
+
},
|
86 |
+
"d2430af0eaa4457491a294e252104c11": {
|
87 |
+
"model_module": "@jupyter-widgets/controls",
|
88 |
+
"model_name": "HTMLModel",
|
89 |
+
"model_module_version": "1.5.0",
|
90 |
+
"state": {
|
91 |
+
"_dom_classes": [],
|
92 |
+
"_model_module": "@jupyter-widgets/controls",
|
93 |
+
"_model_module_version": "1.5.0",
|
94 |
+
"_model_name": "HTMLModel",
|
95 |
+
"_view_count": null,
|
96 |
+
"_view_module": "@jupyter-widgets/controls",
|
97 |
+
"_view_module_version": "1.5.0",
|
98 |
+
"_view_name": "HTMLView",
|
99 |
+
"description": "",
|
100 |
+
"description_tooltip": null,
|
101 |
+
"layout": "IPY_MODEL_f05724a0721f4321ac7f41129682e232",
|
102 |
+
"placeholder": "",
|
103 |
+
"style": "IPY_MODEL_20029df435c44a44a6b1c61552cf8a25",
|
104 |
+
"value": " 9/9 [00:00<00:00, 141.93it/s]"
|
105 |
+
}
|
106 |
+
},
|
107 |
+
"88e413f539ac4bfa95d2954178a8df00": {
|
108 |
+
"model_module": "@jupyter-widgets/base",
|
109 |
+
"model_name": "LayoutModel",
|
110 |
+
"model_module_version": "1.2.0",
|
111 |
+
"state": {
|
112 |
+
"_model_module": "@jupyter-widgets/base",
|
113 |
+
"_model_module_version": "1.2.0",
|
114 |
+
"_model_name": "LayoutModel",
|
115 |
+
"_view_count": null,
|
116 |
+
"_view_module": "@jupyter-widgets/base",
|
117 |
+
"_view_module_version": "1.2.0",
|
118 |
+
"_view_name": "LayoutView",
|
119 |
+
"align_content": null,
|
120 |
+
"align_items": null,
|
121 |
+
"align_self": null,
|
122 |
+
"border": null,
|
123 |
+
"bottom": null,
|
124 |
+
"display": null,
|
125 |
+
"flex": null,
|
126 |
+
"flex_flow": null,
|
127 |
+
"grid_area": null,
|
128 |
+
"grid_auto_columns": null,
|
129 |
+
"grid_auto_flow": null,
|
130 |
+
"grid_auto_rows": null,
|
131 |
+
"grid_column": null,
|
132 |
+
"grid_gap": null,
|
133 |
+
"grid_row": null,
|
134 |
+
"grid_template_areas": null,
|
135 |
+
"grid_template_columns": null,
|
136 |
+
"grid_template_rows": null,
|
137 |
+
"height": null,
|
138 |
+
"justify_content": null,
|
139 |
+
"justify_items": null,
|
140 |
+
"left": null,
|
141 |
+
"margin": null,
|
142 |
+
"max_height": null,
|
143 |
+
"max_width": null,
|
144 |
+
"min_height": null,
|
145 |
+
"min_width": null,
|
146 |
+
"object_fit": null,
|
147 |
+
"object_position": null,
|
148 |
+
"order": null,
|
149 |
+
"overflow": null,
|
150 |
+
"overflow_x": null,
|
151 |
+
"overflow_y": null,
|
152 |
+
"padding": null,
|
153 |
+
"right": null,
|
154 |
+
"top": null,
|
155 |
+
"visibility": null,
|
156 |
+
"width": null
|
157 |
+
}
|
158 |
+
},
|
159 |
+
"79d4ad1a55e64291b67f7a2ed2e82bfc": {
|
160 |
+
"model_module": "@jupyter-widgets/base",
|
161 |
+
"model_name": "LayoutModel",
|
162 |
+
"model_module_version": "1.2.0",
|
163 |
+
"state": {
|
164 |
+
"_model_module": "@jupyter-widgets/base",
|
165 |
+
"_model_module_version": "1.2.0",
|
166 |
+
"_model_name": "LayoutModel",
|
167 |
+
"_view_count": null,
|
168 |
+
"_view_module": "@jupyter-widgets/base",
|
169 |
+
"_view_module_version": "1.2.0",
|
170 |
+
"_view_name": "LayoutView",
|
171 |
+
"align_content": null,
|
172 |
+
"align_items": null,
|
173 |
+
"align_self": null,
|
174 |
+
"border": null,
|
175 |
+
"bottom": null,
|
176 |
+
"display": null,
|
177 |
+
"flex": null,
|
178 |
+
"flex_flow": null,
|
179 |
+
"grid_area": null,
|
180 |
+
"grid_auto_columns": null,
|
181 |
+
"grid_auto_flow": null,
|
182 |
+
"grid_auto_rows": null,
|
183 |
+
"grid_column": null,
|
184 |
+
"grid_gap": null,
|
185 |
+
"grid_row": null,
|
186 |
+
"grid_template_areas": null,
|
187 |
+
"grid_template_columns": null,
|
188 |
+
"grid_template_rows": null,
|
189 |
+
"height": null,
|
190 |
+
"justify_content": null,
|
191 |
+
"justify_items": null,
|
192 |
+
"left": null,
|
193 |
+
"margin": null,
|
194 |
+
"max_height": null,
|
195 |
+
"max_width": null,
|
196 |
+
"min_height": null,
|
197 |
+
"min_width": null,
|
198 |
+
"object_fit": null,
|
199 |
+
"object_position": null,
|
200 |
+
"order": null,
|
201 |
+
"overflow": null,
|
202 |
+
"overflow_x": null,
|
203 |
+
"overflow_y": null,
|
204 |
+
"padding": null,
|
205 |
+
"right": null,
|
206 |
+
"top": null,
|
207 |
+
"visibility": null,
|
208 |
+
"width": null
|
209 |
+
}
|
210 |
+
},
|
211 |
+
"0d32734804454f3fa1511a8be9facd5b": {
|
212 |
+
"model_module": "@jupyter-widgets/controls",
|
213 |
+
"model_name": "DescriptionStyleModel",
|
214 |
+
"model_module_version": "1.5.0",
|
215 |
+
"state": {
|
216 |
+
"_model_module": "@jupyter-widgets/controls",
|
217 |
+
"_model_module_version": "1.5.0",
|
218 |
+
"_model_name": "DescriptionStyleModel",
|
219 |
+
"_view_count": null,
|
220 |
+
"_view_module": "@jupyter-widgets/base",
|
221 |
+
"_view_module_version": "1.2.0",
|
222 |
+
"_view_name": "StyleView",
|
223 |
+
"description_width": ""
|
224 |
+
}
|
225 |
+
},
|
226 |
+
"5250b56c84e34862ac892d395730218f": {
|
227 |
+
"model_module": "@jupyter-widgets/base",
|
228 |
+
"model_name": "LayoutModel",
|
229 |
+
"model_module_version": "1.2.0",
|
230 |
+
"state": {
|
231 |
+
"_model_module": "@jupyter-widgets/base",
|
232 |
+
"_model_module_version": "1.2.0",
|
233 |
+
"_model_name": "LayoutModel",
|
234 |
+
"_view_count": null,
|
235 |
+
"_view_module": "@jupyter-widgets/base",
|
236 |
+
"_view_module_version": "1.2.0",
|
237 |
+
"_view_name": "LayoutView",
|
238 |
+
"align_content": null,
|
239 |
+
"align_items": null,
|
240 |
+
"align_self": null,
|
241 |
+
"border": null,
|
242 |
+
"bottom": null,
|
243 |
+
"display": null,
|
244 |
+
"flex": null,
|
245 |
+
"flex_flow": null,
|
246 |
+
"grid_area": null,
|
247 |
+
"grid_auto_columns": null,
|
248 |
+
"grid_auto_flow": null,
|
249 |
+
"grid_auto_rows": null,
|
250 |
+
"grid_column": null,
|
251 |
+
"grid_gap": null,
|
252 |
+
"grid_row": null,
|
253 |
+
"grid_template_areas": null,
|
254 |
+
"grid_template_columns": null,
|
255 |
+
"grid_template_rows": null,
|
256 |
+
"height": null,
|
257 |
+
"justify_content": null,
|
258 |
+
"justify_items": null,
|
259 |
+
"left": null,
|
260 |
+
"margin": null,
|
261 |
+
"max_height": null,
|
262 |
+
"max_width": null,
|
263 |
+
"min_height": null,
|
264 |
+
"min_width": null,
|
265 |
+
"object_fit": null,
|
266 |
+
"object_position": null,
|
267 |
+
"order": null,
|
268 |
+
"overflow": null,
|
269 |
+
"overflow_x": null,
|
270 |
+
"overflow_y": null,
|
271 |
+
"padding": null,
|
272 |
+
"right": null,
|
273 |
+
"top": null,
|
274 |
+
"visibility": null,
|
275 |
+
"width": null
|
276 |
+
}
|
277 |
+
},
|
278 |
+
"5bf6228dfb8f4cc1a542590545f68338": {
|
279 |
+
"model_module": "@jupyter-widgets/controls",
|
280 |
+
"model_name": "ProgressStyleModel",
|
281 |
+
"model_module_version": "1.5.0",
|
282 |
+
"state": {
|
283 |
+
"_model_module": "@jupyter-widgets/controls",
|
284 |
+
"_model_module_version": "1.5.0",
|
285 |
+
"_model_name": "ProgressStyleModel",
|
286 |
+
"_view_count": null,
|
287 |
+
"_view_module": "@jupyter-widgets/base",
|
288 |
+
"_view_module_version": "1.2.0",
|
289 |
+
"_view_name": "StyleView",
|
290 |
+
"bar_color": null,
|
291 |
+
"description_width": ""
|
292 |
+
}
|
293 |
+
},
|
294 |
+
"f05724a0721f4321ac7f41129682e232": {
|
295 |
+
"model_module": "@jupyter-widgets/base",
|
296 |
+
"model_name": "LayoutModel",
|
297 |
+
"model_module_version": "1.2.0",
|
298 |
+
"state": {
|
299 |
+
"_model_module": "@jupyter-widgets/base",
|
300 |
+
"_model_module_version": "1.2.0",
|
301 |
+
"_model_name": "LayoutModel",
|
302 |
+
"_view_count": null,
|
303 |
+
"_view_module": "@jupyter-widgets/base",
|
304 |
+
"_view_module_version": "1.2.0",
|
305 |
+
"_view_name": "LayoutView",
|
306 |
+
"align_content": null,
|
307 |
+
"align_items": null,
|
308 |
+
"align_self": null,
|
309 |
+
"border": null,
|
310 |
+
"bottom": null,
|
311 |
+
"display": null,
|
312 |
+
"flex": null,
|
313 |
+
"flex_flow": null,
|
314 |
+
"grid_area": null,
|
315 |
+
"grid_auto_columns": null,
|
316 |
+
"grid_auto_flow": null,
|
317 |
+
"grid_auto_rows": null,
|
318 |
+
"grid_column": null,
|
319 |
+
"grid_gap": null,
|
320 |
+
"grid_row": null,
|
321 |
+
"grid_template_areas": null,
|
322 |
+
"grid_template_columns": null,
|
323 |
+
"grid_template_rows": null,
|
324 |
+
"height": null,
|
325 |
+
"justify_content": null,
|
326 |
+
"justify_items": null,
|
327 |
+
"left": null,
|
328 |
+
"margin": null,
|
329 |
+
"max_height": null,
|
330 |
+
"max_width": null,
|
331 |
+
"min_height": null,
|
332 |
+
"min_width": null,
|
333 |
+
"object_fit": null,
|
334 |
+
"object_position": null,
|
335 |
+
"order": null,
|
336 |
+
"overflow": null,
|
337 |
+
"overflow_x": null,
|
338 |
+
"overflow_y": null,
|
339 |
+
"padding": null,
|
340 |
+
"right": null,
|
341 |
+
"top": null,
|
342 |
+
"visibility": null,
|
343 |
+
"width": null
|
344 |
+
}
|
345 |
+
},
|
346 |
+
"20029df435c44a44a6b1c61552cf8a25": {
|
347 |
+
"model_module": "@jupyter-widgets/controls",
|
348 |
+
"model_name": "DescriptionStyleModel",
|
349 |
+
"model_module_version": "1.5.0",
|
350 |
+
"state": {
|
351 |
+
"_model_module": "@jupyter-widgets/controls",
|
352 |
+
"_model_module_version": "1.5.0",
|
353 |
+
"_model_name": "DescriptionStyleModel",
|
354 |
+
"_view_count": null,
|
355 |
+
"_view_module": "@jupyter-widgets/base",
|
356 |
+
"_view_module_version": "1.2.0",
|
357 |
+
"_view_name": "StyleView",
|
358 |
+
"description_width": ""
|
359 |
+
}
|
360 |
+
}
|
361 |
+
}
|
362 |
+
}
|
363 |
+
},
|
364 |
+
"cells": [
|
365 |
+
{
|
366 |
+
"cell_type": "markdown",
|
367 |
+
"source": [
|
368 |
+
"# Install lmdeploy\n",
|
369 |
+
"Below, we will introduce how to use LMDeploy to run the inference of deepseek-ai/deepseek-vl-1.3b-chat model on a T4 GPU."
|
370 |
+
],
|
371 |
+
"metadata": {
|
372 |
+
"id": "LvQjS_1PHeSh"
|
373 |
+
}
|
374 |
+
},
|
375 |
+
{
|
376 |
+
"cell_type": "code",
|
377 |
+
"execution_count": null,
|
378 |
+
"metadata": {
|
379 |
+
"colab": {
|
380 |
+
"base_uri": "https://localhost:8080/"
|
381 |
+
},
|
382 |
+
"id": "myQIIbTXkXxm",
|
383 |
+
"outputId": "4c6ff0ff-3572-4757-ecdd-a742fda07ff2"
|
384 |
+
},
|
385 |
+
"outputs": [
|
386 |
+
{
|
387 |
+
"output_type": "stream",
|
388 |
+
"name": "stdout",
|
389 |
+
"text": [
|
390 |
+
"Requirement already satisfied: lmdeploy in /usr/local/lib/python3.10/dist-packages (0.4.0)\n",
|
391 |
+
"Requirement already satisfied: einops in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (0.7.0)\n",
|
392 |
+
"Requirement already satisfied: fastapi in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (0.110.2)\n",
|
393 |
+
"Requirement already satisfied: fire in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (0.6.0)\n",
|
394 |
+
"Requirement already satisfied: mmengine-lite in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (0.10.4)\n",
|
395 |
+
"Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (1.25.2)\n",
|
396 |
+
"Requirement already satisfied: peft<=0.9.0 in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (0.9.0)\n",
|
397 |
+
"Requirement already satisfied: pillow in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (9.4.0)\n",
|
398 |
+
"Requirement already satisfied: protobuf in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (3.20.3)\n",
|
399 |
+
"Requirement already satisfied: pydantic>2.0.0 in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (2.7.0)\n",
|
400 |
+
"Requirement already satisfied: pynvml in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (11.5.0)\n",
|
401 |
+
"Requirement already satisfied: safetensors in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (0.4.3)\n",
|
402 |
+
"Requirement already satisfied: sentencepiece in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (0.1.99)\n",
|
403 |
+
"Requirement already satisfied: shortuuid in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (1.0.13)\n",
|
404 |
+
"Requirement already satisfied: tiktoken in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (0.6.0)\n",
|
405 |
+
"Requirement already satisfied: torch<=2.2.2,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (2.2.1+cu121)\n",
|
406 |
+
"Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (4.40.0)\n",
|
407 |
+
"Requirement already satisfied: triton<=2.2.0,>=2.1.0 in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (2.2.0)\n",
|
408 |
+
"Requirement already satisfied: uvicorn in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (0.29.0)\n",
|
409 |
+
"Requirement already satisfied: nvidia-nccl-cu12 in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (2.19.3)\n",
|
410 |
+
"Requirement already satisfied: nvidia-cuda-runtime-cu12 in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (12.1.105)\n",
|
411 |
+
"Requirement already satisfied: nvidia-cublas-cu12 in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (12.1.3.1)\n",
|
412 |
+
"Requirement already satisfied: nvidia-curand-cu12 in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (10.3.2.106)\n",
|
413 |
+
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from peft<=0.9.0->lmdeploy) (24.0)\n",
|
414 |
+
"Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from peft<=0.9.0->lmdeploy) (5.9.5)\n",
|
415 |
+
"Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from peft<=0.9.0->lmdeploy) (6.0.1)\n",
|
416 |
+
"Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from peft<=0.9.0->lmdeploy) (4.66.2)\n",
|
417 |
+
"Requirement already satisfied: accelerate>=0.21.0 in /usr/local/lib/python3.10/dist-packages (from peft<=0.9.0->lmdeploy) (0.29.3)\n",
|
418 |
+
"Requirement already satisfied: huggingface-hub>=0.17.0 in /usr/local/lib/python3.10/dist-packages (from peft<=0.9.0->lmdeploy) (0.20.3)\n",
|
419 |
+
"Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic>2.0.0->lmdeploy) (0.6.0)\n",
|
420 |
+
"Requirement already satisfied: pydantic-core==2.18.1 in /usr/local/lib/python3.10/dist-packages (from pydantic>2.0.0->lmdeploy) (2.18.1)\n",
|
421 |
+
"Requirement already satisfied: typing-extensions>=4.6.1 in /usr/local/lib/python3.10/dist-packages (from pydantic>2.0.0->lmdeploy) (4.11.0)\n",
|
422 |
+
"Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (3.13.4)\n",
|
423 |
+
"Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (1.12)\n",
|
424 |
+
"Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (3.3)\n",
|
425 |
+
"Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (3.1.3)\n",
|
426 |
+
"Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (2023.6.0)\n",
|
427 |
+
"Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (12.1.105)\n",
|
428 |
+
"Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (12.1.105)\n",
|
429 |
+
"Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (8.9.2.26)\n",
|
430 |
+
"Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (11.0.2.54)\n",
|
431 |
+
"Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (11.4.5.107)\n",
|
432 |
+
"Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (12.1.0.106)\n",
|
433 |
+
"Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (12.1.105)\n",
|
434 |
+
"Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.10/dist-packages (from nvidia-cusolver-cu12==11.4.5.107->torch<=2.2.2,>=2.0.0->lmdeploy) (12.4.127)\n",
|
435 |
+
"Requirement already satisfied: starlette<0.38.0,>=0.37.2 in /usr/local/lib/python3.10/dist-packages (from fastapi->lmdeploy) (0.37.2)\n",
|
436 |
+
"Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from fire->lmdeploy) (1.16.0)\n",
|
437 |
+
"Requirement already satisfied: termcolor in /usr/local/lib/python3.10/dist-packages (from fire->lmdeploy) (2.4.0)\n",
|
438 |
+
"Requirement already satisfied: addict in /usr/local/lib/python3.10/dist-packages (from mmengine-lite->lmdeploy) (2.4.0)\n",
|
439 |
+
"Requirement already satisfied: rich in /usr/local/lib/python3.10/dist-packages (from mmengine-lite->lmdeploy) (13.7.1)\n",
|
440 |
+
"Requirement already satisfied: yapf in /usr/local/lib/python3.10/dist-packages (from mmengine-lite->lmdeploy) (0.40.2)\n",
|
441 |
+
"Requirement already satisfied: regex>=2022.1.18 in /usr/local/lib/python3.10/dist-packages (from tiktoken->lmdeploy) (2023.12.25)\n",
|
442 |
+
"Requirement already satisfied: requests>=2.26.0 in /usr/local/lib/python3.10/dist-packages (from tiktoken->lmdeploy) (2.31.0)\n",
|
443 |
+
"Requirement already satisfied: tokenizers<0.20,>=0.19 in /usr/local/lib/python3.10/dist-packages (from transformers->lmdeploy) (0.19.1)\n",
|
444 |
+
"Requirement already satisfied: click>=7.0 in /usr/local/lib/python3.10/dist-packages (from uvicorn->lmdeploy) (8.1.7)\n",
|
445 |
+
"Requirement already satisfied: h11>=0.8 in /usr/local/lib/python3.10/dist-packages (from uvicorn->lmdeploy) (0.14.0)\n",
|
446 |
+
"Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.26.0->tiktoken->lmdeploy) (3.3.2)\n",
|
447 |
+
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.26.0->tiktoken->lmdeploy) (3.7)\n",
|
448 |
+
"Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.26.0->tiktoken->lmdeploy) (2.0.7)\n",
|
449 |
+
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.26.0->tiktoken->lmdeploy) (2024.2.2)\n",
|
450 |
+
"Requirement already satisfied: anyio<5,>=3.4.0 in /usr/local/lib/python3.10/dist-packages (from starlette<0.38.0,>=0.37.2->fastapi->lmdeploy) (3.7.1)\n",
|
451 |
+
"Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch<=2.2.2,>=2.0.0->lmdeploy) (2.1.5)\n",
|
452 |
+
"Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.10/dist-packages (from rich->mmengine-lite->lmdeploy) (3.0.0)\n",
|
453 |
+
"Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from rich->mmengine-lite->lmdeploy) (2.16.1)\n",
|
454 |
+
"Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch<=2.2.2,>=2.0.0->lmdeploy) (1.3.0)\n",
|
455 |
+
"Requirement already satisfied: importlib-metadata>=6.6.0 in /usr/local/lib/python3.10/dist-packages (from yapf->mmengine-lite->lmdeploy) (7.1.0)\n",
|
456 |
+
"Requirement already satisfied: platformdirs>=3.5.1 in /usr/local/lib/python3.10/dist-packages (from yapf->mmengine-lite->lmdeploy) (4.2.0)\n",
|
457 |
+
"Requirement already satisfied: tomli>=2.0.1 in /usr/local/lib/python3.10/dist-packages (from yapf->mmengine-lite->lmdeploy) (2.0.1)\n",
|
458 |
+
"Requirement already satisfied: sniffio>=1.1 in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.4.0->starlette<0.38.0,>=0.37.2->fastapi->lmdeploy) (1.3.1)\n",
|
459 |
+
"Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.4.0->starlette<0.38.0,>=0.37.2->fastapi->lmdeploy) (1.2.1)\n",
|
460 |
+
"Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.10/dist-packages (from importlib-metadata>=6.6.0->yapf->mmengine-lite->lmdeploy) (3.18.1)\n",
|
461 |
+
"Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.10/dist-packages (from markdown-it-py>=2.2.0->rich->mmengine-lite->lmdeploy) (0.1.2)\n"
|
462 |
+
]
|
463 |
+
}
|
464 |
+
],
|
465 |
+
"source": [
|
466 |
+
"!pip install lmdeploy"
|
467 |
+
]
|
468 |
+
},
|
469 |
+
{
|
470 |
+
"cell_type": "markdown",
|
471 |
+
"source": [
|
472 |
+
"# Install vl package"
|
473 |
+
],
|
474 |
+
"metadata": {
|
475 |
+
"id": "YZmonXZI_L3d"
|
476 |
+
}
|
477 |
+
},
|
478 |
+
{
|
479 |
+
"cell_type": "code",
|
480 |
+
"source": [
|
481 |
+
"!pip install git+https://github.com/deepseek-ai/DeepSeek-VL.git"
|
482 |
+
],
|
483 |
+
"metadata": {
|
484 |
+
"colab": {
|
485 |
+
"base_uri": "https://localhost:8080/"
|
486 |
+
},
|
487 |
+
"id": "N_usa-tMlg44",
|
488 |
+
"outputId": "9c4165f0-6b68-4c33-9d80-ccd417380bee"
|
489 |
+
},
|
490 |
+
"execution_count": null,
|
491 |
+
"outputs": [
|
492 |
+
{
|
493 |
+
"output_type": "stream",
|
494 |
+
"name": "stdout",
|
495 |
+
"text": [
|
496 |
+
"Collecting git+https://github.com/deepseek-ai/DeepSeek-VL.git\n",
|
497 |
+
" Cloning https://github.com/deepseek-ai/DeepSeek-VL.git to /tmp/pip-req-build-_b9wee4w\n",
|
498 |
+
" Running command git clone --filter=blob:none --quiet https://github.com/deepseek-ai/DeepSeek-VL.git /tmp/pip-req-build-_b9wee4w\n",
|
499 |
+
" Resolved https://github.com/deepseek-ai/DeepSeek-VL.git to commit 37fcec4806394573f3268d9cf0c2f9669aa7993a\n",
|
500 |
+
" Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
|
501 |
+
" Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
|
502 |
+
" Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
|
503 |
+
"Requirement already satisfied: torch>=2.0.1 in /usr/local/lib/python3.10/dist-packages (from deepseek_vl==1.0.0) (2.2.1+cu121)\n",
|
504 |
+
"Requirement already satisfied: transformers>=4.38.2 in /usr/local/lib/python3.10/dist-packages (from deepseek_vl==1.0.0) (4.40.0)\n",
|
505 |
+
"Requirement already satisfied: timm>=0.9.16 in /usr/local/lib/python3.10/dist-packages (from deepseek_vl==1.0.0) (0.9.16)\n",
|
506 |
+
"Requirement already satisfied: accelerate in /usr/local/lib/python3.10/dist-packages (from deepseek_vl==1.0.0) (0.29.3)\n",
|
507 |
+
"Requirement already satisfied: sentencepiece in /usr/local/lib/python3.10/dist-packages (from deepseek_vl==1.0.0) (0.1.99)\n",
|
508 |
+
"Requirement already satisfied: attrdict in /usr/local/lib/python3.10/dist-packages (from deepseek_vl==1.0.0) (2.0.1)\n",
|
509 |
+
"Requirement already satisfied: einops in /usr/local/lib/python3.10/dist-packages (from deepseek_vl==1.0.0) (0.7.0)\n",
|
510 |
+
"Requirement already satisfied: torchvision in /usr/local/lib/python3.10/dist-packages (from timm>=0.9.16->deepseek_vl==1.0.0) (0.17.1+cu121)\n",
|
511 |
+
"Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from timm>=0.9.16->deepseek_vl==1.0.0) (6.0.1)\n",
|
512 |
+
"Requirement already satisfied: huggingface_hub in /usr/local/lib/python3.10/dist-packages (from timm>=0.9.16->deepseek_vl==1.0.0) (0.20.3)\n",
|
513 |
+
"Requirement already satisfied: safetensors in /usr/local/lib/python3.10/dist-packages (from timm>=0.9.16->deepseek_vl==1.0.0) (0.4.3)\n",
|
514 |
+
"Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (3.13.4)\n",
|
515 |
+
"Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (4.11.0)\n",
|
516 |
+
"Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (1.12)\n",
|
517 |
+
"Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (3.3)\n",
|
518 |
+
"Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (3.1.3)\n",
|
519 |
+
"Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (2023.6.0)\n",
|
520 |
+
"Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (12.1.105)\n",
|
521 |
+
"Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (12.1.105)\n",
|
522 |
+
"Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (12.1.105)\n",
|
523 |
+
"Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (8.9.2.26)\n",
|
524 |
+
"Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (12.1.3.1)\n",
|
525 |
+
"Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (11.0.2.54)\n",
|
526 |
+
"Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (10.3.2.106)\n",
|
527 |
+
"Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (11.4.5.107)\n",
|
528 |
+
"Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (12.1.0.106)\n",
|
529 |
+
"Requirement already satisfied: nvidia-nccl-cu12==2.19.3 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (2.19.3)\n",
|
530 |
+
"Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (12.1.105)\n",
|
531 |
+
"Requirement already satisfied: triton==2.2.0 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.1->deepseek_vl==1.0.0) (2.2.0)\n",
|
532 |
+
"Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.10/dist-packages (from nvidia-cusolver-cu12==11.4.5.107->torch>=2.0.1->deepseek_vl==1.0.0) (12.4.127)\n",
|
533 |
+
"Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers>=4.38.2->deepseek_vl==1.0.0) (1.25.2)\n",
|
534 |
+
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers>=4.38.2->deepseek_vl==1.0.0) (24.0)\n",
|
535 |
+
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers>=4.38.2->deepseek_vl==1.0.0) (2023.12.25)\n",
|
536 |
+
"Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers>=4.38.2->deepseek_vl==1.0.0) (2.31.0)\n",
|
537 |
+
"Requirement already satisfied: tokenizers<0.20,>=0.19 in /usr/local/lib/python3.10/dist-packages (from transformers>=4.38.2->deepseek_vl==1.0.0) (0.19.1)\n",
|
538 |
+
"Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers>=4.38.2->deepseek_vl==1.0.0) (4.66.2)\n",
|
539 |
+
"Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate->deepseek_vl==1.0.0) (5.9.5)\n",
|
540 |
+
"Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from attrdict->deepseek_vl==1.0.0) (1.16.0)\n",
|
541 |
+
"Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=2.0.1->deepseek_vl==1.0.0) (2.1.5)\n",
|
542 |
+
"Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.38.2->deepseek_vl==1.0.0) (3.3.2)\n",
|
543 |
+
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.38.2->deepseek_vl==1.0.0) (3.7)\n",
|
544 |
+
"Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.38.2->deepseek_vl==1.0.0) (2.0.7)\n",
|
545 |
+
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.38.2->deepseek_vl==1.0.0) (2024.2.2)\n",
|
546 |
+
"Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=2.0.1->deepseek_vl==1.0.0) (1.3.0)\n",
|
547 |
+
"Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /usr/local/lib/python3.10/dist-packages (from torchvision->timm>=0.9.16->deepseek_vl==1.0.0) (9.4.0)\n"
|
548 |
+
]
|
549 |
+
}
|
550 |
+
]
|
551 |
+
},
|
552 |
+
{
|
553 |
+
"cell_type": "code",
|
554 |
+
"source": [
|
555 |
+
"!pip install nest_asyncio\n",
|
556 |
+
"import nest_asyncio\n",
|
557 |
+
"nest_asyncio.apply()"
|
558 |
+
],
|
559 |
+
"metadata": {
|
560 |
+
"colab": {
|
561 |
+
"base_uri": "https://localhost:8080/"
|
562 |
+
},
|
563 |
+
"id": "FNWeAUaZn3JB",
|
564 |
+
"outputId": "ec9edb83-981d-47cf-c803-5bfa9b265862"
|
565 |
+
},
|
566 |
+
"execution_count": null,
|
567 |
+
"outputs": [
|
568 |
+
{
|
569 |
+
"output_type": "stream",
|
570 |
+
"name": "stdout",
|
571 |
+
"text": [
|
572 |
+
"Requirement already satisfied: nest_asyncio in /usr/local/lib/python3.10/dist-packages (1.6.0)\n"
|
573 |
+
]
|
574 |
+
}
|
575 |
+
]
|
576 |
+
},
|
577 |
+
{
|
578 |
+
"cell_type": "code",
|
579 |
+
"source": [
|
580 |
+
"from lmdeploy import pipeline, TurbomindEngineConfig\n",
|
581 |
+
"from lmdeploy.vl import load_image\n",
|
582 |
+
"\n",
|
583 |
+
"engine_config = TurbomindEngineConfig(cache_max_entry_count=0.3)\n",
|
584 |
+
"pipe = pipeline('deepseek-ai/deepseek-vl-1.3b-chat', backend_config=engine_config)\n",
|
585 |
+
"\n",
|
586 |
+
"image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')\n",
|
587 |
+
"response = pipe(('describe this image', image))\n",
|
588 |
+
"print(response)"
|
589 |
+
],
|
590 |
+
"metadata": {
|
591 |
+
"colab": {
|
592 |
+
"base_uri": "https://localhost:8080/",
|
593 |
+
"height": 260,
|
594 |
+
"referenced_widgets": [
|
595 |
+
"998fbdaa144d466b8973bda101228f84",
|
596 |
+
"3628d06a3bcb451aa7866b52dd553dc4",
|
597 |
+
"4690b670bfae4dc0b81c08c774bfbd9a",
|
598 |
+
"d2430af0eaa4457491a294e252104c11",
|
599 |
+
"88e413f539ac4bfa95d2954178a8df00",
|
600 |
+
"79d4ad1a55e64291b67f7a2ed2e82bfc",
|
601 |
+
"0d32734804454f3fa1511a8be9facd5b",
|
602 |
+
"5250b56c84e34862ac892d395730218f",
|
603 |
+
"5bf6228dfb8f4cc1a542590545f68338",
|
604 |
+
"f05724a0721f4321ac7f41129682e232",
|
605 |
+
"20029df435c44a44a6b1c61552cf8a25"
|
606 |
+
]
|
607 |
+
},
|
608 |
+
"id": "3nGUWZi-lqb-",
|
609 |
+
"outputId": "2054d9dc-53c1-4f1e-b858-6810c0c61cbb"
|
610 |
+
},
|
611 |
+
"execution_count": null,
|
612 |
+
"outputs": [
|
613 |
+
{
|
614 |
+
"output_type": "stream",
|
615 |
+
"name": "stderr",
|
616 |
+
"text": [
|
617 |
+
"/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning: \n",
|
618 |
+
"The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
|
619 |
+
"To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
|
620 |
+
"You will be able to reuse this secret in all of your notebooks.\n",
|
621 |
+
"Please note that authentication is recommended but still optional to access public models or datasets.\n",
|
622 |
+
" warnings.warn(\n"
|
623 |
+
]
|
624 |
+
},
|
625 |
+
{
|
626 |
+
"output_type": "display_data",
|
627 |
+
"data": {
|
628 |
+
"text/plain": [
|
629 |
+
"Fetching 9 files: 0%| | 0/9 [00:00<?, ?it/s]"
|
630 |
+
],
|
631 |
+
"application/vnd.jupyter.widget-view+json": {
|
632 |
+
"version_major": 2,
|
633 |
+
"version_minor": 0,
|
634 |
+
"model_id": "998fbdaa144d466b8973bda101228f84"
|
635 |
+
}
|
636 |
+
},
|
637 |
+
"metadata": {}
|
638 |
+
},
|
639 |
+
{
|
640 |
+
"output_type": "stream",
|
641 |
+
"name": "stdout",
|
642 |
+
"text": [
|
643 |
+
"Python version is above 3.10, patching the collections module.\n"
|
644 |
+
]
|
645 |
+
},
|
646 |
+
{
|
647 |
+
"output_type": "stream",
|
648 |
+
"name": "stderr",
|
649 |
+
"text": [
|
650 |
+
"Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
|
651 |
+
"Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
|
652 |
+
"Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
|
653 |
+
]
|
654 |
+
},
|
655 |
+
{
|
656 |
+
"output_type": "stream",
|
657 |
+
"name": "stdout",
|
658 |
+
"text": [
|
659 |
+
"Response(text=\"\\n\\nThis is a vivid, color photograph of a tiger resting in a grassy field. The tiger, with its distinctive orange and black stripes, is lying on its side, its body facing the camera. Its eyes are wide open, and it appears to be gazing directly at the camera, creating a sense of connection between the viewer and the subject. The tiger's tail is curled around its body, adding to the relaxed posture. The background is a lush green field, suggesting a natural, outdoor setting. There are no other animals visible in the image. The tiger's position and the open field provide a sense of tranquility and freedom.\", generate_token_len=130, input_token_len=625, session_id=0, finish_reason='stop', token_ids=[185, 185, 1567, 317, 245, 26206, 11, 3042, 14537, 280, 245, 42901, 28459, 279, 245, 69139, 2021, 13, 429, 42901, 11, 366, 895, 30372, 16639, 285, 3438, 45138, 11, 317, 13595, 331, 895, 2387, 11, 895, 3123, 14087, 254, 8603, 13, 9904, 3545, 418, 5505, 1721, 11, 285, 359, 6266, 276, 330, 36545, 4723, 430, 254, 8603, 11, 6817, 245, 3078, 280, 4714, 1439, 254, 32975, 285, 254, 3605, 13, 429, 42901, 6, 82, 9960, 317, 61867, 1983, 895, 3123, 11, 7227, 276, 254, 23450, 43891, 13, 429, 4140, 317, 245, 50461, 5575, 2021, 11, 23473, 245, 3892, 11, 13022, 5007, 13, 2071, 418, 642, 750, 8466, 9200, 279, 254, 3324, 13, 429, 42901, 6, 82, 3299, 285, 254, 1721, 2021, 2774, 245, 3078, 280, 28036, 1242, 285, 10264, 13], logprobs=None)\n"
|
660 |
+
]
|
661 |
+
}
|
662 |
+
]
|
663 |
+
}
|
664 |
+
]
|
665 |
+
}
|
a_mllm_notebooks/lmdeploy/lmdeploy_info.ipynb
ADDED
@@ -0,0 +1,132 @@
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "code",
|
5 |
+
"execution_count": 1,
|
6 |
+
"metadata": {},
|
7 |
+
"outputs": [
|
8 |
+
{
|
9 |
+
"name": "stdout",
|
10 |
+
"output_type": "stream",
|
11 |
+
"text": [
|
12 |
+
"/dscilab_dungvo/workspace/huggingface_cache\n",
|
13 |
+
"models--AIDC-AI--Ovis1.6-Gemma2-27B\n",
|
14 |
+
"models--FoundationVision--groma-7b-pretrain\n",
|
15 |
+
"models--MBZUAI--GLaMM-FullScope\n",
|
16 |
+
"models--OpenGVLab--InternVL2_5-26B-AWQ\n",
|
17 |
+
"models--OpenGVLab--InternVL2_5-38B-AWQ\n",
|
18 |
+
"models--OpenGVLab--InternVL2_5-78B-AWQ\n",
|
19 |
+
"models--Qwen--Qwen2-VL-2B-Instruct\n",
|
20 |
+
"models--Qwen--Qwen2-VL-72B-Instruct-AWQ\n",
|
21 |
+
"models--Qwen--Qwen2-VL-7B-Instruct\n",
|
22 |
+
"models--Qwen--Qwen2.5-7B-Instruct\n",
|
23 |
+
"models--meta-llama--Llama-3.2-90B-Vision-Instruct\n",
|
24 |
+
"models--opengvlab--internvl2_5-26B-AWQ\n",
|
25 |
+
"models--opengvlab--internvl2_5-38B-AWQ\n",
|
26 |
+
"models--vinai--phobert-base-v2\n",
|
27 |
+
"version.txt\n"
|
28 |
+
]
|
29 |
+
}
|
30 |
+
],
|
31 |
+
"source": [
|
32 |
+
"!echo $HF_HOME\n",
|
33 |
+
"!ls $HF_HOME/hub"
|
34 |
+
]
|
35 |
+
},
|
36 |
+
{
|
37 |
+
"cell_type": "code",
|
38 |
+
"execution_count": 2,
|
39 |
+
"metadata": {},
|
40 |
+
"outputs": [
|
41 |
+
{
|
42 |
+
"name": "stdout",
|
43 |
+
"output_type": "stream",
|
44 |
+
"text": [
|
45 |
+
"The supported chat template names are:\n",
|
46 |
+
"baichuan2\n",
|
47 |
+
"base\n",
|
48 |
+
"chatglm\n",
|
49 |
+
"chatglm3\n",
|
50 |
+
"codegeex4\n",
|
51 |
+
"codellama\n",
|
52 |
+
"cogvlm\n",
|
53 |
+
"cogvlm2\n",
|
54 |
+
"dbrx\n",
|
55 |
+
"deepseek\n",
|
56 |
+
"deepseek-coder\n",
|
57 |
+
"deepseek-vl\n",
|
58 |
+
"falcon\n",
|
59 |
+
"gemma\n",
|
60 |
+
"glm4\n",
|
61 |
+
"internlm\n",
|
62 |
+
"internlm-xcomposer2\n",
|
63 |
+
"internlm-xcomposer2d5\n",
|
64 |
+
"internlm2\n",
|
65 |
+
"internvl-internlm2\n",
|
66 |
+
"internvl-phi3\n",
|
67 |
+
"internvl-zh\n",
|
68 |
+
"internvl-zh-hermes2\n",
|
69 |
+
"internvl2-internlm2\n",
|
70 |
+
"internvl2-phi3\n",
|
71 |
+
"internvl2_5\n",
|
72 |
+
"llama\n",
|
73 |
+
"llama2\n",
|
74 |
+
"llama3\n",
|
75 |
+
"llama3_1\n",
|
76 |
+
"llama3_2\n",
|
77 |
+
"llava-chatml\n",
|
78 |
+
"llava-v1\n",
|
79 |
+
"mini-gemini-vicuna\n",
|
80 |
+
"minicpm3\n",
|
81 |
+
"minicpmv-2d6\n",
|
82 |
+
"mistral\n",
|
83 |
+
"mixtral\n",
|
84 |
+
"molmo\n",
|
85 |
+
"phi-3\n",
|
86 |
+
"puyu\n",
|
87 |
+
"qwen\n",
|
88 |
+
"qwen2d5\n",
|
89 |
+
"solar\n",
|
90 |
+
"ultracm\n",
|
91 |
+
"ultralm\n",
|
92 |
+
"vicuna\n",
|
93 |
+
"wizardlm\n",
|
94 |
+
"yi\n",
|
95 |
+
"yi-vl\n"
|
96 |
+
]
|
97 |
+
}
|
98 |
+
],
|
99 |
+
"source": [
|
100 |
+
"!lmdeploy list"
|
101 |
+
]
|
102 |
+
},
|
103 |
+
{
|
104 |
+
"cell_type": "code",
|
105 |
+
"execution_count": null,
|
106 |
+
"metadata": {},
|
107 |
+
"outputs": [],
|
108 |
+
"source": []
|
109 |
+
}
|
110 |
+
],
|
111 |
+
"metadata": {
|
112 |
+
"kernelspec": {
|
113 |
+
"display_name": "lmdeploy",
|
114 |
+
"language": "python",
|
115 |
+
"name": "python3"
|
116 |
+
},
|
117 |
+
"language_info": {
|
118 |
+
"codemirror_mode": {
|
119 |
+
"name": "ipython",
|
120 |
+
"version": 3
|
121 |
+
},
|
122 |
+
"file_extension": ".py",
|
123 |
+
"mimetype": "text/x-python",
|
124 |
+
"name": "python",
|
125 |
+
"nbconvert_exporter": "python",
|
126 |
+
"pygments_lexer": "ipython3",
|
127 |
+
"version": "3.8.19"
|
128 |
+
}
|
129 |
+
},
|
130 |
+
"nbformat": 4,
|
131 |
+
"nbformat_minor": 2
|
132 |
+
}
|
a_mllm_notebooks/lmdeploy/lmdeploy_serve.sh
ADDED
@@ -0,0 +1,47 @@
1 |
+
eval "$(conda shell.bash hook)"
|
2 |
+
conda activate lmdeploy
|
3 |
+
|
4 |
+
# MODEL_NAME=OpenGVLab/InternVL2_5-1B
|
5 |
+
# MODEL_NAME=OpenGVLab/InternVL2_5-26B-AWQ
|
6 |
+
MODEL_NAME=OpenGVLab/InternVL2_5-26B-MPO-AWQ
|
7 |
+
# MODEL_NAME=Qwen/Qwen2-VL-7B-Instruct-AWQ
|
8 |
+
|
9 |
+
# PROXY_URL=0.0.0.0
|
10 |
+
# lmdeploy serve proxy --server-name $PROXY_URL --server-port 8080 --strategy "min_expected_latency" &
|
11 |
+
|
12 |
+
|
13 |
+
CUDA_VISIBLE_DEVICES=2 \
|
14 |
+
lmdeploy serve api_server \
|
15 |
+
$MODEL_NAME \
|
16 |
+
--server-port 2002 \
|
17 |
+
--tp 1 \
|
18 |
+
--dtype float16 \
|
19 |
+
--cache-max-entry-count 0.05 \
|
20 |
+
--proxy-url http://0.0.0.0:8082 &
|
21 |
+
# --backend turbomind \
|
22 |
+
# --model-format awq \
|
23 |
+
|
24 |
+
|
25 |
+
|
26 |
+
|
27 |
+
|
28 |
+
# lmdeploy serve api_server [-h] [--server-name SERVER_NAME] [--server-port SERVER_PORT]
|
29 |
+
# [--allow-origins ALLOW_ORIGINS [ALLOW_ORIGINS ...]] [--allow-credentials]
|
30 |
+
# [--allow-methods ALLOW_METHODS [ALLOW_METHODS ...]]
|
31 |
+
# [--allow-headers ALLOW_HEADERS [ALLOW_HEADERS ...]] [--proxy-url PROXY_URL]
|
32 |
+
# [--backend {pytorch,turbomind}]
|
33 |
+
# [--log-level {CRITICAL,FATAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}]
|
34 |
+
# [--api-keys [API_KEYS [API_KEYS ...]]] [--ssl] [--model-name MODEL_NAME]
|
35 |
+
# [--max-log-len MAX_LOG_LEN] [--disable-fastapi-docs]
|
36 |
+
# [--chat-template CHAT_TEMPLATE] [--revision REVISION]
|
37 |
+
# [--download-dir DOWNLOAD_DIR] [--adapters [ADAPTERS [ADAPTERS ...]]]
|
38 |
+
# [--device {cuda,ascend,maca}] [--eager-mode] [--dtype {auto,float16,bfloat16}]
|
39 |
+
# [--tp TP] [--session-len SESSION_LEN] [--max-batch-size MAX_BATCH_SIZE]
|
40 |
+
# [--cache-max-entry-count CACHE_MAX_ENTRY_COUNT]
|
41 |
+
# [--cache-block-seq-len CACHE_BLOCK_SEQ_LEN] [--enable-prefix-caching]
|
42 |
+
# [--max-prefill-token-num MAX_PREFILL_TOKEN_NUM] [--quant-policy {0,4,8}]
|
43 |
+
# [--model-format {hf,llama,awq,gptq}] [--rope-scaling-factor ROPE_SCALING_FACTOR]
|
44 |
+
# [--num-tokens-per-iter NUM_TOKENS_PER_ITER]
|
45 |
+
# [--max-prefill-iters MAX_PREFILL_ITERS]
|
46 |
+
# [--vision-max-batch-size VISION_MAX_BATCH_SIZE]
|
47 |
+
# model_path
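The `api_server` launched above exposes an OpenAI-compatible HTTP API. Below is a minimal client sketch for querying it; the port (2002) mirrors `--server-port` in the script, while the model id and the use of the `openai` Python package are assumptions for illustration.

```python
# Minimal sketch: query the LMDeploy api_server started above through its
# OpenAI-compatible endpoint. Assumes the server listens on port 2002
# (matching --server-port) and that the `openai` package is installed.
from openai import OpenAI

# The key is a placeholder; it is only validated when --api-keys is passed to the server.
client = OpenAI(base_url="http://0.0.0.0:2002/v1", api_key="dummy")

response = client.chat.completions.create(
    model="OpenGVLab/InternVL2_5-26B-MPO-AWQ",  # assumed: the model id served by the script
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```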
|
a_mllm_notebooks/lmdeploy/long_context.ipynb
ADDED
@@ -0,0 +1,169 @@
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"id": "a674e57f",
|
6 |
+
"metadata": {},
|
7 |
+
"source": [
|
8 |
+
"# Context length extrapolation\n",
|
9 |
+
"\n",
|
10 |
+
"Long text extrapolation refers to the ability of LLM to handle data longer than the training text during inference. TurboMind engine now support [LlamaDynamicNTKScalingRotaryEmbedding](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L178) and the implementation is consistent with huggingface.\n",
|
11 |
+
"\n",
|
12 |
+
"## Usage\n",
|
13 |
+
"\n",
|
14 |
+
"You can enable the context length extrapolation abality by modifying the TurbomindEngineConfig. Edit the `session_len` to the expected length and change `rope_scaling_factor` to a number no less than 1.0.\n",
|
15 |
+
"\n",
|
16 |
+
"Take `internlm2_5-7b-chat-1m` as an example, which supports a context length of up to **1 million tokens**:"
|
17 |
+
]
|
18 |
+
},
|
19 |
+
{
|
20 |
+
"cell_type": "code",
|
21 |
+
"execution_count": null,
|
22 |
+
"id": "c4781275",
|
23 |
+
"metadata": {},
|
24 |
+
"outputs": [],
|
25 |
+
"source": [
|
26 |
+
"from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig\n",
|
27 |
+
"\n",
|
28 |
+
"backend_config = TurbomindEngineConfig(\n",
|
29 |
+
" rope_scaling_factor=2.5,\n",
|
30 |
+
" session_len=1000000,\n",
|
31 |
+
" max_batch_size=1,\n",
|
32 |
+
" cache_max_entry_count=0.7,\n",
|
33 |
+
" tp=4)\n",
|
34 |
+
"pipe = pipeline('internlm/internlm2_5-7b-chat-1m', backend_config=backend_config)\n",
|
35 |
+
"prompt = 'Use a long prompt to replace this sentence'\n",
|
36 |
+
"gen_config = GenerationConfig(top_p=0.8,\n",
|
37 |
+
" top_k=40,\n",
|
38 |
+
" temperature=0.8,\n",
|
39 |
+
" max_new_tokens=1024)\n",
|
40 |
+
"response = pipe(prompt, gen_config=gen_config)\n",
|
41 |
+
"print(response)"
|
42 |
+
]
|
43 |
+
},
|
44 |
+
{
|
45 |
+
"cell_type": "markdown",
|
46 |
+
"id": "dbd245e2",
|
47 |
+
"metadata": {},
|
48 |
+
"source": [
|
49 |
+
"## Evaluation\n",
|
50 |
+
"\n",
|
51 |
+
"We use several methods to evaluate the long-context-length inference ability of LMDeploy, including [passkey retrieval](#passkey-retrieval), [needle in a haystack](#needle-in-a-haystack) and computing [perplexity](#perplexity)\n",
|
52 |
+
"\n",
|
53 |
+
"### Passkey Retrieval\n",
|
54 |
+
"\n",
|
55 |
+
"You can try the following code to test how many times LMDeploy can retrieval the special key."
|
56 |
+
]
|
57 |
+
},
|
58 |
+
{
|
59 |
+
"cell_type": "code",
|
60 |
+
"execution_count": null,
|
61 |
+
"id": "2de48014",
|
62 |
+
"metadata": {},
|
63 |
+
"outputs": [],
|
64 |
+
"source": [
|
65 |
+
"import numpy as np\n",
|
66 |
+
"from lmdeploy import pipeline\n",
|
67 |
+
"from lmdeploy import TurbomindEngineConfig\n",
|
68 |
+
"import time\n",
|
69 |
+
"\n",
|
70 |
+
"session_len = 1000000\n",
|
71 |
+
"backend_config = TurbomindEngineConfig(\n",
|
72 |
+
" rope_scaling_factor=2.5,\n",
|
73 |
+
" session_len=session_len,\n",
|
74 |
+
" max_batch_size=1,\n",
|
75 |
+
" cache_max_entry_count=0.7,\n",
|
76 |
+
" tp=4)\n",
|
77 |
+
"pipe = pipeline('internlm/internlm2_5-7b-chat-1m', backend_config=backend_config)\n",
|
78 |
+
"\n",
|
79 |
+
"\n",
|
80 |
+
"def passkey_retrieval(session_len, n_round=5):\n",
|
81 |
+
" # create long context input\n",
|
82 |
+
" tok = pipe.tokenizer\n",
|
83 |
+
" task_description = 'There is an important info hidden inside a lot of irrelevant text. Find it and memorize them. I will quiz you about the important information there.'\n",
|
84 |
+
" garbage = 'The grass is green. The sky is blue. The sun is yellow. Here we go. There and back again.'\n",
|
85 |
+
"\n",
|
86 |
+
" for _ in range(n_round):\n",
|
87 |
+
" start = time.perf_counter()\n",
|
88 |
+
" n_times = (session_len - 1000) // len(tok.encode(garbage))\n",
|
89 |
+
" n_garbage_prefix = np.random.randint(0, n_times)\n",
|
90 |
+
" n_garbage_suffix = n_times - n_garbage_prefix\n",
|
91 |
+
" garbage_prefix = ' '.join([garbage] * n_garbage_prefix)\n",
|
92 |
+
" garbage_suffix = ' '.join([garbage] * n_garbage_suffix)\n",
|
93 |
+
" pass_key = np.random.randint(1, 50000)\n",
|
94 |
+
" information_line = f'The pass key is {pass_key}. Remember it. {pass_key} is the pass key.' # noqa: E501\n",
|
95 |
+
" final_question = 'What is the pass key? The pass key is'\n",
|
96 |
+
" lines = [\n",
|
97 |
+
" task_description,\n",
|
98 |
+
" garbage_prefix,\n",
|
99 |
+
" information_line,\n",
|
100 |
+
" garbage_suffix,\n",
|
101 |
+
" final_question,\n",
|
102 |
+
" ]\n",
|
103 |
+
"\n",
|
104 |
+
" # inference\n",
|
105 |
+
" prompt = ' '.join(lines)\n",
|
106 |
+
" response = pipe([prompt])\n",
|
107 |
+
" print(pass_key, response)\n",
|
108 |
+
" end = time.perf_counter()\n",
|
109 |
+
" print(f'duration: {end - start} s')\n",
|
110 |
+
"\n",
|
111 |
+
"passkey_retrieval(session_len, 5)"
|
112 |
+
]
|
113 |
+
},
|
114 |
+
{
|
115 |
+
"cell_type": "markdown",
|
116 |
+
"id": "4c33e786",
|
117 |
+
"metadata": {},
|
118 |
+
"source": [
|
119 |
+
"This test takes approximately 364 seconds per round when conducted on A100-80G GPUs\n",
|
120 |
+
"\n",
|
121 |
+
"### Needle In A Haystack\n",
|
122 |
+
"\n",
|
123 |
+
"[OpenCompass](https://github.com/open-compass/opencompass) offers very useful tools to perform needle-in-a-haystack evaluation. For specific instructions, please refer to the [guide](https://github.com/open-compass/opencompass/blob/main/docs/en/advanced_guides/needleinahaystack_eval.md).\n",
|
124 |
+
"\n",
|
125 |
+
"### Perplexity\n",
|
126 |
+
"\n",
|
127 |
+
"The following codes demonstrate how to use LMDeploy to calculate perplexity."
|
128 |
+
]
|
129 |
+
},
|
130 |
+
{
|
131 |
+
"cell_type": "code",
|
132 |
+
"execution_count": null,
|
133 |
+
"id": "3b9a97ec",
|
134 |
+
"metadata": {},
|
135 |
+
"outputs": [],
|
136 |
+
"source": [
|
137 |
+
"from transformers import AutoTokenizer\n",
|
138 |
+
"from lmdeploy import TurbomindEngineConfig, pipeline\n",
|
139 |
+
"import numpy as np\n",
|
140 |
+
"\n",
|
141 |
+
"# load model and tokenizer\n",
|
142 |
+
"model_repoid_or_path = 'internlm/internlm2_5-7b-chat-1m'\n",
|
143 |
+
"backend_config = TurbomindEngineConfig(\n",
|
144 |
+
" rope_scaling_factor=2.5,\n",
|
145 |
+
" session_len=1000000,\n",
|
146 |
+
" max_batch_size=1,\n",
|
147 |
+
" cache_max_entry_count=0.7,\n",
|
148 |
+
" tp=4)\n",
|
149 |
+
"pipe = pipeline(model_repoid_or_path, backend_config=backend_config)\n",
|
150 |
+
"tokenizer = AutoTokenizer.from_pretrained(model_repoid_or_path, trust_remote_code=True)\n",
|
151 |
+
"\n",
|
152 |
+
"# get perplexity\n",
|
153 |
+
"text = 'Use a long prompt to replace this sentence'\n",
|
154 |
+
"input_ids = tokenizer.encode(text)\n",
|
155 |
+
"ppl = pipe.get_ppl(input_ids)[0]\n",
|
156 |
+
"print(ppl)"
|
157 |
+
]
|
158 |
+
}
|
159 |
+
],
|
160 |
+
"metadata": {
|
161 |
+
"jupytext": {
|
162 |
+
"cell_metadata_filter": "-all",
|
163 |
+
"main_language": "python",
|
164 |
+
"notebook_metadata_filter": "-all"
|
165 |
+
}
|
166 |
+
},
|
167 |
+
"nbformat": 4,
|
168 |
+
"nbformat_minor": 5
|
169 |
+
}
|
a_mllm_notebooks/lmdeploy/long_context.md
ADDED
@@ -0,0 +1,119 @@
1 |
+
# Context length extrapolation
|
2 |
+
|
3 |
+
Long text extrapolation refers to the ability of an LLM to handle data longer than the training text during inference. The TurboMind engine now supports [LlamaDynamicNTKScalingRotaryEmbedding](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L178), and the implementation is consistent with huggingface's.
|
4 |
+
|
5 |
+
## Usage
|
6 |
+
|
7 |
+
You can enable the context length extrapolation ability by modifying the TurbomindEngineConfig. Set `session_len` to the expected length and change `rope_scaling_factor` to a number no less than 1.0.
|
8 |
+
|
9 |
+
Take `internlm2_5-7b-chat-1m` as an example, which supports a context length of up to **1 million tokens**:
|
10 |
+
|
11 |
+
```python
|
12 |
+
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig
|
13 |
+
|
14 |
+
backend_config = TurbomindEngineConfig(
|
15 |
+
rope_scaling_factor=2.5,
|
16 |
+
session_len=1000000,
|
17 |
+
max_batch_size=1,
|
18 |
+
cache_max_entry_count=0.7,
|
19 |
+
tp=4)
|
20 |
+
pipe = pipeline('internlm/internlm2_5-7b-chat-1m', backend_config=backend_config)
|
21 |
+
prompt = 'Use a long prompt to replace this sentence'
|
22 |
+
gen_config = GenerationConfig(top_p=0.8,
|
23 |
+
top_k=40,
|
24 |
+
temperature=0.8,
|
25 |
+
max_new_tokens=1024)
|
26 |
+
response = pipe(prompt, gen_config=gen_config)
|
27 |
+
print(response)
|
28 |
+
```
|
29 |
+
|
30 |
+
## Evaluation
|
31 |
+
|
32 |
+
We use several methods to evaluate the long-context inference ability of LMDeploy, including [passkey retrieval](#passkey-retrieval), [needle in a haystack](#needle-in-a-haystack), and computing [perplexity](#perplexity).
|
33 |
+
|
34 |
+
### Passkey Retrieval
|
35 |
+
|
36 |
+
You can try the following code to test how many times LMDeploy can retrieve the special key.
|
37 |
+
|
38 |
+
```python
|
39 |
+
import numpy as np
|
40 |
+
from lmdeploy import pipeline
|
41 |
+
from lmdeploy import TurbomindEngineConfig
|
42 |
+
import time
|
43 |
+
|
44 |
+
session_len = 1000000
|
45 |
+
backend_config = TurbomindEngineConfig(
|
46 |
+
rope_scaling_factor=2.5,
|
47 |
+
session_len=session_len,
|
48 |
+
max_batch_size=1,
|
49 |
+
cache_max_entry_count=0.7,
|
50 |
+
tp=4)
|
51 |
+
pipe = pipeline('internlm/internlm2_5-7b-chat-1m', backend_config=backend_config)
|
52 |
+
|
53 |
+
|
54 |
+
def passkey_retrieval(session_len, n_round=5):
|
55 |
+
# create long context input
|
56 |
+
tok = pipe.tokenizer
|
57 |
+
task_description = 'There is an important info hidden inside a lot of irrelevant text. Find it and memorize them. I will quiz you about the important information there.'
|
58 |
+
garbage = 'The grass is green. The sky is blue. The sun is yellow. Here we go. There and back again.'
|
59 |
+
|
60 |
+
for _ in range(n_round):
|
61 |
+
start = time.perf_counter()
|
62 |
+
n_times = (session_len - 1000) // len(tok.encode(garbage))
|
63 |
+
n_garbage_prefix = np.random.randint(0, n_times)
|
64 |
+
n_garbage_suffix = n_times - n_garbage_prefix
|
65 |
+
garbage_prefix = ' '.join([garbage] * n_garbage_prefix)
|
66 |
+
garbage_suffix = ' '.join([garbage] * n_garbage_suffix)
|
67 |
+
pass_key = np.random.randint(1, 50000)
|
68 |
+
information_line = f'The pass key is {pass_key}. Remember it. {pass_key} is the pass key.' # noqa: E501
|
69 |
+
final_question = 'What is the pass key? The pass key is'
|
70 |
+
lines = [
|
71 |
+
task_description,
|
72 |
+
garbage_prefix,
|
73 |
+
information_line,
|
74 |
+
garbage_suffix,
|
75 |
+
final_question,
|
76 |
+
]
|
77 |
+
|
78 |
+
# inference
|
79 |
+
prompt = ' '.join(lines)
|
80 |
+
response = pipe([prompt])
|
81 |
+
print(pass_key, response)
|
82 |
+
end = time.perf_counter()
|
83 |
+
print(f'duration: {end - start} s')
|
84 |
+
|
85 |
+
passkey_retrieval(session_len, 5)
|
86 |
+
```
|
87 |
+
|
88 |
+
This test takes approximately 364 seconds per round when conducted on A100-80G GPUs.
|
89 |
+
|
90 |
+
### Needle In A Haystack
|
91 |
+
|
92 |
+
[OpenCompass](https://github.com/open-compass/opencompass) offers very useful tools to perform needle-in-a-haystack evaluation. For specific instructions, please refer to the [guide](https://github.com/open-compass/opencompass/blob/main/docs/en/advanced_guides/needleinahaystack_eval.md).
|
93 |
+
|
94 |
+
### Perplexity
|
95 |
+
|
96 |
+
The following code demonstrates how to use LMDeploy to calculate perplexity.
|
97 |
+
|
98 |
+
```python
|
99 |
+
from transformers import AutoTokenizer
|
100 |
+
from lmdeploy import TurbomindEngineConfig, pipeline
|
101 |
+
import numpy as np
|
102 |
+
|
103 |
+
# load model and tokenizer
|
104 |
+
model_repoid_or_path = 'internlm/internlm2_5-7b-chat-1m'
|
105 |
+
backend_config = TurbomindEngineConfig(
|
106 |
+
rope_scaling_factor=2.5,
|
107 |
+
session_len=1000000,
|
108 |
+
max_batch_size=1,
|
109 |
+
cache_max_entry_count=0.7,
|
110 |
+
tp=4)
|
111 |
+
pipe = pipeline(model_repoid_or_path, backend_config=backend_config)
|
112 |
+
tokenizer = AutoTokenizer.from_pretrained(model_repoid_or_path, trust_remote_code=True)
|
113 |
+
|
114 |
+
# get perplexity
|
115 |
+
text = 'Use a long prompt to replace this sentence'
|
116 |
+
input_ids = tokenizer.encode(text)
|
117 |
+
ppl = pipe.get_ppl(input_ids)[0]
|
118 |
+
print(ppl)
|
119 |
+
```
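As noted in the pipeline notebook, `get_ppl` returns the cross-entropy loss without exponentiating it. A minimal follow-up sketch (assuming `ppl` is the value computed above) to obtain the conventional perplexity:

```python
import math

# `ppl` is the cross-entropy (negative log-likelihood) value returned by get_ppl;
# exponentiating it gives the conventional perplexity.
perplexity = math.exp(ppl)
print(perplexity)
```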
|
a_mllm_notebooks/lmdeploy/pipeline.ipynb
ADDED
@@ -0,0 +1,570 @@
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"id": "d3f6f4c5",
|
6 |
+
"metadata": {},
|
7 |
+
"source": [
|
8 |
+
"# Offline Inference Pipeline\n",
|
9 |
+
"\n",
|
10 |
+
"In this tutorial, We will present a list of examples to introduce the usage of `lmdeploy.pipeline`.\n",
|
11 |
+
"\n",
|
12 |
+
"You can overview the detailed pipeline API in [this](https://lmdeploy.readthedocs.io/en/latest/api/pipeline.html) guide.\n",
|
13 |
+
"\n",
|
14 |
+
"## Usage\n",
|
15 |
+
"\n",
|
16 |
+
"- **An example using default parameters:**"
|
17 |
+
]
|
18 |
+
},
|
19 |
+
{
|
20 |
+
"cell_type": "code",
|
21 |
+
"execution_count": 1,
|
22 |
+
"id": "3ff6970a",
|
23 |
+
"metadata": {},
|
24 |
+
"outputs": [
|
25 |
+
{
|
26 |
+
"name": "stderr",
|
27 |
+
"output_type": "stream",
|
28 |
+
"text": [
|
29 |
+
"/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
|
30 |
+
" from .autonotebook import tqdm as notebook_tqdm\n"
|
31 |
+
]
|
32 |
+
},
|
33 |
+
{
|
34 |
+
"name": "stdout",
|
35 |
+
"output_type": "stream",
|
36 |
+
"text": [
|
37 |
+
"\u001b[0;31mInit signature:\u001b[0m\n",
|
38 |
+
"\u001b[0mTurbomindEngineConfig\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\u001b[0m\n",
|
39 |
+
"\u001b[0;34m\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'auto'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
40 |
+
"\u001b[0;34m\u001b[0m \u001b[0mmodel_format\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mUnion\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mNoneType\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
41 |
+
"\u001b[0;34m\u001b[0m \u001b[0mtp\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
42 |
+
"\u001b[0;34m\u001b[0m \u001b[0msession_len\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mUnion\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mint\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mNoneType\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
43 |
+
"\u001b[0;34m\u001b[0m \u001b[0mmax_batch_size\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
44 |
+
"\u001b[0;34m\u001b[0m \u001b[0mcache_max_entry_count\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mfloat\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0.8\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
45 |
+
"\u001b[0;34m\u001b[0m \u001b[0mcache_chunk_size\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
46 |
+
"\u001b[0;34m\u001b[0m \u001b[0mcache_block_seq_len\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m64\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
47 |
+
"\u001b[0;34m\u001b[0m \u001b[0menable_prefix_caching\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mbool\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
48 |
+
"\u001b[0;34m\u001b[0m \u001b[0mquant_policy\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
49 |
+
"\u001b[0;34m\u001b[0m \u001b[0mrope_scaling_factor\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mfloat\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0.0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
50 |
+
"\u001b[0;34m\u001b[0m \u001b[0muse_logn_attn\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mbool\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
51 |
+
"\u001b[0;34m\u001b[0m \u001b[0mdownload_dir\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mUnion\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mNoneType\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
52 |
+
"\u001b[0;34m\u001b[0m \u001b[0mrevision\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mUnion\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mNoneType\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
53 |
+
"\u001b[0;34m\u001b[0m \u001b[0mmax_prefill_token_num\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m8192\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
54 |
+
"\u001b[0;34m\u001b[0m \u001b[0mnum_tokens_per_iter\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
55 |
+
"\u001b[0;34m\u001b[0m \u001b[0mmax_prefill_iters\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
|
56 |
+
"\u001b[0;34m\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m->\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
57 |
+
"\u001b[0;31mSource:\u001b[0m \n",
|
58 |
+
"\u001b[0;32mclass\u001b[0m \u001b[0mTurbomindEngineConfig\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n",
|
59 |
+
"\u001b[0;34m\u001b[0m \u001b[0;34m\"\"\"TurboMind Engine config.\u001b[0m\n",
|
60 |
+
"\u001b[0;34m\u001b[0m\n",
|
61 |
+
"\u001b[0;34m Args:\u001b[0m\n",
|
62 |
+
"\u001b[0;34m dtype (str): data type for model weights and activations. It can be\u001b[0m\n",
|
63 |
+
"\u001b[0;34m one of the following values, ['auto', 'float16', 'bfloat16']\u001b[0m\n",
|
64 |
+
"\u001b[0;34m The `auto` option will use FP16 precision for FP32 and FP16\u001b[0m\n",
|
65 |
+
"\u001b[0;34m models, and BF16 precision for BF16 models.\u001b[0m\n",
|
66 |
+
"\u001b[0;34m model_format (str): the layout of the deployed model. It can be one\u001b[0m\n",
|
67 |
+
"\u001b[0;34m of the following values [hf, meta_llama, awq, gptq],`hf` meaning\u001b[0m\n",
|
68 |
+
"\u001b[0;34m huggingface model(.bin, .safetensors), `meta_llama` being\u001b[0m\n",
|
69 |
+
"\u001b[0;34m meta llama's format(.pth), `awq` and `gptq` meaning the quantized\u001b[0m\n",
|
70 |
+
"\u001b[0;34m model by AWQ and GPTQ, respectively. If it is not specified,\u001b[0m\n",
|
71 |
+
"\u001b[0;34m i.e. None, it will be extracted from the input model\u001b[0m\n",
|
72 |
+
"\u001b[0;34m tp (int): the number of GPU cards used in tensor parallelism,\u001b[0m\n",
|
73 |
+
"\u001b[0;34m default to 1\u001b[0m\n",
|
74 |
+
"\u001b[0;34m session_len (int): the max session length of a sequence, default to\u001b[0m\n",
|
75 |
+
"\u001b[0;34m None\u001b[0m\n",
|
76 |
+
"\u001b[0;34m max_batch_size (int): the max batch size during inference. If it is\u001b[0m\n",
|
77 |
+
"\u001b[0;34m not specified, the engine will automatically set it according to\u001b[0m\n",
|
78 |
+
"\u001b[0;34m the device\u001b[0m\n",
|
79 |
+
"\u001b[0;34m cache_max_entry_count (float): the percentage of gpu memory occupied\u001b[0m\n",
|
80 |
+
"\u001b[0;34m by the k/v cache.\u001b[0m\n",
|
81 |
+
"\u001b[0;34m For versions of lmdeploy between `v0.2.0` and `v0.2.1`, it\u001b[0m\n",
|
82 |
+
"\u001b[0;34m defaults to 0.5, depicting the percentage of TOTAL GPU memory to\u001b[0m\n",
|
83 |
+
"\u001b[0;34m be allocated to the k/v cache.\u001b[0m\n",
|
84 |
+
"\u001b[0;34m For lmdeploy versions greater than `v0.2.1`, it defaults to 0.8,\u001b[0m\n",
|
85 |
+
"\u001b[0;34m signifying the percentage of FREE GPU memory to be reserved for\u001b[0m\n",
|
86 |
+
"\u001b[0;34m the k/v cache\u001b[0m\n",
|
87 |
+
"\u001b[0;34m cache_chunk_size (int): The policy to apply for KV block from\u001b[0m\n",
|
88 |
+
"\u001b[0;34m the block manager, default to -1.\u001b[0m\n",
|
89 |
+
"\u001b[0;34m cache_block_seq_len (int): the length of the token sequence in\u001b[0m\n",
|
90 |
+
"\u001b[0;34m a k/v block, default to 64\u001b[0m\n",
|
91 |
+
"\u001b[0;34m enable_prefix_caching (bool): enable cache prompts for block reuse,\u001b[0m\n",
|
92 |
+
"\u001b[0;34m default to False\u001b[0m\n",
|
93 |
+
"\u001b[0;34m quant_policy (int): default to 0. When k/v is quantized into 4 or 8\u001b[0m\n",
|
94 |
+
"\u001b[0;34m bit, set it to 4 or 8, respectively\u001b[0m\n",
|
95 |
+
"\u001b[0;34m rope_scaling_factor (float): scaling factor used for dynamic ntk,\u001b[0m\n",
|
96 |
+
"\u001b[0;34m default to 0. TurboMind follows the implementation of transformer\u001b[0m\n",
|
97 |
+
"\u001b[0;34m LlamaAttention\u001b[0m\n",
|
98 |
+
"\u001b[0;34m use_logn_attn (bool): whether or not to use log attn: default to False\u001b[0m\n",
|
99 |
+
"\u001b[0;34m download_dir (str): Directory to download and load the weights,\u001b[0m\n",
|
100 |
+
"\u001b[0;34m default to the default cache directory of huggingface.\u001b[0m\n",
|
101 |
+
"\u001b[0;34m revision (str): The specific model version to use. It can be a branch\u001b[0m\n",
|
102 |
+
"\u001b[0;34m name, a tag name, or a commit id. If unspecified, will use the\u001b[0m\n",
|
103 |
+
"\u001b[0;34m default version.\u001b[0m\n",
|
104 |
+
"\u001b[0;34m max_prefill_token_num(int): the number of tokens each iteration during\u001b[0m\n",
|
105 |
+
"\u001b[0;34m prefill, default to 8192\u001b[0m\n",
|
106 |
+
"\u001b[0;34m num_tokens_per_iter(int): the number of tokens processed in each\u001b[0m\n",
|
107 |
+
"\u001b[0;34m forward pass. Working with `max_prefill_iters` enables the\u001b[0m\n",
|
108 |
+
"\u001b[0;34m \"Dynamic SplitFuse\"-like scheduling\u001b[0m\n",
|
109 |
+
"\u001b[0;34m max_prefill_iters(int): the max number of forward pass during prefill\u001b[0m\n",
|
110 |
+
"\u001b[0;34m stage\u001b[0m\n",
|
111 |
+
"\u001b[0;34m \"\"\"\u001b[0m\u001b[0;34m\u001b[0m\n",
|
112 |
+
"\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n",
|
113 |
+
"\u001b[0;34m\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'auto'\u001b[0m\u001b[0;34m\u001b[0m\n",
|
114 |
+
"\u001b[0;34m\u001b[0m \u001b[0mmodel_format\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mOptional\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\n",
|
115 |
+
"\u001b[0;34m\u001b[0m \u001b[0mtp\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\n",
|
116 |
+
"\u001b[0;34m\u001b[0m \u001b[0msession_len\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mOptional\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mint\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\n",
|
117 |
+
"\u001b[0;34m\u001b[0m \u001b[0mmax_batch_size\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\n",
|
118 |
+
"\u001b[0;34m\u001b[0m \u001b[0mcache_max_entry_count\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mfloat\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0.8\u001b[0m\u001b[0;34m\u001b[0m\n",
|
119 |
+
"\u001b[0;34m\u001b[0m \u001b[0mcache_chunk_size\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\n",
|
120 |
+
"\u001b[0;34m\u001b[0m \u001b[0mcache_block_seq_len\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m64\u001b[0m\u001b[0;34m\u001b[0m\n",
|
121 |
+
"\u001b[0;34m\u001b[0m \u001b[0menable_prefix_caching\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mbool\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m\u001b[0m\n",
|
122 |
+
"\u001b[0;34m\u001b[0m \u001b[0mquant_policy\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\n",
|
123 |
+
"\u001b[0;34m\u001b[0m \u001b[0mrope_scaling_factor\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mfloat\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0.0\u001b[0m\u001b[0;34m\u001b[0m\n",
|
124 |
+
"\u001b[0;34m\u001b[0m \u001b[0muse_logn_attn\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mbool\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m\u001b[0m\n",
|
125 |
+
"\u001b[0;34m\u001b[0m \u001b[0mdownload_dir\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mOptional\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\n",
|
126 |
+
"\u001b[0;34m\u001b[0m \u001b[0mrevision\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mOptional\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\n",
|
127 |
+
"\u001b[0;34m\u001b[0m \u001b[0mmax_prefill_token_num\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m8192\u001b[0m\u001b[0;34m\u001b[0m\n",
|
128 |
+
"\u001b[0;34m\u001b[0m \u001b[0mnum_tokens_per_iter\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\n",
|
129 |
+
"\u001b[0;34m\u001b[0m \u001b[0mmax_prefill_iters\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mint\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\n",
|
130 |
+
"\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n",
|
131 |
+
"\u001b[0;34m\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__post_init__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n",
|
132 |
+
"\u001b[0;34m\u001b[0m \u001b[0;34m\"\"\"Check input validation.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\n",
|
133 |
+
"\u001b[0;34m\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m \u001b[0;32min\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m'auto'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'float16'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'bfloat16'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\n",
|
134 |
+
"\u001b[0;34m\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtp\u001b[0m \u001b[0;34m>=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'tp must be a positive integer'\u001b[0m\u001b[0;34m\u001b[0m\n",
|
135 |
+
"\u001b[0;34m\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0;36m0\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcache_max_entry_count\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \\\n",
|
136 |
+
" \u001b[0;34m'invalid cache_max_entry_count'\u001b[0m\u001b[0;34m\u001b[0m\n",
|
137 |
+
"\u001b[0;34m\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mquant_policy\u001b[0m \u001b[0;32min\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m4\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m8\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'invalid quant_policy'\u001b[0m\u001b[0;34m\u001b[0m\n",
|
138 |
+
"\u001b[0;34m\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrope_scaling_factor\u001b[0m \u001b[0;34m>=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'invalid rope_scaling_factor'\u001b[0m\u001b[0;34m\u001b[0m\n",
|
139 |
+
"\u001b[0;34m\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmax_prefill_token_num\u001b[0m \u001b[0;34m>=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \\\n",
|
140 |
+
" \u001b[0;34m'invalid max_prefill_token_num'\u001b[0m\u001b[0;34m\u001b[0m\n",
|
141 |
+
"\u001b[0;34m\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnum_tokens_per_iter\u001b[0m \u001b[0;34m>=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'invalid num_tokens_per_iter'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
142 |
+
"\u001b[0;31mFile:\u001b[0m /dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/messages.py\n",
|
143 |
+
"\u001b[0;31mType:\u001b[0m type\n",
|
144 |
+
"\u001b[0;31mSubclasses:\u001b[0m "
|
145 |
+
]
|
146 |
+
}
|
147 |
+
],
|
148 |
+
"source": [
|
149 |
+
"from lmdeploy import pipeline, TurbomindEngineConfig\n",
|
150 |
+
"TurbomindEngineConfig??"
|
151 |
+
]
|
152 |
+
},
|
153 |
+
{
|
154 |
+
"cell_type": "code",
|
155 |
+
"execution_count": 2,
|
156 |
+
"id": "346a051f",
|
157 |
+
"metadata": {},
|
158 |
+
"outputs": [
|
159 |
+
{
|
160 |
+
"name": "stderr",
|
161 |
+
"output_type": "stream",
|
162 |
+
"text": [
|
163 |
+
"Fetching 14 files: 100%|█████████████████████████████████████| 14/14 [00:00<00:00, 84855.86it/s]\n"
|
164 |
+
]
|
165 |
+
},
|
166 |
+
{
|
167 |
+
"name": "stdout",
|
168 |
+
"output_type": "stream",
|
169 |
+
"text": [
|
170 |
+
"2024-12-20 08:11:58,360 - lmdeploy - \u001b[33mWARNING\u001b[0m - turbomind.py:231 - get 849 model params\n"
|
171 |
+
]
|
172 |
+
},
|
173 |
+
{
|
174 |
+
"name": "stderr",
|
175 |
+
"output_type": "stream",
|
176 |
+
"text": [
|
177 |
+
"[TM][WARNING] [LlamaTritonModel] `max_context_token_num` is not set, default to 32768.\n",
|
178 |
+
" \r"
|
179 |
+
]
|
180 |
+
},
|
181 |
+
{
|
182 |
+
"name": "stdout",
|
183 |
+
"output_type": "stream",
|
184 |
+
"text": [
|
185 |
+
"[WARNING] gemm_config.in is not found; using default GEMM algo\n",
|
186 |
+
"[WARNING] gemm_config.in is not found; using default GEMM algo\n",
|
187 |
+
"[WARNING] gemm_config.in is not found; using default GEMM algo\n",
|
188 |
+
"[WARNING] gemm_config.in is not found; using default GEMM algo\n",
|
189 |
+
"2024-12-20 08:12:08,076 - lmdeploy - \u001b[33mWARNING\u001b[0m - async_engine.py:505 - GenerationConfig: GenerationConfig(n=1, max_new_tokens=512, do_sample=False, top_p=1.0, top_k=50, min_p=0.0, temperature=0.8, repetition_penalty=1.0, ignore_eos=False, random_seed=None, stop_words=None, bad_words=None, stop_token_ids=[151645], bad_token_ids=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None, response_format=None, logits_processors=None)\n",
|
190 |
+
"2024-12-20 08:12:08,077 - lmdeploy - \u001b[33mWARNING\u001b[0m - async_engine.py:506 - Since v0.6.0, lmdeploy add `do_sample` in GenerationConfig. It defaults to False, meaning greedy decoding. Please set `do_sample=True` if sampling decoding is needed\n",
|
191 |
+
"[Response(text=\"Hello! I'm Qwen, an AI assistant created by Alibaba Cloud. I'm here to help with a wide variety of tasks, from answering questions and providing information on various topics to assisting with writing, translating, and more. How can I assist you today?\", generate_token_len=53, input_token_len=34, session_id=0, finish_reason='stop', token_ids=[9707, 0, 358, 2776, 1207, 16948, 11, 458, 15235, 17847, 3465, 553, 54364, 14817, 13, 358, 2776, 1588, 311, 1492, 448, 264, 6884, 8045, 315, 9079, 11, 504, 35764, 4755, 323, 8241, 1995, 389, 5257, 13347, 311, 45827, 448, 4378, 11, 66271, 11, 323, 803, 13, 2585, 646, 358, 7789, 498, 3351, 30], logprobs=None, index=0), Response(text='Shanghai is a major city located in the eastern part of China, at the mouth of the Yangtze River. It is the largest city in China and one of the largest cities in the world by population. Shanghai is known for its blend of traditional and modern architecture, vibrant economy, and cultural diversity. It is a global financial hub and a major center for commerce, fashion, technology, and transportation. Some notable features of Shanghai include the Shanghai Tower, the Bund, and the ancient city walls of the Huangpu District.', generate_token_len=106, input_token_len=32, session_id=1, finish_reason='stop', token_ids=[2016, 30070, 374, 264, 3598, 3283, 7407, 304, 279, 23149, 949, 315, 5616, 11, 518, 279, 10780, 315, 279, 24474, 83, 2986, 10948, 13, 1084, 374, 279, 7772, 3283, 304, 5616, 323, 825, 315, 279, 7772, 9720, 304, 279, 1879, 553, 7042, 13, 37047, 374, 3881, 369, 1181, 20334, 315, 8606, 323, 6481, 17646, 11, 32976, 8584, 11, 323, 12752, 19492, 13, 1084, 374, 264, 3644, 5896, 18719, 323, 264, 3598, 4126, 369, 35654, 11, 11153, 11, 5440, 11, 323, 17903, 13, 4329, 27190, 4419, 315, 37047, 2924, 279, 37047, 21938, 11, 279, 29608, 11, 323, 279, 13833, 3283, 14285, 315, 279, 58409, 5584, 10942, 13], logprobs=None, index=1)]\n"
|
192 |
+
]
|
193 |
+
}
|
194 |
+
],
|
195 |
+
"source": [
|
196 |
+
"# %pip install nest_asyncio\n",
|
197 |
+
"import nest_asyncio\n",
|
198 |
+
"nest_asyncio.apply()\n",
|
199 |
+
"\n",
|
200 |
+
"\n",
|
201 |
+
"from lmdeploy import pipeline, TurbomindEngineConfig\n",
|
202 |
+
"\n",
|
203 |
+
"backend_config = TurbomindEngineConfig(tp=4, cache_max_entry_count=0.2)\n",
|
204 |
+
"\n",
|
205 |
+
"\n",
|
206 |
+
"# pipe = pipeline('internlm/internlm2_5-7b-chat')\n",
|
207 |
+
"# models--Qwen--Qwen2.5-7B-Instruct\n",
|
208 |
+
"if __name__ == \"__main__\":\n",
|
209 |
+
" pipe = pipeline(\"Qwen/Qwen2.5-7B-Instruct\", backend_config=backend_config)\n",
|
210 |
+
" response = pipe([\"Hi, pls intro yourself\", \"Shanghai is\"])\n",
|
211 |
+
" print(response)"
|
212 |
+
]
|
213 |
+
},
|
214 |
+
{
|
215 |
+
"cell_type": "code",
|
216 |
+
"execution_count": null,
|
217 |
+
"id": "2abce346",
|
218 |
+
"metadata": {},
|
219 |
+
"outputs": [],
|
220 |
+
"source": [
|
221 |
+
"response = pipe([\"Hi, pls intro yourself\", \"Shanghai is\"])"
|
222 |
+
]
|
223 |
+
},
|
224 |
+
{
|
225 |
+
"cell_type": "markdown",
|
226 |
+
"id": "82167998",
|
227 |
+
"metadata": {},
|
228 |
+
"source": [
|
229 |
+
"In this example, the pipeline by default allocates a predetermined percentage of GPU memory for storing k/v cache. The ratio is dictated by the parameter `TurbomindEngineConfig.cache_max_entry_count`.\n",
|
230 |
+
"\n",
|
231 |
+
"There have been alterations to the strategy for setting the k/v cache ratio throughout the evolution of LMDeploy. The following are the change histories:\n",
|
232 |
+
"\n",
|
233 |
+
"1. `v0.2.0 <= lmdeploy <= v0.2.1`\n",
|
234 |
+
"\n",
|
235 |
+
" `TurbomindEngineConfig.cache_max_entry_count` defaults to 0.5, indicating 50% GPU **total memory** allocated for k/v cache. Out Of Memory (OOM) errors may occur if a 7B model is deployed on a GPU with memory less than 40G. If you encounter an OOM error, please decrease the ratio of the k/v cache occupation as follows:\n",
|
236 |
+
"\n",
|
237 |
+
" ```python\n",
|
238 |
+
" from lmdeploy import pipeline, TurbomindEngineConfig\n",
|
239 |
+
"\n",
|
240 |
+
" # decrease the ratio of the k/v cache occupation to 20%\n",
|
241 |
+
" backend_config = TurbomindEngineConfig(cache_max_entry_count=0.2)\n",
|
242 |
+
"\n",
|
243 |
+
" pipe = pipeline('internlm/internlm2_5-7b-chat',\n",
|
244 |
+
" backend_config=backend_config)\n",
|
245 |
+
" response = pipe(['Hi, pls intro yourself', 'Shanghai is'])\n",
|
246 |
+
" print(response)\n",
|
247 |
+
" ```\n",
|
248 |
+
"\n",
|
249 |
+
"2. `lmdeploy > v0.2.1`\n",
|
250 |
+
"\n",
|
251 |
+
" The allocation strategy for k/v cache is changed to reserve space from the **GPU free memory** proportionally. The ratio `TurbomindEngineConfig.cache_max_entry_count` has been adjusted to 0.8 by default. If OOM error happens, similar to the method mentioned above, please consider reducing the ratio value to decrease the memory usage of the k/v cache.\n",
|
252 |
+
"\n",
|
253 |
+
"- **An example showing how to set tensor parallel num**:"
|
254 |
+
]
|
255 |
+
},
|
256 |
+
{
|
257 |
+
"cell_type": "code",
|
258 |
+
"execution_count": null,
|
259 |
+
"id": "7f51b276",
|
260 |
+
"metadata": {},
|
261 |
+
"outputs": [],
|
262 |
+
"source": [
|
263 |
+
"from lmdeploy import pipeline, TurbomindEngineConfig\n",
|
264 |
+
"\n",
|
265 |
+
"model_path = \"Qwen/Qwen2.5-7B-Instruct\"\n",
|
266 |
+
"backend_config = TurbomindEngineConfig(tp=2, cache_max_entry_count=0.2)\n",
|
267 |
+
"pipe = pipeline(model_path, backend_config=backend_config)\n",
|
268 |
+
"response = pipe([\"Hi, pls intro yourself\", \"Shanghai is\"])\n",
|
269 |
+
"print(response)"
|
270 |
+
]
|
271 |
+
},
|
272 |
+
{
|
273 |
+
"cell_type": "markdown",
|
274 |
+
"id": "662e7b9b",
|
275 |
+
"metadata": {},
|
276 |
+
"source": [
|
277 |
+
"- **An example for setting sampling parameters:**"
|
278 |
+
]
|
279 |
+
},
|
280 |
+
{
|
281 |
+
"cell_type": "code",
|
282 |
+
"execution_count": 4,
|
283 |
+
"id": "ee7ffc98",
|
284 |
+
"metadata": {},
|
285 |
+
"outputs": [],
|
286 |
+
"source": [
|
287 |
+
"from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig\n",
|
288 |
+
"\n",
|
289 |
+
"backend_config = TurbomindEngineConfig(tp=2)\n",
|
290 |
+
"gen_config = GenerationConfig(top_p=0.8, top_k=40, temperature=0.8, max_new_tokens=1024)"
|
291 |
+
]
|
292 |
+
},
|
293 |
+
{
|
294 |
+
"cell_type": "code",
|
295 |
+
"execution_count": null,
|
296 |
+
"id": "72358f97",
|
297 |
+
"metadata": {},
|
298 |
+
"outputs": [],
|
299 |
+
"source": [
|
300 |
+
"pipe = pipeline(\"internlm/internlm2_5-7b-chat\", backend_config=backend_config)"
|
301 |
+
]
|
302 |
+
},
|
303 |
+
{
|
304 |
+
"cell_type": "code",
|
305 |
+
"execution_count": 5,
|
306 |
+
"id": "af76389b",
|
307 |
+
"metadata": {},
|
308 |
+
"outputs": [
|
309 |
+
{
|
310 |
+
"name": "stdout",
|
311 |
+
"output_type": "stream",
|
312 |
+
"text": [
|
313 |
+
"2024-12-20 08:10:41,524 - lmdeploy - \u001b[33mWARNING\u001b[0m - async_engine.py:505 - GenerationConfig: GenerationConfig(n=1, max_new_tokens=1024, do_sample=False, top_p=0.8, top_k=40, min_p=0.0, temperature=0.8, repetition_penalty=1.0, ignore_eos=False, random_seed=None, stop_words=None, bad_words=None, stop_token_ids=[151645], bad_token_ids=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None, response_format=None, logits_processors=None)\n",
|
314 |
+
"[Response(text=\"Hello! I'm Qwen, an AI assistant created by Alibaba Cloud. I'm here to help with a wide variety of tasks, from answering questions and providing information on various topics to assisting with writing, translating, and more. How can I assist you today?\", generate_token_len=53, input_token_len=34, session_id=4, finish_reason='stop', token_ids=[9707, 0, 358, 2776, 1207, 16948, 11, 458, 15235, 17847, 3465, 553, 54364, 14817, 13, 358, 2776, 1588, 311, 1492, 448, 264, 6884, 8045, 315, 9079, 11, 504, 35764, 4755, 323, 8241, 1995, 389, 5257, 13347, 311, 45827, 448, 4378, 11, 66271, 11, 323, 803, 13, 2585, 646, 358, 7789, 498, 3351, 30], logprobs=None, index=0), Response(text='Shanghai is a major city located in the eastern part of China, at the mouth of the Yangtze River. It is the largest city in China and one of the largest cities in the world by population. Shanghai is known for its blend of traditional and modern architecture, vibrant economy, and cultural diversity. It is a global financial hub and a major center for commerce, fashion, technology, and transportation. Some notable features of Shanghai include the Shanghai Tower, the Bund, and the ancient city walls of the Huangpu District.', generate_token_len=106, input_token_len=32, session_id=5, finish_reason='stop', token_ids=[2016, 30070, 374, 264, 3598, 3283, 7407, 304, 279, 23149, 949, 315, 5616, 11, 518, 279, 10780, 315, 279, 24474, 83, 2986, 10948, 13, 1084, 374, 279, 7772, 3283, 304, 5616, 323, 825, 315, 279, 7772, 9720, 304, 279, 1879, 553, 7042, 13, 37047, 374, 3881, 369, 1181, 20334, 315, 8606, 323, 6481, 17646, 11, 32976, 8584, 11, 323, 12752, 19492, 13, 1084, 374, 264, 3644, 5896, 18719, 323, 264, 3598, 4126, 369, 35654, 11, 11153, 11, 5440, 11, 323, 17903, 13, 4329, 27190, 4419, 315, 37047, 2924, 279, 37047, 21938, 11, 279, 29608, 11, 323, 279, 13833, 3283, 14285, 315, 279, 58409, 5584, 10942, 13], logprobs=None, index=1)]\n"
|
315 |
+
]
|
316 |
+
}
|
317 |
+
],
|
318 |
+
"source": [
|
319 |
+
"response = pipe([\"Hi, pls intro yourself\", \"Shanghai is\"], gen_config=gen_config)\n",
|
320 |
+
"print(response)"
|
321 |
+
]
|
322 |
+
},
|
323 |
+
{
|
324 |
+
"cell_type": "markdown",
|
325 |
+
"id": "06a02f9d",
|
326 |
+
"metadata": {},
|
327 |
+
"source": [
|
328 |
+
"- **An example for OpenAI format prompt input:**"
|
329 |
+
]
|
330 |
+
},
|
331 |
+
{
|
332 |
+
"cell_type": "code",
|
333 |
+
"execution_count": null,
|
334 |
+
"id": "b6e03be1",
|
335 |
+
"metadata": {},
|
336 |
+
"outputs": [
|
337 |
+
{
|
338 |
+
"name": "stderr",
|
339 |
+
"output_type": "stream",
|
340 |
+
"text": [
|
341 |
+
"Fetching 20 files: 25%|██████████▎ | 5/20 [00:09<00:28, 1.89s/it]\n"
|
342 |
+
]
|
343 |
+
}
|
344 |
+
],
|
345 |
+
"source": [
|
346 |
+
"from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig\n",
|
347 |
+
"\n",
|
348 |
+
"backend_config = TurbomindEngineConfig(tp=2)\n",
|
349 |
+
"gen_config = GenerationConfig(top_p=0.8, top_k=40, temperature=0.8, max_new_tokens=1024)\n",
|
350 |
+
"# pipe = pipeline(\"internlm/internlm2_5-7b-chat\", backend_config=backend_config)\n",
|
351 |
+
"prompts = [\n",
|
352 |
+
" [{\"role\": \"user\", \"content\": \"Hi, pls intro yourself\"}],\n",
|
353 |
+
" [{\"role\": \"user\", \"content\": \"Shanghai is\"}],\n",
|
354 |
+
"]\n",
|
355 |
+
"response = pipe(prompts, gen_config=gen_config)\n",
|
356 |
+
"print(response)"
|
357 |
+
]
|
358 |
+
},
|
359 |
+
{
|
360 |
+
"cell_type": "markdown",
|
361 |
+
"id": "dc8ef83f",
|
362 |
+
"metadata": {},
|
363 |
+
"source": [
|
364 |
+
"- **An example for streaming mode:**"
|
365 |
+
]
|
366 |
+
},
|
367 |
+
{
|
368 |
+
"cell_type": "code",
|
369 |
+
"execution_count": 7,
|
370 |
+
"id": "197a2719",
|
371 |
+
"metadata": {},
|
372 |
+
"outputs": [
|
373 |
+
{
|
374 |
+
"name": "stdout",
|
375 |
+
"output_type": "stream",
|
376 |
+
"text": [
|
377 |
+
"Hello!Sh Ianghai is'm Q a majorwen, city located an in AI assistant the eastern created by part of Alibaba Cloud China,. I at the'm here mouth of to help the Yang with at wideze variety River of. tasks It, is from the answering largest questions city and in providing China information and on one various of topics the to largest assisting cities with in writing the, world translating by, population and. more Shanghai. is How known can for I its assist blend you of today traditional? and modern architecture, vibrant economy, and cultural diversity. It is a global financial hub and a major center for commerce, fashion, technology, and transportation. Some notable features of Shanghai include the Shanghai Tower, the Bund, and the ancient city walls of the Huangpu District."
|
378 |
+
]
|
379 |
+
}
|
380 |
+
],
|
381 |
+
"source": [
|
382 |
+
"from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig\n",
|
383 |
+
"\n",
|
384 |
+
"backend_config = TurbomindEngineConfig(tp=2)\n",
|
385 |
+
"gen_config = GenerationConfig(top_p=0.8, top_k=40, temperature=0.8, max_new_tokens=1024)\n",
|
386 |
+
"# pipe = pipeline(\"internlm/internlm2_5-7b-chat\", backend_config=backend_config)\n",
|
387 |
+
"prompts = [\n",
|
388 |
+
" [{\"role\": \"user\", \"content\": \"Hi, pls intro yourself\"}],\n",
|
389 |
+
" [{\"role\": \"user\", \"content\": \"Shanghai is\"}],\n",
|
390 |
+
"]\n",
|
391 |
+
"for item in pipe.stream_infer(prompts, gen_config=gen_config):\n",
|
392 |
+
" # print(item.text)\n",
|
393 |
+
" # echo item.text incrementally\n",
|
394 |
+
" print(item.text, end=\"\")"
|
395 |
+
]
|
396 |
+
},
|
397 |
+
{
|
398 |
+
"cell_type": "code",
|
399 |
+
"execution_count": null,
|
400 |
+
"id": "77228bf3",
|
401 |
+
"metadata": {},
|
402 |
+
"outputs": [
|
403 |
+
{
|
404 |
+
"name": "stderr",
|
405 |
+
"output_type": "stream",
|
406 |
+
"text": [
|
407 |
+
"Fetching 20 files: 25%|██████████▎ | 5/20 [00:06<00:18, 1.26s/it]\n"
|
408 |
+
]
|
409 |
+
}
|
410 |
+
],
|
411 |
+
"source": [
|
412 |
+
"from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig\n",
|
413 |
+
"\n",
|
414 |
+
"backend_config = TurbomindEngineConfig(tp=2)\n",
|
415 |
+
"gen_config = GenerationConfig(top_p=0.8, top_k=40, temperature=0.8, max_new_tokens=1024)\n",
|
416 |
+
"pipe = pipeline(\"internlm/internlm2_5-7b-chat\", backend_config=backend_config)\n",
|
417 |
+
"prompts = [\n",
|
418 |
+
" [{\"role\": \"user\", \"content\": \"Hi, pls intro yourself\"}],\n",
|
419 |
+
" [{\"role\": \"user\", \"content\": \"Shanghai is\" * 10}],\n",
|
420 |
+
"]\n",
|
421 |
+
"for item in pipe.stream_infer(prompts, gen_config=gen_config):\n",
|
422 |
+
" print(item)"
|
423 |
+
]
|
424 |
+
},
|
425 |
+
{
|
426 |
+
"cell_type": "markdown",
|
427 |
+
"id": "fb2782c8",
|
428 |
+
"metadata": {},
|
429 |
+
"source": [
|
430 |
+
"- **An example to cauculate logits & ppl:**"
|
431 |
+
]
|
432 |
+
},
|
433 |
+
{
|
434 |
+
"cell_type": "code",
|
435 |
+
"execution_count": null,
|
436 |
+
"id": "841c67c9",
|
437 |
+
"metadata": {},
|
438 |
+
"outputs": [],
|
439 |
+
"source": [
|
440 |
+
"from transformers import AutoTokenizer\n",
|
441 |
+
"from lmdeploy import pipeline\n",
|
442 |
+
"\n",
|
443 |
+
"model_repoid_or_path = \"internlm/internlm2_5-7b-chat\"\n",
|
444 |
+
"pipe = pipeline(model_repoid_or_path)\n",
|
445 |
+
"tokenizer = AutoTokenizer.from_pretrained(model_repoid_or_path, trust_remote_code=True)\n",
|
446 |
+
"\n",
|
447 |
+
"# logits\n",
|
448 |
+
"messages = [\n",
|
449 |
+
" {\"role\": \"user\", \"content\": \"Hello, how are you?\"},\n",
|
450 |
+
"]\n",
|
451 |
+
"input_ids = tokenizer.apply_chat_template(messages)\n",
|
452 |
+
"logits = pipe.get_logits(input_ids)\n",
|
453 |
+
"\n",
|
454 |
+
"# ppl\n",
|
455 |
+
"ppl = pipe.get_ppl(input_ids)"
|
456 |
+
]
|
457 |
+
},
|
458 |
+
{
|
459 |
+
"cell_type": "markdown",
|
460 |
+
"id": "68025eb2",
|
461 |
+
"metadata": {},
|
462 |
+
"source": [
|
463 |
+
"```{note}\n",
|
464 |
+
"get_ppl returns the cross entropy loss without applying the exponential operation afterwards\n",
|
465 |
+
"```\n",
|
466 |
+
"\n",
|
467 |
+
"- **Below is an example for pytorch backend. Please install triton first.**\n",
|
468 |
+
"\n",
|
469 |
+
"```shell\n",
|
470 |
+
"pip install triton>=2.1.0\n",
|
471 |
+
"```"
|
472 |
+
]
|
473 |
+
},
|
474 |
+
{
|
475 |
+
"cell_type": "code",
|
476 |
+
"execution_count": null,
|
477 |
+
"id": "b3535249",
|
478 |
+
"metadata": {},
|
479 |
+
"outputs": [],
|
480 |
+
"source": [
|
481 |
+
"from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig\n",
|
482 |
+
"\n",
|
483 |
+
"backend_config = PytorchEngineConfig(session_len=2048)\n",
|
484 |
+
"gen_config = GenerationConfig(top_p=0.8, top_k=40, temperature=0.8, max_new_tokens=1024)\n",
|
485 |
+
"pipe = pipeline(\"internlm/internlm2_5-7b-chat\", backend_config=backend_config)\n",
|
486 |
+
"prompts = [\n",
|
487 |
+
" [{\"role\": \"user\", \"content\": \"Hi, pls intro yourself\"}],\n",
|
488 |
+
" [{\"role\": \"user\", \"content\": \"Shanghai is\"}],\n",
|
489 |
+
"]\n",
|
490 |
+
"response = pipe(prompts, gen_config=gen_config)\n",
|
491 |
+
"print(response)"
|
492 |
+
]
|
493 |
+
},
|
494 |
+
{
|
495 |
+
"cell_type": "markdown",
|
496 |
+
"id": "06883551",
|
497 |
+
"metadata": {},
|
498 |
+
"source": [
|
499 |
+
"- **An example for lora.**"
|
500 |
+
]
|
501 |
+
},
|
502 |
+
{
|
503 |
+
"cell_type": "code",
|
504 |
+
"execution_count": null,
|
505 |
+
"id": "85d5f9a2",
|
506 |
+
"metadata": {},
|
507 |
+
"outputs": [],
|
508 |
+
"source": [
|
509 |
+
"from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig\n",
|
510 |
+
"\n",
|
511 |
+
"backend_config = PytorchEngineConfig(\n",
|
512 |
+
" session_len=2048, adapters=dict(lora_name_1=\"chenchi/lora-chatglm2-6b-guodegang\")\n",
|
513 |
+
")\n",
|
514 |
+
"gen_config = GenerationConfig(top_p=0.8, top_k=40, temperature=0.8, max_new_tokens=1024)\n",
|
515 |
+
"pipe = pipeline(\"THUDM/chatglm2-6b\", backend_config=backend_config)\n",
|
516 |
+
"prompts = [[{\"role\": \"user\", \"content\": \"您猜怎么着\"}]]\n",
|
517 |
+
"response = pipe(prompts, gen_config=gen_config, adapter_name=\"lora_name_1\")\n",
|
518 |
+
"print(response)"
|
519 |
+
]
|
520 |
+
},
|
521 |
+
{
|
522 |
+
"cell_type": "markdown",
|
523 |
+
"id": "2991899f",
|
524 |
+
"metadata": {},
|
525 |
+
"source": [
|
526 |
+
"## FAQs\n",
|
527 |
+
"\n",
|
528 |
+
"- **RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase**.\n",
|
529 |
+
"\n",
|
530 |
+
" If you got this for tp>1 in pytorch backend. Please make sure the python script has following\n",
|
531 |
+
"\n",
|
532 |
+
" ```python\n",
|
533 |
+
" if __name__ == '__main__':\n",
|
534 |
+
" ```\n",
|
535 |
+
"\n",
|
536 |
+
" Generally, in the context of multi-threading or multi-processing, it might be necessary to ensure that initialization code is executed only once. In this case, `if __name__ == '__main__':` can help to ensure that these initialization codes are run only in the main program, and not repeated in each newly created process or thread.\n",
|
537 |
+
"\n",
|
538 |
+
"- To customize a chat template, please refer to [chat_template.md](../advance/chat_template.md).\n",
|
539 |
+
"\n",
|
540 |
+
"- If the weight of lora has a corresponding chat template, you can first register the chat template to lmdeploy, and then use the chat template name as the adapter name."
|
541 |
+
]
|
542 |
+
}
|
543 |
+
],
|
544 |
+
"metadata": {
|
545 |
+
"jupytext": {
|
546 |
+
"cell_metadata_filter": "-all",
|
547 |
+
"main_language": "python",
|
548 |
+
"notebook_metadata_filter": "-all"
|
549 |
+
},
|
550 |
+
"kernelspec": {
|
551 |
+
"display_name": "lmdeploy",
|
552 |
+
"language": "python",
|
553 |
+
"name": "python3"
|
554 |
+
},
|
555 |
+
"language_info": {
|
556 |
+
"codemirror_mode": {
|
557 |
+
"name": "ipython",
|
558 |
+
"version": 3
|
559 |
+
},
|
560 |
+
"file_extension": ".py",
|
561 |
+
"mimetype": "text/x-python",
|
562 |
+
"name": "python",
|
563 |
+
"nbconvert_exporter": "python",
|
564 |
+
"pygments_lexer": "ipython3",
|
565 |
+
"version": "3.8.19"
|
566 |
+
}
|
567 |
+
},
|
568 |
+
"nbformat": 4,
|
569 |
+
"nbformat_minor": 5
|
570 |
+
}
|
a_mllm_notebooks/lmdeploy/pipeline.md
ADDED
@@ -0,0 +1,205 @@
1 |
+
# Offline Inference Pipeline
|
2 |
+
|
3 |
+
In this tutorial, we will present a list of examples to introduce the usage of `lmdeploy.pipeline`.
|
4 |
+
|
5 |
+
You can find a detailed overview of the pipeline API in [this](https://lmdeploy.readthedocs.io/en/latest/api/pipeline.html) guide.
|
6 |
+
|
7 |
+
## Usage
|
8 |
+
|
9 |
+
- **An example using default parameters:**
|
10 |
+
|
11 |
+
```python
|
12 |
+
from lmdeploy import pipeline
|
13 |
+
|
14 |
+
pipe = pipeline('internlm/internlm2_5-7b-chat')
|
15 |
+
response = pipe(['Hi, pls intro yourself', 'Shanghai is'])
|
16 |
+
print(response)
|
17 |
+
```
|
18 |
+
|
19 |
+
In this example, the pipeline by default allocates a predetermined percentage of GPU memory for storing k/v cache. The ratio is dictated by the parameter `TurbomindEngineConfig.cache_max_entry_count`.
|
20 |
+
|
21 |
+
There have been alterations to the strategy for setting the k/v cache ratio throughout the evolution of LMDeploy. The change history is as follows:
|
22 |
+
|
23 |
+
1. `v0.2.0 <= lmdeploy <= v0.2.1`
|
24 |
+
|
25 |
+
`TurbomindEngineConfig.cache_max_entry_count` defaults to 0.5, indicating 50% GPU **total memory** allocated for k/v cache. Out Of Memory (OOM) errors may occur if a 7B model is deployed on a GPU with memory less than 40G. If you encounter an OOM error, please decrease the ratio of the k/v cache occupation as follows:
|
26 |
+
|
27 |
+
```python
|
28 |
+
from lmdeploy import pipeline, TurbomindEngineConfig
|
29 |
+
|
30 |
+
# decrease the ratio of the k/v cache occupation to 20%
|
31 |
+
backend_config = TurbomindEngineConfig(cache_max_entry_count=0.2)
|
32 |
+
|
33 |
+
pipe = pipeline('internlm/internlm2_5-7b-chat',
|
34 |
+
backend_config=backend_config)
|
35 |
+
response = pipe(['Hi, pls intro yourself', 'Shanghai is'])
|
36 |
+
print(response)
|
37 |
+
```
|
38 |
+
|
39 |
+
2. `lmdeploy > v0.2.1`
|
40 |
+
|
41 |
+
The allocation strategy for k/v cache is changed to reserve space from the **GPU free memory** proportionally. The ratio `TurbomindEngineConfig.cache_max_entry_count` has been adjusted to 0.8 by default. If OOM error happens, similar to the method mentioned above, please consider reducing the ratio value to decrease the memory usage of the k/v cache.
|
42 |
+
|
43 |
+
- **An example showing how to set tensor parallel num**:
|
44 |
+
|
45 |
+
```python
|
46 |
+
from lmdeploy import pipeline, TurbomindEngineConfig
|
47 |
+
|
48 |
+
backend_config = TurbomindEngineConfig(tp=2)
|
49 |
+
pipe = pipeline('internlm/internlm2_5-7b-chat',
|
50 |
+
backend_config=backend_config)
|
51 |
+
response = pipe(['Hi, pls intro yourself', 'Shanghai is'])
|
52 |
+
print(response)
|
53 |
+
```
|
54 |
+
|
55 |
+
- **An example for setting sampling parameters:**
|
56 |
+
|
57 |
+
```python
|
58 |
+
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig
|
59 |
+
|
60 |
+
backend_config = TurbomindEngineConfig(tp=2)
|
61 |
+
gen_config = GenerationConfig(top_p=0.8,
|
62 |
+
top_k=40,
|
63 |
+
temperature=0.8,
|
64 |
+
max_new_tokens=1024)
|
65 |
+
pipe = pipeline('internlm/internlm2_5-7b-chat',
|
66 |
+
backend_config=backend_config)
|
67 |
+
response = pipe(['Hi, pls intro yourself', 'Shanghai is'],
|
68 |
+
gen_config=gen_config)
|
69 |
+
print(response)
|
70 |
+
```
|
71 |
+
|
72 |
+
- **An example for OpenAI format prompt input:**
|
73 |
+
|
74 |
+
```python
|
75 |
+
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig
|
76 |
+
|
77 |
+
backend_config = TurbomindEngineConfig(tp=2)
|
78 |
+
gen_config = GenerationConfig(top_p=0.8,
|
79 |
+
top_k=40,
|
80 |
+
temperature=0.8,
|
81 |
+
max_new_tokens=1024)
|
82 |
+
pipe = pipeline('internlm/internlm2_5-7b-chat',
|
83 |
+
backend_config=backend_config)
|
84 |
+
prompts = [[{
|
85 |
+
'role': 'user',
|
86 |
+
'content': 'Hi, pls intro yourself'
|
87 |
+
}], [{
|
88 |
+
'role': 'user',
|
89 |
+
'content': 'Shanghai is'
|
90 |
+
}]]
|
91 |
+
response = pipe(prompts,
|
92 |
+
gen_config=gen_config)
|
93 |
+
print(response)
|
94 |
+
```
|
95 |
+
|
96 |
+
- **An example for streaming mode:**
|
97 |
+
|
98 |
+
```python
|
99 |
+
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig
|
100 |
+
|
101 |
+
backend_config = TurbomindEngineConfig(tp=2)
|
102 |
+
gen_config = GenerationConfig(top_p=0.8,
|
103 |
+
top_k=40,
|
104 |
+
temperature=0.8,
|
105 |
+
max_new_tokens=1024)
|
106 |
+
pipe = pipeline('internlm/internlm2_5-7b-chat',
|
107 |
+
backend_config=backend_config)
|
108 |
+
prompts = [[{
|
109 |
+
'role': 'user',
|
110 |
+
'content': 'Hi, pls intro yourself'
|
111 |
+
}], [{
|
112 |
+
'role': 'user',
|
113 |
+
'content': 'Shanghai is'
|
114 |
+
}]]
|
115 |
+
for item in pipe.stream_infer(prompts, gen_config=gen_config):
|
116 |
+
print(item)
|
117 |
+
```
|
118 |
+
|
119 |
+
- **An example to calculate logits & ppl:**
|
120 |
+
|
121 |
+
```python
|
122 |
+
from transformers import AutoTokenizer
|
123 |
+
from lmdeploy import pipeline
|
124 |
+
model_repoid_or_path='internlm/internlm2_5-7b-chat'
|
125 |
+
pipe = pipeline(model_repoid_or_path)
|
126 |
+
tokenizer = AutoTokenizer.from_pretrained(model_repoid_or_path, trust_remote_code=True)
|
127 |
+
|
128 |
+
# logits
|
129 |
+
messages = [
|
130 |
+
{"role": "user", "content": "Hello, how are you?"},
|
131 |
+
]
|
132 |
+
input_ids = tokenizer.apply_chat_template(messages)
|
133 |
+
logits = pipe.get_logits(input_ids)
|
134 |
+
|
135 |
+
# ppl
|
136 |
+
ppl = pipe.get_ppl(input_ids)
|
137 |
+
```
|
138 |
+
|
139 |
+
```{note}
|
140 |
+
get_ppl returns the cross entropy loss without applying the exponential operation afterwards
|
141 |
+
```
|
142 |
+
|
143 |
+
- **Below is an example for pytorch backend. Please install triton first.**
|
144 |
+
|
145 |
+
```shell
|
146 |
+
pip install triton>=2.1.0
|
147 |
+
```
|
148 |
+
|
149 |
+
```python
|
150 |
+
from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig
|
151 |
+
|
152 |
+
backend_config = PytorchEngineConfig(session_len=2048)
|
153 |
+
gen_config = GenerationConfig(top_p=0.8,
|
154 |
+
top_k=40,
|
155 |
+
temperature=0.8,
|
156 |
+
max_new_tokens=1024)
|
157 |
+
pipe = pipeline('internlm/internlm2_5-7b-chat',
|
158 |
+
backend_config=backend_config)
|
159 |
+
prompts = [[{
|
160 |
+
'role': 'user',
|
161 |
+
'content': 'Hi, pls intro yourself'
|
162 |
+
}], [{
|
163 |
+
'role': 'user',
|
164 |
+
'content': 'Shanghai is'
|
165 |
+
}]]
|
166 |
+
response = pipe(prompts, gen_config=gen_config)
|
167 |
+
print(response)
|
168 |
+
```
|
169 |
+
|
170 |
+
- **An example for lora.**
|
171 |
+
|
172 |
+
```python
|
173 |
+
from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig
|
174 |
+
|
175 |
+
backend_config = PytorchEngineConfig(session_len=2048,
|
176 |
+
adapters=dict(lora_name_1='chenchi/lora-chatglm2-6b-guodegang'))
|
177 |
+
gen_config = GenerationConfig(top_p=0.8,
|
178 |
+
top_k=40,
|
179 |
+
temperature=0.8,
|
180 |
+
max_new_tokens=1024)
|
181 |
+
pipe = pipeline('THUDM/chatglm2-6b',
|
182 |
+
backend_config=backend_config)
|
183 |
+
prompts = [[{
|
184 |
+
'role': 'user',
|
185 |
+
'content': '您猜怎么着'
|
186 |
+
}]]
|
187 |
+
response = pipe(prompts, gen_config=gen_config, adapter_name='lora_name_1')
|
188 |
+
print(response)
|
189 |
+
```
|
190 |
+
|
191 |
+
## FAQs
|
192 |
+
|
193 |
+
- **RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase**.
|
194 |
+
|
195 |
+
  If you encounter this error with tp>1 in the pytorch backend, please make sure the python script contains the following guard:
|
196 |
+
|
197 |
+
```python
|
198 |
+
if __name__ == '__main__':
|
199 |
+
```
|
200 |
+
|
201 |
+
Generally, in the context of multi-threading or multi-processing, it might be necessary to ensure that initialization code is executed only once. In this case, `if __name__ == '__main__':` can help to ensure that these initialization codes are run only in the main program, and not repeated in each newly created process or thread.
|
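
  A minimal sketch of a guarded script, combining the pipeline example above with the guard (the guard is the only addition):

  ```python
  from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig

  def main():
      # tp=2 spawns multiple worker processes, which is why the guard matters
      backend_config = PytorchEngineConfig(tp=2, session_len=2048)
      gen_config = GenerationConfig(top_p=0.8, top_k=40, temperature=0.8, max_new_tokens=1024)
      pipe = pipeline('internlm/internlm2_5-7b-chat', backend_config=backend_config)
      print(pipe(['Hi, pls intro yourself', 'Shanghai is'], gen_config=gen_config))

  if __name__ == '__main__':
      main()
  ```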
202 |
+
|
203 |
+
- To customize a chat template, please refer to [chat_template.md](../advance/chat_template.md).
|
204 |
+
|
205 |
+
- If the LoRA weights have a corresponding chat template, you can first register the chat template with lmdeploy, and then use the chat template name as the adapter name.
|
a_mllm_notebooks/lmdeploy/proxy_server.ipynb
ADDED
@@ -0,0 +1,248 @@
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"id": "219e5106",
|
6 |
+
"metadata": {},
|
7 |
+
"source": [
|
8 |
+
"# Request Distributor Server\n",
|
9 |
+
"\n",
|
10 |
+
"The request distributor service can parallelize multiple api_server services. Users only need to access the proxy URL, and they can indirectly access different api_server services. The proxy service will automatically distribute requests internally, achieving load balancing.\n",
|
11 |
+
"\n",
|
12 |
+
"## Startup\n",
|
13 |
+
"\n",
|
14 |
+
"Start the proxy service:\n",
|
15 |
+
"\n",
|
16 |
+
"```shell\n",
|
17 |
+
"lmdeploy serve proxy --server-name {server_name} --server-port {server_port} --strategy \"min_expected_latency\"\n",
|
18 |
+
"```\n",
|
19 |
+
"\n",
|
20 |
+
"After startup is successful, the URL of the proxy service will also be printed by the script. Access this URL in your browser to open the Swagger UI.\n",
|
21 |
+
"Subsequently, users can add it directly to the proxy service when starting the `api_server` service by using the `--proxy-url` command. For example:\n",
|
22 |
+
"`lmdeploy serve api_server InternLM/internlm2-chat-1_8b --proxy-url http://0.0.0.0:8000`。\n",
|
23 |
+
"In this way, users can access the services of the `api_server` through the proxy node, and the usage of the proxy node is exactly the same as that of the `api_server`, both of which are compatible with the OpenAI format.\n",
|
24 |
+
"\n",
|
25 |
+
"- /v1/models\n",
|
26 |
+
"- /v1/chat/completions\n",
|
27 |
+
"- /v1/completions\n",
|
28 |
+
"\n",
|
29 |
+
"## Node Management\n",
|
30 |
+
"\n",
|
31 |
+
"Through Swagger UI, we can see multiple APIs. Those related to api_server node management include:\n",
|
32 |
+
"\n",
|
33 |
+
"- /nodes/status\n",
|
34 |
+
"- /nodes/add\n",
|
35 |
+
"- /nodes/remove\n",
|
36 |
+
"\n",
|
37 |
+
"They respectively represent viewing all api_server service nodes, adding a certain node, and deleting a certain node.\n",
|
38 |
+
"\n",
|
39 |
+
"### Node Management through curl\n",
|
40 |
+
"\n",
|
41 |
+
"```shell\n",
|
42 |
+
"curl -X 'GET' \\\n",
|
43 |
+
" 'http://localhost:8000/nodes/status' \\\n",
|
44 |
+
" -H 'accept: application/json'\n",
|
45 |
+
"```\n",
|
46 |
+
"\n",
|
47 |
+
"```shell\n",
|
48 |
+
"curl -X 'POST' \\\n",
|
49 |
+
" 'http://localhost:8000/nodes/add' \\\n",
|
50 |
+
" -H 'accept: application/json' \\\n",
|
51 |
+
" -H 'Content-Type: application/json' \\\n",
|
52 |
+
" -d '{\n",
|
53 |
+
" \"url\": \"http://0.0.0.0:23333\"\n",
|
54 |
+
"}'\n",
|
55 |
+
"```\n",
|
56 |
+
"\n",
|
57 |
+
"```shell\n",
|
58 |
+
"curl -X 'POST' \\\n",
|
59 |
+
" 'http://localhost:8000/nodes/remove?node_url=http://0.0.0.0:23333' \\\n",
|
60 |
+
" -H 'accept: application/json' \\\n",
|
61 |
+
" -d ''\n",
|
62 |
+
"```\n",
|
63 |
+
"\n",
|
64 |
+
"### Node Management through python"
|
65 |
+
]
|
66 |
+
},
|
67 |
+
{
|
68 |
+
"cell_type": "code",
|
69 |
+
"execution_count": 3,
|
70 |
+
"id": "e4582e32",
|
71 |
+
"metadata": {},
|
72 |
+
"outputs": [
|
73 |
+
{
|
74 |
+
"data": {
|
75 |
+
"text/plain": [
|
76 |
+
"'OpenGVLab/InternVL2_5-1B'"
|
77 |
+
]
|
78 |
+
},
|
79 |
+
"execution_count": 3,
|
80 |
+
"metadata": {},
|
81 |
+
"output_type": "execute_result"
|
82 |
+
}
|
83 |
+
],
|
84 |
+
"source": [
|
85 |
+
"from openai import OpenAI\n",
|
86 |
+
"\n",
|
87 |
+
"client = OpenAI(api_key=\"YOUR_API_KEY\", base_url=\"http://0.0.0.0:23333/v1\")\n",
|
88 |
+
"model_name = client.models.list().data[0].id\n",
|
89 |
+
"\n",
|
90 |
+
"\n",
|
91 |
+
"server_port = 8080\n",
|
92 |
+
"model_name"
|
93 |
+
]
|
94 |
+
},
|
95 |
+
{
|
96 |
+
"cell_type": "code",
|
97 |
+
"execution_count": 7,
|
98 |
+
"id": "a169c92b",
|
99 |
+
"metadata": {},
|
100 |
+
"outputs": [
|
101 |
+
{
|
102 |
+
"name": "stdout",
|
103 |
+
"output_type": "stream",
|
104 |
+
"text": [
|
105 |
+
"usage: lmdeploy serve proxy [-h] [--server-name SERVER_NAME] [--server-port SERVER_PORT]\n",
|
106 |
+
" [--strategy {random,min_expected_latency,min_observed_latency}]\n",
|
107 |
+
" [--api-keys [API_KEYS [API_KEYS ...]]] [--ssl]\n",
|
108 |
+
"\n",
|
109 |
+
"Proxy server that manages distributed api_server nodes.\n",
|
110 |
+
"\n",
|
111 |
+
"optional arguments:\n",
|
112 |
+
" -h, --help show this help message and exit\n",
|
113 |
+
" --server-name SERVER_NAME\n",
|
114 |
+
" Host ip for proxy serving. Default: 0.0.0.0. Type: str\n",
|
115 |
+
" --server-port SERVER_PORT\n",
|
116 |
+
" Server port of the proxy. Default: 8000. Type: int\n",
|
117 |
+
" --strategy {random,min_expected_latency,min_observed_latency}\n",
|
118 |
+
" the strategy to dispatch requests to nodes. Default:\n",
|
119 |
+
" min_expected_latency. Type: str\n",
|
120 |
+
" --api-keys [API_KEYS [API_KEYS ...]]\n",
|
121 |
+
" Optional list of space separated API keys. Default: None. Type: str\n",
|
122 |
+
" --ssl Enable SSL. Requires OS Environment variables 'SSL_KEYFILE' and\n",
|
123 |
+
" 'SSL_CERTFILE'. Default: False\n"
|
124 |
+
]
|
125 |
+
}
|
126 |
+
],
|
127 |
+
"source": [
|
128 |
+
"!lmdeploy serve proxy --help"
|
129 |
+
]
|
130 |
+
},
|
131 |
+
{
|
132 |
+
"cell_type": "code",
|
133 |
+
"execution_count": 9,
|
134 |
+
"id": "7f7b4e5b",
|
135 |
+
"metadata": {},
|
136 |
+
"outputs": [
|
137 |
+
{
|
138 |
+
"name": "stdout",
|
139 |
+
"output_type": "stream",
|
140 |
+
"text": [
|
141 |
+
"\u001b[32mINFO\u001b[0m: Started server process [\u001b[36m72836\u001b[0m]\n",
|
142 |
+
"\u001b[32mINFO\u001b[0m: Waiting for application startup.\n",
|
143 |
+
"\u001b[32mINFO\u001b[0m: Application startup complete.\n",
|
144 |
+
"\u001b[32mINFO\u001b[0m: Uvicorn running on \u001b[1mhttp://0.0.0.0:8080\u001b[0m (Press CTRL+C to quit)\n",
|
145 |
+
"^C\n",
|
146 |
+
"\u001b[32mINFO\u001b[0m: Shutting down\n",
|
147 |
+
"\u001b[32mINFO\u001b[0m: Waiting for application shutdown.\n",
|
148 |
+
"\u001b[32mINFO\u001b[0m: Application shutdown complete.\n",
|
149 |
+
"\u001b[32mINFO\u001b[0m: Finished server process [\u001b[36m72836\u001b[0m]\n"
|
150 |
+
]
|
151 |
+
}
|
152 |
+
],
|
153 |
+
"source": [
|
154 |
+
"!lmdeploy serve proxy --server-name 0.0.0.0 --server-port {server_port} --strategy \"min_expected_latency\""
|
155 |
+
]
|
156 |
+
},
|
157 |
+
{
|
158 |
+
"cell_type": "code",
|
159 |
+
"execution_count": null,
|
160 |
+
"id": "e86af8a2",
|
161 |
+
"metadata": {},
|
162 |
+
"outputs": [],
|
163 |
+
"source": [
|
164 |
+
"# query all nodes\n",
|
165 |
+
"import requests\n",
|
166 |
+
"url = 'http://localhost:8000/nodes/status'\n",
|
167 |
+
"headers = {'accept': 'application/json'}\n",
|
168 |
+
"response = requests.get(url, headers=headers)\n",
|
169 |
+
"print(response.text)"
|
170 |
+
]
|
171 |
+
},
|
172 |
+
{
|
173 |
+
"cell_type": "code",
|
174 |
+
"execution_count": null,
|
175 |
+
"id": "dc76e33b",
|
176 |
+
"metadata": {},
|
177 |
+
"outputs": [],
|
178 |
+
"source": [
|
179 |
+
"# add a new node\n",
|
180 |
+
"import requests\n",
|
181 |
+
"url = 'http://localhost:8000/nodes/add'\n",
|
182 |
+
"headers = {\n",
|
183 |
+
" 'accept': 'application/json',\n",
|
184 |
+
" 'Content-Type': 'application/json'\n",
|
185 |
+
"}\n",
|
186 |
+
"data = {\"url\": \"http://0.0.0.0:23333\"}\n",
|
187 |
+
"response = requests.post(url, headers=headers, json=data)\n",
|
188 |
+
"print(response.text)"
|
189 |
+
]
|
190 |
+
},
|
191 |
+
{
|
192 |
+
"cell_type": "code",
|
193 |
+
"execution_count": null,
|
194 |
+
"id": "1c675bc1",
|
195 |
+
"metadata": {},
|
196 |
+
"outputs": [],
|
197 |
+
"source": [
|
198 |
+
"# delete a node\n",
|
199 |
+
"import requests\n",
|
200 |
+
"url = 'http://localhost:8000/nodes/remove'\n",
|
201 |
+
"headers = {'accept': 'application/json',}\n",
|
202 |
+
"params = {'node_url': 'http://0.0.0.0:23333',}\n",
|
203 |
+
"response = requests.post(url, headers=headers, data='', params=params)\n",
|
204 |
+
"print(response.text)"
|
205 |
+
]
|
206 |
+
},
|
207 |
+
{
|
208 |
+
"cell_type": "markdown",
|
209 |
+
"id": "2a6bf84e",
|
210 |
+
"metadata": {},
|
211 |
+
"source": [
|
212 |
+
"## Dispatch Strategy\n",
|
213 |
+
"\n",
|
214 |
+
"The current distribution strategies of the proxy service are as follows:\n",
|
215 |
+
"\n",
|
216 |
+
"- random: dispatches based on the ability of each api_server node provided by the user to process requests. The greater the request throughput, the more likely it is to be allocated. Nodes that do not provide throughput are treated according to the average throughput of other nodes.\n",
|
217 |
+
"- min_expected_latency: allocates based on the number of requests currently waiting to be processed on each node, and the throughput capability of each node, calculating the expected time required to complete the response. The shortest one gets allocated. Nodes that do not provide throughput are treated similarly.\n",
|
218 |
+
"- min_observed_latency: allocates based on the average time required to handle a certain number of past requests on each node. The one with the shortest time gets allocated."
|
219 |
+
]
|
220 |
+
}
|
221 |
+
],
|
222 |
+
"metadata": {
|
223 |
+
"jupytext": {
|
224 |
+
"cell_metadata_filter": "-all",
|
225 |
+
"main_language": "python",
|
226 |
+
"notebook_metadata_filter": "-all"
|
227 |
+
},
|
228 |
+
"kernelspec": {
|
229 |
+
"display_name": "lmdeploy",
|
230 |
+
"language": "python",
|
231 |
+
"name": "python3"
|
232 |
+
},
|
233 |
+
"language_info": {
|
234 |
+
"codemirror_mode": {
|
235 |
+
"name": "ipython",
|
236 |
+
"version": 3
|
237 |
+
},
|
238 |
+
"file_extension": ".py",
|
239 |
+
"mimetype": "text/x-python",
|
240 |
+
"name": "python",
|
241 |
+
"nbconvert_exporter": "python",
|
242 |
+
"pygments_lexer": "ipython3",
|
243 |
+
"version": "3.8.19"
|
244 |
+
}
|
245 |
+
},
|
246 |
+
"nbformat": 4,
|
247 |
+
"nbformat_minor": 5
|
248 |
+
}
|
a_mllm_notebooks/lmdeploy/proxy_server.md
ADDED
@@ -0,0 +1,97 @@
1 |
+
# Request Distributor Server
|
2 |
+
|
3 |
+
The request distributor service can parallelize multiple api_server services. Users only need to access the proxy URL, and they can indirectly access different api_server services. The proxy service will automatically distribute requests internally, achieving load balancing.
|
4 |
+
|
5 |
+
## Startup
|
6 |
+
|
7 |
+
Start the proxy service:
|
8 |
+
|
9 |
+
```shell
|
10 |
+
lmdeploy serve proxy --server-name {server_name} --server-port {server_port} --strategy "min_expected_latency"
|
11 |
+
```
|
12 |
+
|
13 |
+
After startup is successful, the URL of the proxy service will also be printed by the script. Access this URL in your browser to open the Swagger UI.
|
14 |
+
Subsequently, users can add it directly to the proxy service when starting the `api_server` service by using the `--proxy-url` command. For example:
|
15 |
+
`lmdeploy serve api_server InternLM/internlm2-chat-1_8b --proxy-url http://0.0.0.0:8000`.
|
16 |
+
In this way, users can access the services of the `api_server` through the proxy node, and the usage of the proxy node is exactly the same as that of the `api_server`, both of which are compatible with the OpenAI format.
|
17 |
+
|
18 |
+
- /v1/models
|
19 |
+
- /v1/chat/completions
|
20 |
+
- /v1/completions
|
21 |
+
|
22 |
+
## Node Management
|
23 |
+
|
24 |
+
Through Swagger UI, we can see multiple APIs. Those related to api_server node management include:
|
25 |
+
|
26 |
+
- /nodes/status
|
27 |
+
- /nodes/add
|
28 |
+
- /nodes/remove
|
29 |
+
|
30 |
+
They respectively represent viewing all api_server service nodes, adding a certain node, and deleting a certain node.
|
31 |
+
|
32 |
+
### Node Management through curl
|
33 |
+
|
34 |
+
```shell
|
35 |
+
curl -X 'GET' \
|
36 |
+
'http://localhost:8000/nodes/status' \
|
37 |
+
-H 'accept: application/json'
|
38 |
+
```
|
39 |
+
|
40 |
+
```shell
|
41 |
+
curl -X 'POST' \
|
42 |
+
'http://localhost:8000/nodes/add' \
|
43 |
+
-H 'accept: application/json' \
|
44 |
+
-H 'Content-Type: application/json' \
|
45 |
+
-d '{
|
46 |
+
"url": "http://0.0.0.0:23333"
|
47 |
+
}'
|
48 |
+
```
|
49 |
+
|
50 |
+
```shell
|
51 |
+
curl -X 'POST' \
|
52 |
+
'http://localhost:8000/nodes/remove?node_url=http://0.0.0.0:23333' \
|
53 |
+
-H 'accept: application/json' \
|
54 |
+
-d ''
|
55 |
+
```
|
56 |
+
|
57 |
+
### Node Management through python
|
58 |
+
|
59 |
+
```python
|
60 |
+
# query all nodes
|
61 |
+
import requests
|
62 |
+
url = 'http://localhost:8000/nodes/status'
|
63 |
+
headers = {'accept': 'application/json'}
|
64 |
+
response = requests.get(url, headers=headers)
|
65 |
+
print(response.text)
|
66 |
+
```
|
67 |
+
|
68 |
+
```python
|
69 |
+
# add a new node
|
70 |
+
import requests
|
71 |
+
url = 'http://localhost:8000/nodes/add'
|
72 |
+
headers = {
|
73 |
+
'accept': 'application/json',
|
74 |
+
'Content-Type': 'application/json'
|
75 |
+
}
|
76 |
+
data = {"url": "http://0.0.0.0:23333"}
|
77 |
+
response = requests.post(url, headers=headers, json=data)
|
78 |
+
print(response.text)
|
79 |
+
```
|
80 |
+
|
81 |
+
```python
|
82 |
+
# delete a node
|
83 |
+
import requests
|
84 |
+
url = 'http://localhost:8000/nodes/remove'
|
85 |
+
headers = {'accept': 'application/json',}
|
86 |
+
params = {'node_url': 'http://0.0.0.0:23333',}
|
87 |
+
response = requests.post(url, headers=headers, data='', params=params)
|
88 |
+
print(response.text)
|
89 |
+
```
|
90 |
+
|
91 |
+
## Dispatch Strategy
|
92 |
+
|
93 |
+
The current distribution strategies of the proxy service are as follows:
|
94 |
+
|
95 |
+
- random: dispatches based on the ability of each api_server node provided by the user to process requests. The greater the request throughput, the more likely it is to be allocated. Nodes that do not provide throughput are treated according to the average throughput of other nodes.
|
96 |
+
- min_expected_latency: allocates based on the number of requests currently waiting to be processed on each node and the throughput capability of each node, calculating the expected time required to complete the response. The node with the shortest expected time gets allocated (see the sketch after this list). Nodes that do not provide throughput are treated similarly.
|
97 |
+
- min_observed_latency: allocates based on the average time required to handle a certain number of past requests on each node. The one with the shortest time gets allocated.
|
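
As a toy illustration of the `min_expected_latency` idea (a sketch of the concept only, not the proxy's actual implementation), the node with the smallest expected completion time wins:

```python
# Toy sketch: pick the node whose queue is expected to drain soonest.
# 'queued' counts waiting requests; 'throughput' is requests per second.
# The node URLs and numbers below are made up for illustration.
nodes = {
    'http://0.0.0.0:23333': {'queued': 3, 'throughput': 2.0},
    'http://0.0.0.0:23334': {'queued': 1, 'throughput': 0.5},
}

def expected_latency(stats):
    # expected time for a new request appended to this node's queue
    return (stats['queued'] + 1) / stats['throughput']

best = min(nodes, key=lambda url: expected_latency(nodes[url]))
print(best)  # -> http://0.0.0.0:23333 (2.0 s vs 4.0 s)
```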
a_mllm_notebooks/lmdeploy/pytorch_new_model.ipynb
ADDED
@@ -0,0 +1,261 @@
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"id": "8e7fb7ca",
|
6 |
+
"metadata": {},
|
7 |
+
"source": [
|
8 |
+
"# lmdeploy.pytorch New Model Support\n",
|
9 |
+
"\n",
|
10 |
+
"lmdeploy.pytorch is designed to simplify the support for new models and the development of prototypes. Users can adapt new models according to their own needs.\n",
|
11 |
+
"\n",
|
12 |
+
"## Model Support\n",
|
13 |
+
"\n",
|
14 |
+
"### Configuration Loading (Optional)\n",
|
15 |
+
"\n",
|
16 |
+
"lmdeploy.pytorch initializes the engine based on the model's config file. If the parameter naming of the model to be integrated differs from common models in transformers, parsing errors may occur. A custom ConfigBuilder can be added to parse the configuration."
|
17 |
+
]
|
18 |
+
},
|
19 |
+
{
|
20 |
+
"cell_type": "code",
|
21 |
+
"execution_count": null,
|
22 |
+
"id": "8e2aaf0c",
|
23 |
+
"metadata": {},
|
24 |
+
"outputs": [],
|
25 |
+
"source": [
|
26 |
+
"# lmdeploy/pytorch/configurations/gemma.py\n",
|
27 |
+
"\n",
|
28 |
+
"from lmdeploy.pytorch.config import ModelConfig\n",
|
29 |
+
"\n",
|
30 |
+
"from .builder import AutoModelConfigBuilder\n",
|
31 |
+
"\n",
|
32 |
+
"\n",
|
33 |
+
"class GemmaModelConfigBuilder(AutoModelConfigBuilder):\n",
|
34 |
+
"\n",
|
35 |
+
" @classmethod\n",
|
36 |
+
" def condition(cls, hf_config):\n",
|
37 |
+
" # Check if hf_config is suitable for this builder\n",
|
38 |
+
" return hf_config.model_type in ['gemma', 'gemma2']\n",
|
39 |
+
"\n",
|
40 |
+
" @classmethod\n",
|
41 |
+
" def build(cls, hf_config, model_path: str = None):\n",
|
42 |
+
" # Use the hf_config loaded by transformers\n",
|
43 |
+
" # Construct the ModelConfig for the pytorch engine\n",
|
44 |
+
" return ModelConfig(hidden_size=hf_config.hidden_size,\n",
|
45 |
+
" num_layers=hf_config.num_hidden_layers,\n",
|
46 |
+
" num_attention_heads=hf_config.num_attention_heads,\n",
|
47 |
+
" num_key_value_heads=hf_config.num_key_value_heads,\n",
|
48 |
+
" bos_token_id=hf_config.bos_token_id,\n",
|
49 |
+
" eos_token_id=hf_config.eos_token_id,\n",
|
50 |
+
" head_dim=hf_config.head_dim,\n",
|
51 |
+
" vocab_size=hf_config.vocab_size)"
|
52 |
+
]
|
53 |
+
},
|
54 |
+
{
|
55 |
+
"cell_type": "markdown",
|
56 |
+
"id": "a5493f54",
|
57 |
+
"metadata": {},
|
58 |
+
"source": [
|
59 |
+
"The `lmdeploy.pytorch.check_env.check_model` function can be used to verify if the configuration can be parsed correctly.\n",
|
60 |
+
"\n",
|
61 |
+
"### Implementing the Model\n",
|
62 |
+
"\n",
|
63 |
+
"After ensuring that the configuration can be parsed correctly, you can start implementing the model logic. Taking the implementation of llama as an example, we need to create the model using the configuration file from transformers."
|
64 |
+
]
|
65 |
+
},
|
66 |
+
{
|
67 |
+
"cell_type": "code",
|
68 |
+
"execution_count": null,
|
69 |
+
"id": "e49b0483",
|
70 |
+
"metadata": {},
|
71 |
+
"outputs": [],
|
72 |
+
"source": [
|
73 |
+
"class LlamaForCausalLM(nn.Module):\n",
|
74 |
+
"\n",
|
75 |
+
" # Constructor, builds the model with the given config\n",
|
76 |
+
" # ctx_mgr is the context manager, which can be used to pass engine configurations or additional parameters\n",
|
77 |
+
" def __init__(self,\n",
|
78 |
+
" config: LlamaConfig,\n",
|
79 |
+
" ctx_mgr: StepContextManager,\n",
|
80 |
+
" dtype: torch.dtype = None,\n",
|
81 |
+
" device: torch.device = None):\n",
|
82 |
+
" super().__init__()\n",
|
83 |
+
" self.config = config\n",
|
84 |
+
" self.ctx_mgr = ctx_mgr\n",
|
85 |
+
" # build LLamaModel\n",
|
86 |
+
" self.model = LlamaModel(config, dtype=dtype, device=device)\n",
|
87 |
+
" # build lm_head\n",
|
88 |
+
" self.lm_head = build_rowwise_linear(config.hidden_size,\n",
|
89 |
+
" config.vocab_size,\n",
|
90 |
+
" bias=False,\n",
|
91 |
+
" dtype=dtype,\n",
|
92 |
+
" device=device)\n",
|
93 |
+
"\n",
|
94 |
+
" # Model inference function\n",
|
95 |
+
" # It is recommended to use the same parameters as below\n",
|
96 |
+
" def forward(\n",
|
97 |
+
" self,\n",
|
98 |
+
" input_ids: torch.Tensor,\n",
|
99 |
+
" position_ids: torch.Tensor,\n",
|
100 |
+
" past_key_values: List[List[torch.Tensor]],\n",
|
101 |
+
" attn_metadata: Any = None,\n",
|
102 |
+
" inputs_embeds: torch.Tensor = None,\n",
|
103 |
+
" **kwargs,\n",
|
104 |
+
" ):\n",
|
105 |
+
" hidden_states = self.model(\n",
|
106 |
+
" input_ids=input_ids,\n",
|
107 |
+
" position_ids=position_ids,\n",
|
108 |
+
" past_key_values=past_key_values,\n",
|
109 |
+
" attn_metadata=attn_metadata,\n",
|
110 |
+
" inputs_embeds=inputs_embeds,\n",
|
111 |
+
" )\n",
|
112 |
+
"\n",
|
113 |
+
" logits = self.lm_head(hidden_states)\n",
|
114 |
+
" logits = logits.float()\n",
|
115 |
+
" return logits"
|
116 |
+
]
|
117 |
+
},
|
118 |
+
{
|
119 |
+
"cell_type": "markdown",
|
120 |
+
"id": "ce1f7780",
|
121 |
+
"metadata": {},
|
122 |
+
"source": [
|
123 |
+
"In addition to these, the following content needs to be added:"
|
124 |
+
]
|
125 |
+
},
|
126 |
+
{
|
127 |
+
"cell_type": "code",
|
128 |
+
"execution_count": null,
|
129 |
+
"id": "b240132b",
|
130 |
+
"metadata": {},
|
131 |
+
"outputs": [],
|
132 |
+
"source": [
|
133 |
+
"class LlamaForCausalLM(nn.Module):\n",
|
134 |
+
"\n",
|
135 |
+
" ...\n",
|
136 |
+
"\n",
|
137 |
+
" # Indicates whether the model supports cudagraph\n",
|
138 |
+
" # Can be a callable object, receiving forward inputs\n",
|
139 |
+
" # Dynamically determines if cudagraph is supported\n",
|
140 |
+
" support_cuda_graph = True\n",
|
141 |
+
"\n",
|
142 |
+
" # Builds model inputs\n",
|
143 |
+
" # Returns a dictionary, the keys of which must be inputs to forward\n",
|
144 |
+
" def prepare_inputs_for_generation(\n",
|
145 |
+
" self,\n",
|
146 |
+
" past_key_values: List[List[torch.Tensor]],\n",
|
147 |
+
" inputs_embeds: Optional[torch.Tensor] = None,\n",
|
148 |
+
" context: StepContext = None,\n",
|
149 |
+
" ):\n",
|
150 |
+
" ...\n",
|
151 |
+
"\n",
|
152 |
+
" # Loads weights\n",
|
153 |
+
" # The model's inputs are key-value pairs of the state dict\n",
|
154 |
+
" def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):\n",
|
155 |
+
" ..."
|
156 |
+
]
|
157 |
+
},
|
158 |
+
{
|
159 |
+
"cell_type": "markdown",
|
160 |
+
"id": "ffa21e47",
|
161 |
+
"metadata": {},
|
162 |
+
"source": [
|
163 |
+
"We have encapsulated many fused operators to simplify the model construction. These operators better support various functions such as tensor parallelism and quantization. We encourage developers to use these ops as much as possible."
|
164 |
+
]
|
165 |
+
},
|
166 |
+
{
|
167 |
+
"cell_type": "code",
|
168 |
+
"execution_count": null,
|
169 |
+
"id": "70668060",
|
170 |
+
"metadata": {},
|
171 |
+
"outputs": [],
|
172 |
+
"source": [
|
173 |
+
"# Using predefined build_merged_colwise_linear, SiluAndMul, build_rowwise_linear\n",
|
174 |
+
"# Helps us build the model faster and without worrying about tensor concurrency, quantization, etc.\n",
|
175 |
+
"class LlamaMLP(nn.Module):\n",
|
176 |
+
"\n",
|
177 |
+
" def __init__(self,\n",
|
178 |
+
" config: LlamaConfig,\n",
|
179 |
+
" dtype: torch.dtype = None,\n",
|
180 |
+
" device: torch.device = None):\n",
|
181 |
+
" super().__init__()\n",
|
182 |
+
" quantization_config = getattr(config, 'quantization_config', None)\n",
|
183 |
+
" # gate up\n",
|
184 |
+
" self.gate_up_proj = build_merged_colwise_linear(\n",
|
185 |
+
" config.hidden_size,\n",
|
186 |
+
" [config.intermediate_size, config.intermediate_size],\n",
|
187 |
+
" bias=config.mlp_bias,\n",
|
188 |
+
" dtype=dtype,\n",
|
189 |
+
" device=device,\n",
|
190 |
+
" quant_config=quantization_config,\n",
|
191 |
+
" is_tp=True,\n",
|
192 |
+
" )\n",
|
193 |
+
"\n",
|
194 |
+
" # silu and mul\n",
|
195 |
+
" self.act_fn = SiluAndMul(inplace=True)\n",
|
196 |
+
"\n",
|
197 |
+
" # down\n",
|
198 |
+
" self.down_proj = build_rowwise_linear(config.intermediate_size,\n",
|
199 |
+
" config.hidden_size,\n",
|
200 |
+
" bias=config.mlp_bias,\n",
|
201 |
+
" quant_config=quantization_config,\n",
|
202 |
+
" dtype=dtype,\n",
|
203 |
+
" device=device,\n",
|
204 |
+
" is_tp=True)\n",
|
205 |
+
"\n",
|
206 |
+
" def forward(self, x):\n",
|
207 |
+
" \"\"\"forward.\"\"\"\n",
|
208 |
+
" gate_up = self.gate_up_proj(x)\n",
|
209 |
+
" act = self.act_fn(gate_up)\n",
|
210 |
+
" return self.down_proj(act)"
|
211 |
+
]
|
212 |
+
},
|
213 |
+
{
|
214 |
+
"cell_type": "markdown",
|
215 |
+
"id": "b1701d22",
|
216 |
+
"metadata": {},
|
217 |
+
"source": [
|
218 |
+
"### Model Registration\n",
|
219 |
+
"\n",
|
220 |
+
"To ensure that the developed model implementation can be used normally, we also need to register the model in `lmdeploy/pytorch/models/module_map.py`"
|
221 |
+
]
|
222 |
+
},
|
223 |
+
{
|
224 |
+
"cell_type": "code",
|
225 |
+
"execution_count": null,
|
226 |
+
"id": "966830a0",
|
227 |
+
"metadata": {},
|
228 |
+
"outputs": [],
|
229 |
+
"source": [
|
230 |
+
"MODULE_MAP.update({\n",
|
231 |
+
" 'LlamaForCausalLM':\n",
|
232 |
+
" f'{LMDEPLOY_PYTORCH_MODEL_PATH}.llama.LlamaForCausalLM',\n",
|
233 |
+
"})"
|
234 |
+
]
|
235 |
+
},
|
236 |
+
{
|
237 |
+
"cell_type": "markdown",
|
238 |
+
"id": "5eee6ab5",
|
239 |
+
"metadata": {},
|
240 |
+
"source": [
|
241 |
+
"If you do not wish to modify the model source code, you can also pass a custom module map from the outside, making it easier to integrate into other projects.\n",
|
242 |
+
"\n",
|
243 |
+
"```\n",
|
244 |
+
"from lmdeploy import PytorchEngineConfig, pipeline\n",
|
245 |
+
"\n",
|
246 |
+
"backend_config = PytorchEngineConfig(custom_module_map='/path/to/custom/module_map.py')\n",
|
247 |
+
"generator = pipeline(model_path, backend_config=backend_config)\n",
|
248 |
+
"```"
|
249 |
+
]
|
250 |
+
}
|
251 |
+
],
|
252 |
+
"metadata": {
|
253 |
+
"jupytext": {
|
254 |
+
"cell_metadata_filter": "-all",
|
255 |
+
"main_language": "python",
|
256 |
+
"notebook_metadata_filter": "-all"
|
257 |
+
}
|
258 |
+
},
|
259 |
+
"nbformat": 4,
|
260 |
+
"nbformat_minor": 5
|
261 |
+
}
|
a_mllm_notebooks/lmdeploy/pytorch_new_model.md
ADDED
@@ -0,0 +1,181 @@
1 |
+
# lmdeploy.pytorch New Model Support
|
2 |
+
|
3 |
+
lmdeploy.pytorch is designed to simplify the support for new models and the development of prototypes. Users can adapt new models according to their own needs.
|
4 |
+
|
5 |
+
## Model Support
|
6 |
+
|
7 |
+
### Configuration Loading (Optional)
|
8 |
+
|
9 |
+
lmdeploy.pytorch initializes the engine based on the model's config file. If the parameter naming of the model to be integrated differs from common models in transformers, parsing errors may occur. A custom ConfigBuilder can be added to parse the configuration.
|
10 |
+
|
11 |
+
```python
|
12 |
+
# lmdeploy/pytorch/configurations/gemma.py
|
13 |
+
|
14 |
+
from lmdeploy.pytorch.config import ModelConfig
|
15 |
+
|
16 |
+
from .builder import AutoModelConfigBuilder
|
17 |
+
|
18 |
+
|
19 |
+
class GemmaModelConfigBuilder(AutoModelConfigBuilder):
|
20 |
+
|
21 |
+
@classmethod
|
22 |
+
def condition(cls, hf_config):
|
23 |
+
# Check if hf_config is suitable for this builder
|
24 |
+
return hf_config.model_type in ['gemma', 'gemma2']
|
25 |
+
|
26 |
+
@classmethod
|
27 |
+
def build(cls, hf_config, model_path: str = None):
|
28 |
+
# Use the hf_config loaded by transformers
|
29 |
+
# Construct the ModelConfig for the pytorch engine
|
30 |
+
return ModelConfig(hidden_size=hf_config.hidden_size,
|
31 |
+
num_layers=hf_config.num_hidden_layers,
|
32 |
+
num_attention_heads=hf_config.num_attention_heads,
|
33 |
+
num_key_value_heads=hf_config.num_key_value_heads,
|
34 |
+
bos_token_id=hf_config.bos_token_id,
|
35 |
+
eos_token_id=hf_config.eos_token_id,
|
36 |
+
head_dim=hf_config.head_dim,
|
37 |
+
vocab_size=hf_config.vocab_size)
|
38 |
+
```
|
39 |
+
|
40 |
+
The `lmdeploy.pytorch.check_env.check_model` function can be used to verify if the configuration can be parsed correctly.
|
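
A possible way to invoke that check from a script is sketched below. The exact argument list of `check_model` is an assumption here and may differ between lmdeploy versions, so consult `lmdeploy/pytorch/check_env` in your installed copy.

```python
# Hypothetical usage sketch -- the argument passed to check_model is assumed,
# not taken from the lmdeploy documentation.
from lmdeploy.pytorch.check_env import check_model

# Expected to warn or raise if the pytorch engine cannot parse the model's config.
check_model('google/gemma-2b')
```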
41 |
+
|
42 |
+
### Implementing the Model
|
43 |
+
|
44 |
+
After ensuring that the configuration can be parsed correctly, you can start implementing the model logic. Taking the implementation of llama as an example, we need to create the model using the configuration file from transformers.
|
45 |
+
|
46 |
+
```python
|
47 |
+
class LlamaForCausalLM(nn.Module):
|
48 |
+
|
49 |
+
# Constructor, builds the model with the given config
|
50 |
+
# ctx_mgr is the context manager, which can be used to pass engine configurations or additional parameters
|
51 |
+
def __init__(self,
|
52 |
+
config: LlamaConfig,
|
53 |
+
ctx_mgr: StepContextManager,
|
54 |
+
dtype: torch.dtype = None,
|
55 |
+
device: torch.device = None):
|
56 |
+
super().__init__()
|
57 |
+
self.config = config
|
58 |
+
self.ctx_mgr = ctx_mgr
|
59 |
+
# build LLamaModel
|
60 |
+
self.model = LlamaModel(config, dtype=dtype, device=device)
|
61 |
+
# build lm_head
|
62 |
+
self.lm_head = build_rowwise_linear(config.hidden_size,
|
63 |
+
config.vocab_size,
|
64 |
+
bias=False,
|
65 |
+
dtype=dtype,
|
66 |
+
device=device)
|
67 |
+
|
68 |
+
# Model inference function
|
69 |
+
# It is recommended to use the same parameters as below
|
70 |
+
def forward(
|
71 |
+
self,
|
72 |
+
input_ids: torch.Tensor,
|
73 |
+
position_ids: torch.Tensor,
|
74 |
+
past_key_values: List[List[torch.Tensor]],
|
75 |
+
attn_metadata: Any = None,
|
76 |
+
inputs_embeds: torch.Tensor = None,
|
77 |
+
**kwargs,
|
78 |
+
):
|
79 |
+
hidden_states = self.model(
|
80 |
+
input_ids=input_ids,
|
81 |
+
position_ids=position_ids,
|
82 |
+
past_key_values=past_key_values,
|
83 |
+
attn_metadata=attn_metadata,
|
84 |
+
inputs_embeds=inputs_embeds,
|
85 |
+
)
|
86 |
+
|
87 |
+
logits = self.lm_head(hidden_states)
|
88 |
+
logits = logits.float()
|
89 |
+
return logits
|
90 |
+
```
|
91 |
+
|
92 |
+
In addition to these, the following content needs to be added:
|
93 |
+
|
94 |
+
```python
|
95 |
+
class LlamaForCausalLM(nn.Module):
|
96 |
+
|
97 |
+
...
|
98 |
+
|
99 |
+
# Indicates whether the model supports cudagraph
|
100 |
+
# Can be a callable object, receiving forward inputs
|
101 |
+
# Dynamically determines if cudagraph is supported
|
102 |
+
support_cuda_graph = True
|
103 |
+
|
104 |
+
# Builds model inputs
|
105 |
+
# Returns a dictionary, the keys of which must be inputs to forward
|
106 |
+
def prepare_inputs_for_generation(
|
107 |
+
self,
|
108 |
+
past_key_values: List[List[torch.Tensor]],
|
109 |
+
inputs_embeds: Optional[torch.Tensor] = None,
|
110 |
+
context: StepContext = None,
|
111 |
+
):
|
112 |
+
...
|
113 |
+
|
114 |
+
# Loads weights
|
115 |
+
# The model's inputs are key-value pairs of the state dict
|
116 |
+
def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):
|
117 |
+
...
|
118 |
+
```
|
119 |
+
|
120 |
+
We have encapsulated many fused operators to simplify the model construction. These operators better support various functions such as tensor parallelism and quantization. We encourage developers to use these ops as much as possible.
|
121 |
+
|
122 |
+
```python
|
123 |
+
# Using predefined build_merged_colwise_linear, SiluAndMul, build_rowwise_linear
|
124 |
+
# Helps us build the model faster and without worrying about tensor parallelism, quantization, etc.
|
125 |
+
class LlamaMLP(nn.Module):
|
126 |
+
|
127 |
+
def __init__(self,
|
128 |
+
config: LlamaConfig,
|
129 |
+
dtype: torch.dtype = None,
|
130 |
+
device: torch.device = None):
|
131 |
+
super().__init__()
|
132 |
+
quantization_config = getattr(config, 'quantization_config', None)
|
133 |
+
# gate up
|
134 |
+
self.gate_up_proj = build_merged_colwise_linear(
|
135 |
+
config.hidden_size,
|
136 |
+
[config.intermediate_size, config.intermediate_size],
|
137 |
+
bias=config.mlp_bias,
|
138 |
+
dtype=dtype,
|
139 |
+
device=device,
|
140 |
+
quant_config=quantization_config,
|
141 |
+
is_tp=True,
|
142 |
+
)
|
143 |
+
|
144 |
+
# silu and mul
|
145 |
+
self.act_fn = SiluAndMul(inplace=True)
|
146 |
+
|
147 |
+
# down
|
148 |
+
self.down_proj = build_rowwise_linear(config.intermediate_size,
|
149 |
+
config.hidden_size,
|
150 |
+
bias=config.mlp_bias,
|
151 |
+
quant_config=quantization_config,
|
152 |
+
dtype=dtype,
|
153 |
+
device=device,
|
154 |
+
is_tp=True)
|
155 |
+
|
156 |
+
def forward(self, x):
|
157 |
+
"""forward."""
|
158 |
+
gate_up = self.gate_up_proj(x)
|
159 |
+
act = self.act_fn(gate_up)
|
160 |
+
return self.down_proj(act)
|
161 |
+
```
|
162 |
+
|
163 |
+
### Model Registration
|
164 |
+
|
165 |
+
To ensure that the developed model implementation can be used normally, we also need to register the model in `lmdeploy/pytorch/models/module_map.py`
|
166 |
+
|
167 |
+
```python
|
168 |
+
MODULE_MAP.update({
|
169 |
+
'LlamaForCausalLM':
|
170 |
+
f'{LMDEPLOY_PYTORCH_MODEL_PATH}.llama.LlamaForCausalLM',
|
171 |
+
})
|
172 |
+
```
|
173 |
+
|
174 |
+
If you do not wish to modify the model source code, you can also pass a custom module map from the outside, making it easier to integrate into other projects.
|
175 |
+
|
176 |
+
```python
|
177 |
+
from lmdeploy import PytorchEngineConfig, pipeline
|
178 |
+
|
179 |
+
backend_config = PytorchEngineConfig(custom_module_map='/path/to/custom/module_map.py')
|
180 |
+
generator = pipeline(model_path, backend_config=backend_config)
|
181 |
+
```
|
a_mllm_notebooks/lmdeploy/tiger.jpeg
ADDED
a_mllm_notebooks/lmdeploy/turbomind.ipynb
ADDED
@@ -0,0 +1,88 @@
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"id": "0b726f14",
|
6 |
+
"metadata": {},
|
7 |
+
"source": [
|
8 |
+
"# Architecture of TurboMind\n",
|
9 |
+
"\n",
|
10 |
+
"TurboMind is an inference engine that supports high throughput inference for conversational LLMs. It's based on NVIDIA's [FasterTransformer](https://github.com/NVIDIA/FasterTransformer). Major features of TurboMind include an efficient LLaMa implementation, the persistent batch inference model and an extendable KV cache manager.\n",
|
11 |
+
"\n",
|
12 |
+
"## High level overview of TurboMind\n",
|
13 |
+
"\n",
|
14 |
+
"```\n",
|
15 |
+
" +--------------------+\n",
|
16 |
+
" | API |\n",
|
17 |
+
" +--------------------+\n",
|
18 |
+
" | ^\n",
|
19 |
+
" request | | stream callback\n",
|
20 |
+
" v |\n",
|
21 |
+
" +--------------------+ fetch +-------------------+\n",
|
22 |
+
" | Persistent Batch | <-------> | KV Cache Manager |\n",
|
23 |
+
" +--------------------+ update +-------------------+\n",
|
24 |
+
" ^\n",
|
25 |
+
" |\n",
|
26 |
+
" v\n",
|
27 |
+
"+------------------------+\n",
|
28 |
+
"| LLaMA implementation |\n",
|
29 |
+
"+------------------------+\n",
|
30 |
+
"| FT kernels & utilities |\n",
|
31 |
+
"+------------------------+\n",
|
32 |
+
"```\n",
|
33 |
+
"\n",
|
34 |
+
"## Persistent Batch\n",
|
35 |
+
"\n",
|
36 |
+
"You may recognize this feature as \"continuous batching\" in other repos. But during the concurrent development of the feature, we modeled the inference of a conversational LLM as a persistently running batch whose lifetime spans the entire serving process, hence the name \"persistent batch\". To put it simply\n",
|
37 |
+
"\n",
|
38 |
+
"- The persistent batch as N pre-configured batch slots.\n",
|
39 |
+
"- Requests join the batch when there are free slots available. A batch slot is released and can be reused once the generation of the requested tokens is finished.\n",
|
40 |
+
"- __On cache-hits (see below), history tokens don't need to be decoded in every round of a conversation; generation of response tokens will start instantly.__\n",
|
41 |
+
"- The batch grows or shrinks automatically to minimize unnecessary computations.\n",
|
42 |
+
"\n",
|
43 |
+
"## KV Cache Manager\n",
|
44 |
+
"\n",
|
45 |
+
"The [KV cache manager](https://github.com/InternLM/lmdeploy/blob/main/src/turbomind/models/llama/SequenceManager.h) of TurboMind is a memory-pool-liked object that also implements LRU policy so that it can be viewed as a form of __cache of KV caches__. It works in the following way\n",
|
46 |
+
"\n",
|
47 |
+
"- All device memory required for KV cache is allocated by the manager. A fixed number of slots is pre-configured to match the memory size of the system. Each slot corresponds to the memory required by the KV cache of a single sequence. Allocation chunk-size can be configure to implement pre-allocate/on-demand style allocation policy (or something in-between).\n",
|
48 |
+
"- When space for the KV cache of a new sequence is requested but no free slots left in the pool, the least recently used sequence is evicted from the cache and its device memory is directly reused by the new sequence. However, this is not the end of the story.\n",
|
49 |
+
"- Fetching sequence currently resides in one of the slots resembles a _cache-hit_, the history KV cache is returned directly and no context decoding is needed.\n",
|
50 |
+
"- Victim (evicted) sequences are not erased entirely but converted to its most compact form, i.e. token IDs. When the same sequence id is fetched later (_cache-miss_) the token IDs will be decoded by FMHA backed context decoder and converted back to KV cache.\n",
|
51 |
+
"- The eviction and conversion are handled automatically inside TurboMind and thus transparent to the users. __From the user's aspect, system that use TurboMind has access to infinite device memory.__\n",
|
52 |
+
"\n",
|
53 |
+
"## LLaMa implementation\n",
|
54 |
+
"\n",
|
55 |
+
"Our implementation of the LLaMa family models is modified from Gpt-NeoX model in FasterTransformer. In addition to basic refactoring and modifications to support the LLaMa family, we made some improvements to enable high performance inference of conversational models, most importantly:\n",
|
56 |
+
"\n",
|
57 |
+
"- To support fast context decoding in multi-round conversations. We replaced the attention implementation in context decoder with a [cutlass](https://github.com/NVIDIA/cutlass)-based FMHA implementation that supports mismatched Q/K lengths.\n",
|
58 |
+
"- We introduced indirect buffer pointers in both context FMHA and generation FMHA to support the discontinuity in KV cache within the batch.\n",
|
59 |
+
"- To support concurrent inference with persistent batch, new synchronization mechanism was designed to orchestrate the worker threads running in tensor parallel mode.\n",
|
60 |
+
"- To maximize the throughput, we implement INT8 KV cache support to increase the max batch size. It's effective because in real-world serving scenarios, KV cache costs more memory and consumes more memory bandwidth than weights or other activations.\n",
|
61 |
+
"- We resolved an NCCL hang issue when running multiple model instances in TP mode within a single process, NCCL APIs are now guarded by host-side synchronization barriers.\n",
|
62 |
+
"\n",
|
63 |
+
"## API\n",
|
64 |
+
"\n",
|
65 |
+
"TurboMind supports a Python API that enables streaming output and tensor parallel mode.\n",
|
66 |
+
"\n",
|
67 |
+
"## Difference between FasterTransformer and TurboMind\n",
|
68 |
+
"\n",
|
69 |
+
"Apart of the features described above, there are still many minor differences that we don't cover in this document. Notably, many capabilities of FT are dropped in TurboMind because of the difference in objectives (e.g. prefix prompt, beam search, context embedding, sparse GEMM, GPT/T5/other model families, etc)\n",
|
70 |
+
"\n",
|
71 |
+
"## FAQ\n",
|
72 |
+
"\n",
|
73 |
+
"### Supporting Huggingface models\n",
|
74 |
+
"\n",
|
75 |
+
"For historical reasons, TurboMind's weight layout is based on [the original LLaMa implementation](https://github.com/facebookresearch/llama) (differ only by a transpose). The implementation in huggingface transformers uses a [different layout](https://github.com/huggingface/transformers/blob/45025d92f815675e483f32812caa28cce3a960e7/src/transformers/models/llama/convert_llama_weights_to_hf.py#L123C76-L123C76) for `W_q` and `W_k` which is handled in [deploy.py](https://github.com/InternLM/lmdeploy/blob/ff4648a1d09e5aec74cf70efef35bfaeeac552e0/lmdeploy/serve/turbomind/deploy.py#L398)."
|
76 |
+
]
|
77 |
+
}
|
78 |
+
],
|
79 |
+
"metadata": {
|
80 |
+
"jupytext": {
|
81 |
+
"cell_metadata_filter": "-all",
|
82 |
+
"main_language": "python",
|
83 |
+
"notebook_metadata_filter": "-all"
|
84 |
+
}
|
85 |
+
},
|
86 |
+
"nbformat": 4,
|
87 |
+
"nbformat_minor": 5
|
88 |
+
}
|
a_mllm_notebooks/lmdeploy/turbomind.md
ADDED
@@ -0,0 +1,68 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# Architecture of TurboMind
|
2 |
+
|
3 |
+
TurboMind is an inference engine that supports high throughput inference for conversational LLMs. It's based on NVIDIA's [FasterTransformer](https://github.com/NVIDIA/FasterTransformer). Major features of TurboMind include an efficient LLaMa implementation, the persistent batch inference model and an extendable KV cache manager.
|
4 |
+
|
5 |
+
## High level overview of TurboMind
|
6 |
+
|
7 |
+
```
|
8 |
+
+--------------------+
|
9 |
+
| API |
|
10 |
+
+--------------------+
|
11 |
+
| ^
|
12 |
+
request | | stream callback
|
13 |
+
v |
|
14 |
+
+--------------------+ fetch +-------------------+
|
15 |
+
| Persistent Batch | <-------> | KV Cache Manager |
|
16 |
+
+--------------------+ update +-------------------+
|
17 |
+
^
|
18 |
+
|
|
19 |
+
v
|
20 |
+
+------------------------+
|
21 |
+
| LLaMA implementation |
|
22 |
+
+------------------------+
|
23 |
+
| FT kernels & utilities |
|
24 |
+
+------------------------+
|
25 |
+
```
|
26 |
+
|
27 |
+
## Persistent Batch
|
28 |
+
|
29 |
+
You may recognize this feature as "continuous batching" in other repos. During the concurrent development of the feature, we modeled the inference of a conversational LLM as a persistently running batch whose lifetime spans the entire serving process, hence the name "persistent batch". To put it simply (a toy sketch follows the list):
|
30 |
+
|
31 |
+
- The persistent batch has N pre-configured batch slots.
|
32 |
+
- Requests join the batch when there are free slots available. A batch slot is released and can be reused once the generation of the requested tokens is finished.
|
33 |
+
- __On cache-hits (see below), history tokens don't need to be decoded in every round of a conversation; generation of response tokens will start instantly.__
|
34 |
+
- The batch grows or shrinks automatically to minimize unnecessary computations.
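A purely illustrative Python sketch of the slot bookkeeping implied by the points above (class and method names are invented for illustration; this is not TurboMind code):

```python
# Toy model of the "persistent batch": N fixed slots, requests join when a slot
# is free, and a slot is released once generation for that request finishes.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Slot:
    request_id: Optional[int] = None   # None means the slot is free

class PersistentBatch:
    def __init__(self, num_slots: int):
        self.slots = [Slot() for _ in range(num_slots)]

    def try_join(self, request_id: int) -> bool:
        """Assign the request to a free slot, if any."""
        for slot in self.slots:
            if slot.request_id is None:
                slot.request_id = request_id
                return True
        return False  # batch is full; the request has to wait

    def release(self, request_id: int) -> None:
        """Free the slot once the requested tokens have been generated."""
        for slot in self.slots:
            if slot.request_id == request_id:
                slot.request_id = None

    def active_size(self) -> int:
        # the effective batch grows/shrinks with the number of occupied slots
        return sum(slot.request_id is not None for slot in self.slots)

batch = PersistentBatch(num_slots=4)
batch.try_join(0); batch.try_join(1)
print(batch.active_size())  # 2
batch.release(0)
print(batch.active_size())  # 1
```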
|
35 |
+
|
36 |
+
## KV Cache Manager
|
37 |
+
|
38 |
+
The [KV cache manager](https://github.com/InternLM/lmdeploy/blob/main/src/turbomind/models/llama/SequenceManager.h) of TurboMind is a memory-pool-like object that also implements an LRU policy, so it can be viewed as a form of __cache of KV caches__. It works in the following way (a toy sketch follows the list):
|
39 |
+
|
40 |
+
- All device memory required for KV cache is allocated by the manager. A fixed number of slots is pre-configured to match the memory size of the system. Each slot corresponds to the memory required by the KV cache of a single sequence. The allocation chunk size can be configured to implement a pre-allocate/on-demand style allocation policy (or something in between).
|
41 |
+
- When space for the KV cache of a new sequence is requested but no free slots left in the pool, the least recently used sequence is evicted from the cache and its device memory is directly reused by the new sequence. However, this is not the end of the story.
|
42 |
+
- Fetching a sequence that currently resides in one of the slots resembles a _cache-hit_: the history KV cache is returned directly and no context decoding is needed.
|
43 |
+
- Victim (evicted) sequences are not erased entirely but converted to their most compact form, i.e. token IDs. When the same sequence id is fetched later (_cache-miss_), the token IDs are decoded by the FMHA-backed context decoder and converted back to KV cache.
|
44 |
+
- The eviction and conversion are handled automatically inside TurboMind and are thus transparent to users. __From the user's perspective, a system that uses TurboMind has access to infinite device memory.__
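A toy sketch of the LRU "cache of KV caches" behaviour described above, using a plain `OrderedDict` (this illustrates the policy only, not the actual `SequenceManager` implementation):

```python
# Fixed pool of slots with LRU eviction; evicted sequences are kept only as
# token IDs and are re-decoded (a cache-miss) when fetched again.
from collections import OrderedDict

class ToyKVCacheManager:
    def __init__(self, num_slots: int):
        self.num_slots = num_slots
        self.slots = OrderedDict()   # seq_id -> KV cache, most recently used last
        self.token_ids = {}          # seq_id -> compact form kept after eviction

    def fetch(self, seq_id, decode_fn):
        if seq_id in self.slots:                  # cache-hit: reuse history KV
            self.slots.move_to_end(seq_id)
            return self.slots[seq_id]
        if len(self.slots) >= self.num_slots:     # evict the least recently used
            victim_id, victim_kv = self.slots.popitem(last=False)
            self.token_ids[victim_id] = victim_kv["token_ids"]
        # cache-miss: re-decode history tokens (FMHA context decoding in TurboMind)
        history = self.token_ids.pop(seq_id, [])
        kv = decode_fn(history)
        self.slots[seq_id] = kv
        return kv

decode = lambda ids: {"token_ids": list(ids), "kv": f"kv({len(ids)} tokens)"}
mgr = ToyKVCacheManager(num_slots=2)
mgr.fetch("a", decode); mgr.fetch("b", decode)
mgr.fetch("a", decode)   # hit
mgr.fetch("c", decode)   # evicts "b", which falls back to token IDs
print(list(mgr.slots))   # ['a', 'c']
```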
|
45 |
+
|
46 |
+
## LLaMa implementation
|
47 |
+
|
48 |
+
Our implementation of the LLaMa family of models is modified from the GPT-NeoX model in FasterTransformer. In addition to basic refactoring and modifications to support the LLaMa family, we made some improvements to enable high-performance inference of conversational models, most importantly:
|
49 |
+
|
50 |
+
- To support fast context decoding in multi-round conversations, we replaced the attention implementation in the context decoder with a [cutlass](https://github.com/NVIDIA/cutlass)-based FMHA implementation that supports mismatched Q/K lengths.
|
51 |
+
- We introduced indirect buffer pointers in both context FMHA and generation FMHA to support the discontinuity in KV cache within the batch.
|
52 |
+
- To support concurrent inference with the persistent batch, a new synchronization mechanism was designed to orchestrate the worker threads running in tensor parallel mode.
|
53 |
+
- To maximize throughput, we implemented INT8 KV cache support to increase the max batch size. It's effective because in real-world serving scenarios, the KV cache costs more memory and consumes more memory bandwidth than weights or other activations.
|
54 |
+
- We resolved an NCCL hang issue when running multiple model instances in TP mode within a single process; NCCL APIs are now guarded by host-side synchronization barriers.
|
55 |
+
|
56 |
+
## API
|
57 |
+
|
58 |
+
TurboMind supports a Python API that enables streaming output and tensor parallel mode.
|
59 |
+
|
60 |
+
## Difference between FasterTransformer and TurboMind
|
61 |
+
|
62 |
+
Apart from the features described above, there are still many minor differences that we don't cover in this document. Notably, many capabilities of FT are dropped in TurboMind because of the difference in objectives (e.g. prefix prompt, beam search, context embedding, sparse GEMM, GPT/T5/other model families, etc.).
|
63 |
+
|
64 |
+
## FAQ
|
65 |
+
|
66 |
+
### Supporting Huggingface models
|
67 |
+
|
68 |
+
For historical reasons, TurboMind's weight layout is based on [the original LLaMa implementation](https://github.com/facebookresearch/llama) (differing only by a transpose). The implementation in Hugging Face transformers uses a [different layout](https://github.com/huggingface/transformers/blob/45025d92f815675e483f32812caa28cce3a960e7/src/transformers/models/llama/convert_llama_weights_to_hf.py#L123C76-L123C76) for `W_q` and `W_k`, which is handled in [deploy.py](https://github.com/InternLM/lmdeploy/blob/ff4648a1d09e5aec74cf70efef35bfaeeac552e0/lmdeploy/serve/turbomind/deploy.py#L398).
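For illustration only, the head-wise permutation involved when converting `W_q`/`W_k` between the two layouts looks roughly like the sketch below; the function names are invented and the authoritative transform is the one in the deploy.py linked above.

```python
import torch

# Sketch of an HF-style head-wise permutation of a query/key projection and its
# inverse. Shapes only; the exact handling for TurboMind lives in deploy.py.
def hf_style_permute(w: torch.Tensor, n_heads: int) -> torch.Tensor:
    dim1, dim2 = w.shape
    return (w.view(n_heads, dim1 // n_heads // 2, 2, dim2)
             .transpose(1, 2)
             .reshape(dim1, dim2))

def hf_style_unpermute(w: torch.Tensor, n_heads: int) -> torch.Tensor:
    dim1, dim2 = w.shape
    return (w.view(n_heads, 2, dim1 // n_heads // 2, dim2)
             .transpose(1, 2)
             .reshape(dim1, dim2))

w = torch.randn(32, 32)
assert torch.equal(hf_style_unpermute(hf_style_permute(w, n_heads=4), n_heads=4), w)
```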
|
a_mllm_notebooks/lmdeploy/w4a16.ipynb
ADDED
@@ -0,0 +1,174 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"id": "76ea6484",
|
6 |
+
"metadata": {},
|
7 |
+
"source": [
|
8 |
+
"# AWQ/GPTQ\n",
|
9 |
+
"\n",
|
10 |
+
"LMDeploy TurboMind engine supports the inference of 4bit quantized models that are quantized both by [AWQ](https://arxiv.org/abs/2306.00978) and [GPTQ](https://github.com/AutoGPTQ/AutoGPTQ), but its quantization module only supports the AWQ quantization algorithm.\n",
|
11 |
+
"\n",
|
12 |
+
"The following NVIDIA GPUs are available for AWQ/GPTQ INT4 inference:\n",
|
13 |
+
"\n",
|
14 |
+
"- V100(sm70): V100\n",
|
15 |
+
"- Turing(sm75): 20 series, T4\n",
|
16 |
+
"- Ampere(sm80,sm86): 30 series, A10, A16, A30, A100\n",
|
17 |
+
"- Ada Lovelace(sm89): 40 series\n",
|
18 |
+
"\n",
|
19 |
+
"Before proceeding with the quantization and inference, please ensure that lmdeploy is installed by following the [installation guide](../get_started/installation.md)\n",
|
20 |
+
"\n",
|
21 |
+
"The remainder of this article is structured into the following sections:\n",
|
22 |
+
"\n",
|
23 |
+
"<!-- toc -->\n",
|
24 |
+
"\n",
|
25 |
+
"- [Quantization](#quantization)\n",
|
26 |
+
"- [Evaluation](#evaluation)\n",
|
27 |
+
"- [Inference](#inference)\n",
|
28 |
+
"- [Service](#service)\n",
|
29 |
+
"- [Performance](#performance)\n",
|
30 |
+
"\n",
|
31 |
+
"<!-- tocstop -->\n",
|
32 |
+
"\n",
|
33 |
+
"## Quantization\n",
|
34 |
+
"\n",
|
35 |
+
"A single command execution is all it takes to quantize the model. The resulting quantized weights are then stored in the $WORK_DIR directory.\n",
|
36 |
+
"\n",
|
37 |
+
"```shell\n",
|
38 |
+
"export HF_MODEL=internlm/internlm2_5-7b-chat\n",
|
39 |
+
"export WORK_DIR=internlm/internlm2_5-7b-chat-4bit\n",
|
40 |
+
"\n",
|
41 |
+
"lmdeploy lite auto_awq \\\n",
|
42 |
+
" $HF_MODEL \\\n",
|
43 |
+
" --calib-dataset 'ptb' \\\n",
|
44 |
+
" --calib-samples 128 \\\n",
|
45 |
+
" --calib-seqlen 2048 \\\n",
|
46 |
+
" --w-bits 4 \\\n",
|
47 |
+
" --w-group-size 128 \\\n",
|
48 |
+
" --batch-size 1 \\\n",
|
49 |
+
" --work-dir $WORK_DIR\n",
|
50 |
+
"```\n",
|
51 |
+
"\n",
|
52 |
+
"Typically, the above command doesn't require filling in optional parameters, as the defaults usually suffice. For instance, when quantizing the [internlm/internlm2_5-7b-chat](https://huggingface.co/internlm/internlm2_5-7b-chat) model, the command can be condensed as:\n",
|
53 |
+
"\n",
|
54 |
+
"```shell\n",
|
55 |
+
"lmdeploy lite auto_awq internlm/internlm2_5-7b-chat --work-dir internlm2_5-7b-chat-4bit\n",
|
56 |
+
"```\n",
|
57 |
+
"\n",
|
58 |
+
"**Note:**\n",
|
59 |
+
"\n",
|
60 |
+
"- We recommend that you specify the --work-dir parameter, including the model name as demonstrated in the example above. This facilitates LMDeploy in fuzzy matching the --work-dir with an appropriate built-in chat template. Otherwise, you will have to designate the chat template during inference.\n",
|
61 |
+
"- If the quantized model’s accuracy is compromised, it is recommended to enable --search-scale for re-quantization and increase the --batch-size, for example, to 8. When search_scale is enabled, the quantization process will take more time. The --batch-size affects the amount of memory used, which can be adjusted according to actual conditions as needed.\n",
|
62 |
+
"\n",
|
63 |
+
"Upon completing quantization, you can engage with the model efficiently using a variety of handy tools.\n",
|
64 |
+
"For example, you can initiate a conversation with it via the command line:\n",
|
65 |
+
"\n",
|
66 |
+
"```shell\n",
|
67 |
+
"lmdeploy chat ./internlm2_5-7b-chat-4bit --model-format awq\n",
|
68 |
+
"```\n",
|
69 |
+
"\n",
|
70 |
+
"Alternatively, you can start the gradio server and interact with the model through the web at `http://{ip_addr}:{port`\n",
|
71 |
+
"\n",
|
72 |
+
"```shell\n",
|
73 |
+
"lmdeploy serve gradio ./internlm2_5-7b-chat-4bit --server_name {ip_addr} --server_port {port} --model-format awq\n",
|
74 |
+
"```\n",
|
75 |
+
"\n",
|
76 |
+
"## Evaluation\n",
|
77 |
+
"\n",
|
78 |
+
"Please refer to [OpenCompass](https://opencompass.readthedocs.io/en/latest/index.html) about model evaluation with LMDeploy. Here is the [guide](https://opencompass.readthedocs.io/en/latest/advanced_guides/evaluation_lmdeploy.html)\n",
|
79 |
+
"\n",
|
80 |
+
"## Inference\n",
|
81 |
+
"\n",
|
82 |
+
"Trying the following codes, you can perform the batched offline inference with the quantized model:"
|
83 |
+
]
|
84 |
+
},
|
85 |
+
{
|
86 |
+
"cell_type": "code",
|
87 |
+
"execution_count": null,
|
88 |
+
"id": "4ee45c86",
|
89 |
+
"metadata": {},
|
90 |
+
"outputs": [],
|
91 |
+
"source": [
|
92 |
+
"from lmdeploy import pipeline, TurbomindEngineConfig\n",
|
93 |
+
"engine_config = TurbomindEngineConfig(model_format='awq')\n",
|
94 |
+
"pipe = pipeline(\"./internlm2_5-7b-chat-4bit\", backend_config=engine_config)\n",
|
95 |
+
"response = pipe([\"Hi, pls intro yourself\", \"Shanghai is\"])\n",
|
96 |
+
"print(response)"
|
97 |
+
]
|
98 |
+
},
|
99 |
+
{
|
100 |
+
"cell_type": "markdown",
|
101 |
+
"id": "0743ccb8",
|
102 |
+
"metadata": {},
|
103 |
+
"source": [
|
104 |
+
"For more information about the pipeline parameters, please refer to [here](../llm/pipeline.md).\n",
|
105 |
+
"\n",
|
106 |
+
"In addition to performing inference with the quantized model on localhost, LMDeploy can also execute inference for the 4bit quantized model derived from AWQ algorithm available on Huggingface Hub, such as models from the [lmdeploy space](https://huggingface.co/lmdeploy) and [TheBloke space](https://huggingface.co/TheBloke)"
|
107 |
+
]
|
108 |
+
},
|
109 |
+
{
|
110 |
+
"cell_type": "code",
|
111 |
+
"execution_count": null,
|
112 |
+
"id": "a522e026",
|
113 |
+
"metadata": {},
|
114 |
+
"outputs": [],
|
115 |
+
"source": [
|
116 |
+
"# inference with models from lmdeploy space\n",
|
117 |
+
"from lmdeploy import pipeline, TurbomindEngineConfig\n",
|
118 |
+
"pipe = pipeline(\"lmdeploy/llama2-chat-70b-4bit\",\n",
|
119 |
+
" backend_config=TurbomindEngineConfig(model_format='awq', tp=4))\n",
|
120 |
+
"response = pipe([\"Hi, pls intro yourself\", \"Shanghai is\"])\n",
|
121 |
+
"print(response)\n",
|
122 |
+
"\n",
|
123 |
+
"# inference with models from thebloke space\n",
|
124 |
+
"from lmdeploy import pipeline, TurbomindEngineConfig, ChatTemplateConfig\n",
|
125 |
+
"pipe = pipeline(\"TheBloke/LLaMA2-13B-Tiefighter-AWQ\",\n",
|
126 |
+
" backend_config=TurbomindEngineConfig(model_format='awq'),\n",
|
127 |
+
" chat_template_config=ChatTemplateConfig(model_name='llama2')\n",
|
128 |
+
" )\n",
|
129 |
+
"response = pipe([\"Hi, pls intro yourself\", \"Shanghai is\"])\n",
|
130 |
+
"print(response)"
|
131 |
+
]
|
132 |
+
},
|
133 |
+
{
|
134 |
+
"cell_type": "markdown",
|
135 |
+
"id": "a75d4d9e",
|
136 |
+
"metadata": {},
|
137 |
+
"source": [
|
138 |
+
"## Service\n",
|
139 |
+
"\n",
|
140 |
+
"LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below are an example of service startup:\n",
|
141 |
+
"\n",
|
142 |
+
"```shell\n",
|
143 |
+
"lmdeploy serve api_server ./internlm2_5-7b-chat-4bit --backend turbomind --model-format awq\n",
|
144 |
+
"```\n",
|
145 |
+
"\n",
|
146 |
+
"The default port of `api_server` is `23333`. After the server is launched, you can communicate with server on terminal through `api_client`:\n",
|
147 |
+
"\n",
|
148 |
+
"```shell\n",
|
149 |
+
"lmdeploy serve api_client http://0.0.0.0:23333\n",
|
150 |
+
"```\n",
|
151 |
+
"\n",
|
152 |
+
"You can overview and try out `api_server` APIs online by swagger UI at `http://0.0.0.0:23333`, or you can also read the API specification from [here](../llm/api_server.md).\n",
|
153 |
+
"\n",
|
154 |
+
"## Performance\n",
|
155 |
+
"\n",
|
156 |
+
"We benchmarked the Llama-2-7B-chat and Llama-2-13B-chat models with 4-bit quantization on NVIDIA GeForce RTX 4090 using [profile_generation.py](https://github.com/InternLM/lmdeploy/blob/main/benchmark/profile_generation.py). And we measure the token generation throughput (tokens/s) by setting a single prompt token and generating 512 tokens. All the results are measured for single batch inference.\n",
|
157 |
+
"\n",
|
158 |
+
"| model | llm-awq | mlc-llm | turbomind |\n",
|
159 |
+
"| ---------------- | ------- | ------- | --------- |\n",
|
160 |
+
"| Llama-2-7B-chat | 112.9 | 159.4 | 206.4 |\n",
|
161 |
+
"| Llama-2-13B-chat | N/A | 90.7 | 115.8 |"
|
162 |
+
]
|
163 |
+
}
|
164 |
+
],
|
165 |
+
"metadata": {
|
166 |
+
"jupytext": {
|
167 |
+
"cell_metadata_filter": "-all",
|
168 |
+
"main_language": "python",
|
169 |
+
"notebook_metadata_filter": "-all"
|
170 |
+
}
|
171 |
+
},
|
172 |
+
"nbformat": 4,
|
173 |
+
"nbformat_minor": 5
|
174 |
+
}
|
a_mllm_notebooks/lmdeploy/w4a16.md
ADDED
@@ -0,0 +1,130 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# AWQ/GPTQ
|
2 |
+
|
3 |
+
The LMDeploy TurboMind engine supports inference of 4-bit models quantized by either [AWQ](https://arxiv.org/abs/2306.00978) or [GPTQ](https://github.com/AutoGPTQ/AutoGPTQ), but its quantization module only supports the AWQ quantization algorithm.
|
4 |
+
|
5 |
+
The following NVIDIA GPUs are available for AWQ/GPTQ INT4 inference:
|
6 |
+
|
7 |
+
- V100(sm70): V100
|
8 |
+
- Turing(sm75): 20 series, T4
|
9 |
+
- Ampere(sm80,sm86): 30 series, A10, A16, A30, A100
|
10 |
+
- Ada Lovelace(sm89): 40 series
|
11 |
+
|
12 |
+
Before proceeding with the quantization and inference, please ensure that lmdeploy is installed by following the [installation guide](../get_started/installation.md)
|
13 |
+
|
14 |
+
The remainder of this article is structured into the following sections:
|
15 |
+
|
16 |
+
<!-- toc -->
|
17 |
+
|
18 |
+
- [Quantization](#quantization)
|
19 |
+
- [Evaluation](#evaluation)
|
20 |
+
- [Inference](#inference)
|
21 |
+
- [Service](#service)
|
22 |
+
- [Performance](#performance)
|
23 |
+
|
24 |
+
<!-- tocstop -->
|
25 |
+
|
26 |
+
## Quantization
|
27 |
+
|
28 |
+
A single command execution is all it takes to quantize the model. The resulting quantized weights are then stored in the $WORK_DIR directory.
|
29 |
+
|
30 |
+
```shell
|
31 |
+
export HF_MODEL=internlm/internlm2_5-7b-chat
|
32 |
+
export WORK_DIR=internlm/internlm2_5-7b-chat-4bit
|
33 |
+
|
34 |
+
lmdeploy lite auto_awq \
|
35 |
+
$HF_MODEL \
|
36 |
+
--calib-dataset 'ptb' \
|
37 |
+
--calib-samples 128 \
|
38 |
+
--calib-seqlen 2048 \
|
39 |
+
--w-bits 4 \
|
40 |
+
--w-group-size 128 \
|
41 |
+
--batch-size 1 \
|
42 |
+
--work-dir $WORK_DIR
|
43 |
+
```
|
44 |
+
|
45 |
+
Typically, the above command doesn't require filling in optional parameters, as the defaults usually suffice. For instance, when quantizing the [internlm/internlm2_5-7b-chat](https://huggingface.co/internlm/internlm2_5-7b-chat) model, the command can be condensed as:
|
46 |
+
|
47 |
+
```shell
|
48 |
+
lmdeploy lite auto_awq internlm/internlm2_5-7b-chat --work-dir internlm2_5-7b-chat-4bit
|
49 |
+
```
|
50 |
+
|
51 |
+
**Note:**
|
52 |
+
|
53 |
+
- We recommend that you specify the --work-dir parameter, including the model name as demonstrated in the example above. This facilitates LMDeploy in fuzzy matching the --work-dir with an appropriate built-in chat template. Otherwise, you will have to designate the chat template during inference.
|
54 |
+
- If the quantized model’s accuracy is compromised, it is recommended to enable --search-scale for re-quantization and increase the --batch-size, for example, to 8. When search_scale is enabled, the quantization process will take more time. The --batch-size affects the amount of memory used, which can be adjusted according to actual conditions as needed.
|
55 |
+
|
56 |
+
Upon completing quantization, you can engage with the model efficiently using a variety of handy tools.
|
57 |
+
For example, you can initiate a conversation with it via the command line:
|
58 |
+
|
59 |
+
```shell
|
60 |
+
lmdeploy chat ./internlm2_5-7b-chat-4bit --model-format awq
|
61 |
+
```
|
62 |
+
|
63 |
+
Alternatively, you can start the gradio server and interact with the model through the web at `http://{ip_addr}:{port}`
|
64 |
+
|
65 |
+
```shell
|
66 |
+
lmdeploy serve gradio ./internlm2_5-7b-chat-4bit --server_name {ip_addr} --server_port {port} --model-format awq
|
67 |
+
```
|
68 |
+
|
69 |
+
## Evaluation
|
70 |
+
|
71 |
+
Please refer to [OpenCompass](https://opencompass.readthedocs.io/en/latest/index.html) about model evaluation with LMDeploy. Here is the [guide](https://opencompass.readthedocs.io/en/latest/advanced_guides/evaluation_lmdeploy.html)
|
72 |
+
|
73 |
+
## Inference
|
74 |
+
|
75 |
+
With the following code, you can perform batched offline inference with the quantized model:
|
76 |
+
|
77 |
+
```python
|
78 |
+
from lmdeploy import pipeline, TurbomindEngineConfig
|
79 |
+
engine_config = TurbomindEngineConfig(model_format='awq')
|
80 |
+
pipe = pipeline("./internlm2_5-7b-chat-4bit", backend_config=engine_config)
|
81 |
+
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
|
82 |
+
print(response)
|
83 |
+
```
|
84 |
+
|
85 |
+
For more information about the pipeline parameters, please refer to [here](../llm/pipeline.md).
|
86 |
+
|
87 |
+
In addition to performing inference with the quantized model on localhost, LMDeploy can also execute inference for 4-bit quantized models derived from the AWQ algorithm that are available on the Hugging Face Hub, such as models from the [lmdeploy space](https://huggingface.co/lmdeploy) and [TheBloke space](https://huggingface.co/TheBloke)
|
88 |
+
|
89 |
+
```python
|
90 |
+
# inference with models from lmdeploy space
|
91 |
+
from lmdeploy import pipeline, TurbomindEngineConfig
|
92 |
+
pipe = pipeline("lmdeploy/llama2-chat-70b-4bit",
|
93 |
+
backend_config=TurbomindEngineConfig(model_format='awq', tp=4))
|
94 |
+
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
|
95 |
+
print(response)
|
96 |
+
|
97 |
+
# inference with models from thebloke space
|
98 |
+
from lmdeploy import pipeline, TurbomindEngineConfig, ChatTemplateConfig
|
99 |
+
pipe = pipeline("TheBloke/LLaMA2-13B-Tiefighter-AWQ",
|
100 |
+
backend_config=TurbomindEngineConfig(model_format='awq'),
|
101 |
+
chat_template_config=ChatTemplateConfig(model_name='llama2')
|
102 |
+
)
|
103 |
+
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
|
104 |
+
print(response)
|
105 |
+
```
|
106 |
+
|
107 |
+
## Service
|
108 |
+
|
109 |
+
LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup:
|
110 |
+
|
111 |
+
```shell
|
112 |
+
lmdeploy serve api_server ./internlm2_5-7b-chat-4bit --backend turbomind --model-format awq
|
113 |
+
```
|
114 |
+
|
115 |
+
The default port of `api_server` is `23333`. After the server is launched, you can communicate with the server in the terminal through `api_client`:
|
116 |
+
|
117 |
+
```shell
|
118 |
+
lmdeploy serve api_client http://0.0.0.0:23333
|
119 |
+
```
|
120 |
+
|
121 |
+
You can overview and try out the `api_server` APIs online via the Swagger UI at `http://0.0.0.0:23333`, or you can read the API specification [here](../llm/api_server.md).
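Since the APIs are OpenAI-compatible, you can also query the server programmatically. Below is a minimal sketch using the `openai` Python client, assuming the server launched above is listening on `0.0.0.0:23333` (the served model name is discovered from the models endpoint):

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="http://0.0.0.0:23333/v1")
model_name = client.models.list().data[0].id  # the model served by api_server

response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "Hi, pls intro yourself"}],
    temperature=0.7,
    top_p=0.8,
)
print(response.choices[0].message.content)
```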
|
122 |
+
|
123 |
+
## Performance
|
124 |
+
|
125 |
+
We benchmarked the Llama-2-7B-chat and Llama-2-13B-chat models with 4-bit quantization on an NVIDIA GeForce RTX 4090 using [profile_generation.py](https://github.com/InternLM/lmdeploy/blob/main/benchmark/profile_generation.py). We measured the token generation throughput (tokens/s) with a single prompt token and 512 generated tokens. All results are measured for single-batch inference.
|
126 |
+
|
127 |
+
| model | llm-awq | mlc-llm | turbomind |
|
128 |
+
| ---------------- | ------- | ------- | --------- |
|
129 |
+
| Llama-2-7B-chat | 112.9 | 159.4 | 206.4 |
|
130 |
+
| Llama-2-13B-chat | N/A | 90.7 | 115.8 |
|
a_mllm_notebooks/lmdeploy/w8a8.ipynb
ADDED
@@ -0,0 +1,75 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"id": "dce2a8a7",
|
6 |
+
"metadata": {},
|
7 |
+
"source": [
|
8 |
+
"# SmoothQuant\n",
|
9 |
+
"\n",
|
10 |
+
"LMDeploy provides functions for quantization and inference of large language models using 8-bit integers.\n",
|
11 |
+
"\n",
|
12 |
+
"Before starting inference, ensure that lmdeploy and openai/triton are correctly installed. Execute the following commands to install these:\n",
|
13 |
+
"\n",
|
14 |
+
"```shell\n",
|
15 |
+
"pip install lmdeploy\n",
|
16 |
+
"pip install triton>=2.1.0\n",
|
17 |
+
"```\n",
|
18 |
+
"\n",
|
19 |
+
"## 8-bit Weight Model Inference\n",
|
20 |
+
"\n",
|
21 |
+
"For performing 8-bit weight model inference, you can directly download the pre-quantized 8-bit weight models from LMDeploy's [model zoo](https://huggingface.co/lmdeploy). For instance, the 8-bit Internlm-chat-7B model is available for direct download from the model zoo:\n",
|
22 |
+
"\n",
|
23 |
+
"```shell\n",
|
24 |
+
"git-lfs install\n",
|
25 |
+
"git clone https://huggingface.co/lmdeploy/internlm-chat-7b-w8 (coming soon)\n",
|
26 |
+
"```\n",
|
27 |
+
"\n",
|
28 |
+
"Alternatively, you can manually convert original 16-bit weights into 8-bit by referring to the content under the [\"8bit Weight Quantization\"](#8bit-weight-quantization) section. Save them in the internlm-chat-7b-w8 directory, using the command below:\n",
|
29 |
+
"\n",
|
30 |
+
"```shell\n",
|
31 |
+
"lmdeploy lite smooth_quant internlm/internlm-chat-7b --work-dir ./internlm-chat-7b-w8\n",
|
32 |
+
"```\n",
|
33 |
+
"\n",
|
34 |
+
"Afterwards, use the following command to interact with the model via the terminal:\n",
|
35 |
+
"\n",
|
36 |
+
"```shell\n",
|
37 |
+
"lmdeploy chat ./internlm-chat-7b-w8 --backend pytorch\n",
|
38 |
+
"```\n",
|
39 |
+
"\n",
|
40 |
+
"## Launching gradio service\n",
|
41 |
+
"\n",
|
42 |
+
"Coming soon...\n",
|
43 |
+
"\n",
|
44 |
+
"## Inference Speed\n",
|
45 |
+
"\n",
|
46 |
+
"Coming soon...\n",
|
47 |
+
"\n",
|
48 |
+
"## 8bit Weight Quantization\n",
|
49 |
+
"\n",
|
50 |
+
"Performing 8bit weight quantization involves three steps:\n",
|
51 |
+
"\n",
|
52 |
+
"1. **Smooth Weights**: Start by smoothing the weights of the Language Model (LLM). This process makes the weights more amenable to quantizing.\n",
|
53 |
+
"2. **Replace Modules**: Locate DecoderLayers and replace the modules RSMNorm and nn.Linear with QRSMNorm and QLinear modules respectively. These 'Q' modules are available in the lmdeploy/pytorch/models/q_modules.py file.\n",
|
54 |
+
"3. **Save the Quantized Model**: Once you've made the necessary replacements, save the new quantized model.\n",
|
55 |
+
"\n",
|
56 |
+
"The script `lmdeploy/lite/apis/smooth_quant.py` accomplishes all three tasks detailed above. For example, you can obtain the model weights of the quantized Internlm-chat-7B model by running the following command:\n",
|
57 |
+
"\n",
|
58 |
+
"```shell\n",
|
59 |
+
"lmdeploy lite smooth_quant internlm/internlm-chat-7b --work-dir ./internlm-chat-7b-w8\n",
|
60 |
+
"```\n",
|
61 |
+
"\n",
|
62 |
+
"After saving, you can instantiate your quantized model by calling the from_pretrained interface."
|
63 |
+
]
|
64 |
+
}
|
65 |
+
],
|
66 |
+
"metadata": {
|
67 |
+
"jupytext": {
|
68 |
+
"cell_metadata_filter": "-all",
|
69 |
+
"main_language": "python",
|
70 |
+
"notebook_metadata_filter": "-all"
|
71 |
+
}
|
72 |
+
},
|
73 |
+
"nbformat": 4,
|
74 |
+
"nbformat_minor": 5
|
75 |
+
}
|
a_mllm_notebooks/lmdeploy/w8a8.md
ADDED
@@ -0,0 +1,55 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# SmoothQuant
|
2 |
+
|
3 |
+
LMDeploy provides functions for quantization and inference of large language models using 8-bit integers.
|
4 |
+
|
5 |
+
Before starting inference, ensure that lmdeploy and openai/triton are correctly installed. Execute the following commands to install these:
|
6 |
+
|
7 |
+
```shell
|
8 |
+
pip install lmdeploy
|
9 |
+
pip install triton>=2.1.0
|
10 |
+
```
|
11 |
+
|
12 |
+
## 8-bit Weight Model Inference
|
13 |
+
|
14 |
+
For performing 8-bit weight model inference, you can directly download the pre-quantized 8-bit weight models from LMDeploy's [model zoo](https://huggingface.co/lmdeploy). For instance, the 8-bit Internlm-chat-7B model is available for direct download from the model zoo:
|
15 |
+
|
16 |
+
```shell
|
17 |
+
git-lfs install
|
18 |
+
git clone https://huggingface.co/lmdeploy/internlm-chat-7b-w8 (coming soon)
|
19 |
+
```
|
20 |
+
|
21 |
+
Alternatively, you can manually convert the original 16-bit weights into 8-bit by referring to the content under the ["8bit Weight Quantization"](#8bit-weight-quantization) section. Save them in the internlm-chat-7b-w8 directory, using the command below:
|
22 |
+
|
23 |
+
```shell
|
24 |
+
lmdeploy lite smooth_quant internlm/internlm-chat-7b --work-dir ./internlm-chat-7b-w8
|
25 |
+
```
|
26 |
+
|
27 |
+
Afterwards, use the following command to interact with the model via the terminal:
|
28 |
+
|
29 |
+
```shell
|
30 |
+
lmdeploy chat ./internlm-chat-7b-w8 --backend pytorch
|
31 |
+
```
|
32 |
+
|
33 |
+
## Launching gradio service
|
34 |
+
|
35 |
+
Coming soon...
|
36 |
+
|
37 |
+
## Inference Speed
|
38 |
+
|
39 |
+
Coming soon...
|
40 |
+
|
41 |
+
## 8bit Weight Quantization
|
42 |
+
|
43 |
+
Performing 8bit weight quantization involves three steps:
|
44 |
+
|
45 |
+
1. **Smooth Weights**: Start by smoothing the weights of the Language Model (LLM). This process makes the weights more amenable to quantizing.
|
46 |
+
2. **Replace Modules**: Locate the DecoderLayers and replace the RMSNorm and nn.Linear modules with QRMSNorm and QLinear modules, respectively. These 'Q' modules are available in the lmdeploy/pytorch/models/q_modules.py file.
|
47 |
+
3. **Save the Quantized Model**: Once you've made the necessary replacements, save the new quantized model.
|
48 |
+
|
49 |
+
The script `lmdeploy/lite/apis/smooth_quant.py` accomplishes all three tasks detailed above. For example, you can obtain the model weights of the quantized Internlm-chat-7B model by running the following command:
|
50 |
+
|
51 |
+
```shell
|
52 |
+
lmdeploy lite smooth_quant internlm/internlm-chat-7b --work-dir ./internlm-chat-7b-w8
|
53 |
+
```
|
54 |
+
|
55 |
+
After saving, you can instantiate your quantized model by calling the `from_pretrained` interface.
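As a minimal sketch (assuming the `./internlm-chat-7b-w8` directory produced by the command above and the PyTorch backend; the exact loading path may vary with the LMDeploy version), offline inference could look like:

```python
from lmdeploy import pipeline, PytorchEngineConfig

# Load the w8a8 weights saved by `lmdeploy lite smooth_quant` with the PyTorch engine.
pipe = pipeline("./internlm-chat-7b-w8", backend_config=PytorchEngineConfig())
print(pipe(["Hi, pls intro yourself"]))
```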
|
a_mllm_notebooks/openai/.ipynb_checkpoints/infer-checkpoint.py
ADDED
@@ -0,0 +1,167 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# !pip install openai
|
2 |
+
from openai import OpenAI
|
3 |
+
from tqdm import tqdm
|
4 |
+
client = OpenAI(api_key="YOUR_API_KEY", base_url="http://0.0.0.0:8082/v1")
|
5 |
+
model_name = client.models.list().data[0].id
|
6 |
+
|
7 |
+
|
8 |
+
|
9 |
+
NUM_MODEL = len(client.models.list().data)
|
10 |
+
NUM_THREAD = min(int(NUM_MODEL * 1.5), 32)
|
11 |
+
|
12 |
+
import datasets, huggingface_hub
|
13 |
+
disk_path ='/dscilab_dungvo/workspace/BA-PRE_THESIS/dataset_pretraining/SYNTH-PEDES/annotation_english_vietnamese_processed'
|
14 |
+
dataset = datasets.load_from_disk(disk_path)
|
15 |
+
|
16 |
+
# Dataset({
|
17 |
+
# features: ['image_name', 'person_id', 'caption_0', 'caption_1', 'attributes', 'prompt_caption', 'image', 'viet_captions', 'viet_prompt_caption'],
|
18 |
+
# num_rows: 4791127
|
19 |
+
# })
|
20 |
+
|
21 |
+
# {'image_name': 'Part1/1/0.jpg',
|
22 |
+
# 'person_id': 1,
|
23 |
+
# 'caption_0': 'A woman with black hair and she is wearing a black jacket with blue jeans paired with black shoes.',
|
24 |
+
# 'caption_1': '',
|
25 |
+
# 'attributes': 'woman,short hair,black jacket,blue denim jeans,black sneakers,black backpack',
|
26 |
+
# 'prompt_caption': 'The woman has short hair. She is wearing a black jacket, blue denim jeans and black sneakers. She is carrying a black backpack. ',
|
27 |
+
# 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=59x129>,
|
28 |
+
# 'viet_captions': ['Một người phụ nữ với mái tóc đen và cô ấy đang mặc một chiếc áo khoác màu đen với quần jean màu xanh kết hợp với giày đen.'],
|
29 |
+
# 'viet_prompt_caption': ['Người phụ nữ có mái tóc ngắn. Cô đang mặc một chiếc áo khoác màu đen, quần jean denim màu xanh và giày thể thao màu đen. Cô đang mang theo một ba lô màu đen.']}
|
30 |
+
|
31 |
+
|
32 |
+
|
33 |
+
def get_output(english_text):
|
34 |
+
response = client.chat.completions.create(
|
35 |
+
model=model_name,
|
36 |
+
messages=[
|
37 |
+
{
|
38 |
+
"role": "system",
|
39 |
+
"content": "You are a helpful assistant who is proficient in translating English to Chinese.",
|
40 |
+
},
|
41 |
+
{
|
42 |
+
"role": "user",
|
43 |
+
"content": "Please translate and paraphrase the following sentence into natural, fluent Chinese: " + english_text,
|
44 |
+
},
|
45 |
+
],
|
46 |
+
temperature=0.7,
|
47 |
+
top_p=0.9,
|
48 |
+
)
|
49 |
+
return response.choices[0].message.content
|
50 |
+
|
51 |
+
|
52 |
+
output_root_folder = './output_chinese'
|
53 |
+
import os
|
54 |
+
# make dir
|
55 |
+
os.makedirs(output_root_folder, exist_ok=True)
|
56 |
+
|
57 |
+
# multithread: NUM_THREAD threads
|
58 |
+
|
59 |
+
import threading
|
60 |
+
import time
|
61 |
+
|
62 |
+
# def get_list_partition_index(n, num_partition):
|
63 |
+
# partition_size = n // num_partition
|
64 |
+
# partition_index = []
|
65 |
+
# for i in range(num_partition):
|
66 |
+
# if i == num_partition - 1:
|
67 |
+
# partition_index.append((i * partition_size, n))
|
68 |
+
# else:
|
69 |
+
# partition_index.append((i * partition_size, (i + 1) * partition_size))
|
70 |
+
# return partition_index
|
71 |
+
|
72 |
+
# /dscilab_dungvo/workspace/vlm_clone/a_mllm_notebooks/openai/output_chinese/thread_32/4509280.json
|
73 |
+
def get_uninferenced_indices(total_indices, output_dir):
|
74 |
+
inferenced_indices = set()
|
75 |
+
for thread_folder in os.listdir(output_dir):
|
76 |
+
if 'thread' not in thread_folder:
|
77 |
+
continue
|
78 |
+
thread_path = os.path.join(output_dir, thread_folder)
|
79 |
+
if os.path.isdir(thread_path):
|
80 |
+
for json_file in os.listdir(thread_path):
|
81 |
+
try:
|
82 |
+
index = json_file.split('.')[0]
|
83 |
+
index = int(index)
|
84 |
+
except:
|
85 |
+
print(f"Error: {json_file}")
|
86 |
+
continue
|
87 |
+
inferenced_indices.add(index)
|
88 |
+
uninferenced_indices = [index for index in total_indices if index not in inferenced_indices]
|
89 |
+
return uninferenced_indices
|
90 |
+
|
91 |
+
total_indices = list(range(len(dataset)))
|
92 |
+
REMAIN_INDEXES = get_uninferenced_indices(total_indices, output_root_folder)
|
93 |
+
|
94 |
+
def get_list_partition_from_list_index(list_index, num_partition):
|
95 |
+
n = len(list_index)
|
96 |
+
partition_size = n // num_partition
|
97 |
+
partition_index = []
|
98 |
+
for i in range(num_partition):
|
99 |
+
if i == num_partition - 1:
|
100 |
+
partition_index.append(list_index[i * partition_size:])
|
101 |
+
else:
|
102 |
+
partition_index.append(list_index[i * partition_size:(i + 1) * partition_size])
|
103 |
+
return partition_index
|
104 |
+
|
105 |
+
|
106 |
+
# LIST_PARTITION_INDEX is list of list of index
|
107 |
+
LIST_PARTITION_INDEX = get_list_partition_from_list_index(REMAIN_INDEXES, NUM_THREAD)
|
108 |
+
import json
|
109 |
+
|
110 |
+
# Each thread do a loop in its partition index. for each index, get the chinese translation for: prompt_caption, caption_0, caption_1
|
111 |
+
|
112 |
+
def thread_function(thread_id):
|
113 |
+
# make output folder for this thread
|
114 |
+
os.makedirs(os.path.join(output_root_folder, f"thread_{thread_id}"), exist_ok=True)
|
115 |
+
|
116 |
+
list_index = LIST_PARTITION_INDEX[thread_id]
|
117 |
+
|
118 |
+
for i in tqdm(range(len(list_index))):
|
119 |
+
if i % 1000 == 0:
|
120 |
+
print(f"Thread {thread_id}: {i}/{len(list_index)}")
|
121 |
+
|
122 |
+
index = list_index[i]
|
123 |
+
item = dataset[index]
|
124 |
+
dump_item = {}
|
125 |
+
|
126 |
+
for key in ['prompt_caption', 'caption_0', 'caption_1']:
|
127 |
+
english_text = item[key]
|
128 |
+
|
129 |
+
if english_text == '':
|
130 |
+
chinese_text = ''
|
131 |
+
else:
|
132 |
+
chinese_text = get_output(english_text)
|
133 |
+
dump_item[key + '_chinese'] = chinese_text
|
134 |
+
|
135 |
+
# dump to json file
|
136 |
+
with open(os.path.join(output_root_folder, f"thread_{thread_id}", f"{index}.json"), 'w') as f:
|
137 |
+
json.dump(dump_item, f)
|
138 |
+
|
139 |
+
print(f"Thread {thread_id}: Done")
|
140 |
+
|
141 |
+
threads = []
|
142 |
+
# for i, (start, end) in enumerate(LIST_PARTITION_INDEX):
|
143 |
+
for i in range(NUM_THREAD):
|
144 |
+
x = threading.Thread(target=thread_function, args=(i,))
|
145 |
+
threads.append(x)
|
146 |
+
x.start()
|
147 |
+
time.sleep(1)
|
148 |
+
|
149 |
+
for thread in threads:
|
150 |
+
thread.join()
|
151 |
+
|
152 |
+
print("Done")
|
153 |
+
|
154 |
+
# # Combine all json files in each thread folder to a single json file
|
155 |
+
# import os
|
156 |
+
# import json
|
157 |
+
# list_json_files = []
|
158 |
+
# for thread_folder in os.listdir(output_file):
|
159 |
+
# for json_file in os.listdir(os.path.join(output_file, thread_folder)):
|
160 |
+
# list_json_files.append(os.path.join(output_file, thread_folder, json_file))
|
161 |
+
|
162 |
+
# output_json_file = './output_chinese.json'
|
163 |
+
# with open(output_json_file, 'w') as f:
|
164 |
+
# for json_file in list_json_files:
|
165 |
+
# with open(json_file, 'r') as f_json:
|
166 |
+
# json.dump(json.load(f_json), f)
|
167 |
+
# f.write('\n')
|
a_mllm_notebooks/openai/.ipynb_checkpoints/langchain_openai_api-checkpoint.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
|
|
a_mllm_notebooks/openai/.ipynb_checkpoints/load_synth_pedes-checkpoint.ipynb
ADDED
@@ -0,0 +1,96 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "code",
|
5 |
+
"execution_count": null,
|
6 |
+
"metadata": {},
|
7 |
+
"outputs": [],
|
8 |
+
"source": [
|
9 |
+
"import datasets, huggingface_hub\n",
|
10 |
+
"# huggingface_hub.login(token=\"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\")\n",
|
11 |
+
"disk_path ='/dscilab_dungvo/workspace/BA-PRE_THESIS/dataset_pretraining/SYNTH-PEDES/annotation_english_vietnamese_processed'\n",
|
12 |
+
"dataset = datasets.load_from_disk(disk_path)\n",
|
13 |
+
"# dataset = dataset.cast_column('image', datasets.Image(decode=True))"
|
14 |
+
]
|
15 |
+
},
|
16 |
+
{
|
17 |
+
"cell_type": "code",
|
18 |
+
"execution_count": 5,
|
19 |
+
"metadata": {},
|
20 |
+
"outputs": [
|
21 |
+
{
|
22 |
+
"data": {
|
23 |
+
"text/plain": [
|
24 |
+
"Dataset({\n",
|
25 |
+
" features: ['image_name', 'person_id', 'caption_0', 'caption_1', 'attributes', 'prompt_caption', 'image', 'viet_captions', 'viet_prompt_caption'],\n",
|
26 |
+
" num_rows: 4791127\n",
|
27 |
+
"})"
|
28 |
+
]
|
29 |
+
},
|
30 |
+
"execution_count": 5,
|
31 |
+
"metadata": {},
|
32 |
+
"output_type": "execute_result"
|
33 |
+
}
|
34 |
+
],
|
35 |
+
"source": [
|
36 |
+
"dataset"
|
37 |
+
]
|
38 |
+
},
|
39 |
+
{
|
40 |
+
"cell_type": "code",
|
41 |
+
"execution_count": 4,
|
42 |
+
"metadata": {},
|
43 |
+
"outputs": [
|
44 |
+
{
|
45 |
+
"data": {
|
46 |
+
"text/plain": [
|
47 |
+
"{'image_name': 'Part1/1/0.jpg',\n",
|
48 |
+
" 'person_id': 1,\n",
|
49 |
+
" 'caption_0': 'A woman with black hair and she is wearing a black jacket with blue jeans paired with black shoes.',\n",
|
50 |
+
" 'caption_1': '',\n",
|
51 |
+
" 'attributes': 'woman,short hair,black jacket,blue denim jeans,black sneakers,black backpack',\n",
|
52 |
+
" 'prompt_caption': 'The woman has short hair. She is wearing a black jacket, blue denim jeans and black sneakers. She is carrying a black backpack. ',\n",
|
53 |
+
" 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=59x129>,\n",
|
54 |
+
" 'viet_captions': ['Một người phụ nữ với mái tóc đen và cô ấy đang mặc một chiếc áo khoác màu đen với quần jean màu xanh kết hợp với giày đen.'],\n",
|
55 |
+
" 'viet_prompt_caption': ['Người phụ nữ có mái tóc ngắn. Cô đang mặc một chiếc áo khoác màu đen, quần jean denim màu xanh và giày thể thao màu đen. Cô đang mang theo một ba lô màu đen.']}"
|
56 |
+
]
|
57 |
+
},
|
58 |
+
"execution_count": 4,
|
59 |
+
"metadata": {},
|
60 |
+
"output_type": "execute_result"
|
61 |
+
}
|
62 |
+
],
|
63 |
+
"source": [
|
64 |
+
"dataset[0]"
|
65 |
+
]
|
66 |
+
},
|
67 |
+
{
|
68 |
+
"cell_type": "code",
|
69 |
+
"execution_count": null,
|
70 |
+
"metadata": {},
|
71 |
+
"outputs": [],
|
72 |
+
"source": []
|
73 |
+
}
|
74 |
+
],
|
75 |
+
"metadata": {
|
76 |
+
"kernelspec": {
|
77 |
+
"display_name": "lmdeploy",
|
78 |
+
"language": "python",
|
79 |
+
"name": "python3"
|
80 |
+
},
|
81 |
+
"language_info": {
|
82 |
+
"codemirror_mode": {
|
83 |
+
"name": "ipython",
|
84 |
+
"version": 3
|
85 |
+
},
|
86 |
+
"file_extension": ".py",
|
87 |
+
"mimetype": "text/x-python",
|
88 |
+
"name": "python",
|
89 |
+
"nbconvert_exporter": "python",
|
90 |
+
"pygments_lexer": "ipython3",
|
91 |
+
"version": "3.8.19"
|
92 |
+
}
|
93 |
+
},
|
94 |
+
"nbformat": 4,
|
95 |
+
"nbformat_minor": 2
|
96 |
+
}
|
a_mllm_notebooks/openai/.ipynb_checkpoints/openai_api-checkpoint.ipynb
ADDED
@@ -0,0 +1,408 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"id": "65815b1f",
|
6 |
+
"metadata": {},
|
7 |
+
"source": [
|
8 |
+
"# Image URL"
|
9 |
+
]
|
10 |
+
},
|
11 |
+
{
|
12 |
+
"cell_type": "code",
|
13 |
+
"execution_count": 1,
|
14 |
+
"id": "d606605d-b949-4b3d-b582-9316734320f1",
|
15 |
+
"metadata": {},
|
16 |
+
"outputs": [
|
17 |
+
{
|
18 |
+
"name": "stdout",
|
19 |
+
"output_type": "stream",
|
20 |
+
"text": [
|
21 |
+
"ChatCompletion(id='1831', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"The image shows a tiger lying on a grassy surface. The tiger is positioned with its front legs extended forward and its head slightly raised, giving it a relaxed appearance. The tiger's distinctive orange fur with black stripes is clearly visible, and it is surrounded by green grass, suggesting a natural or zoo-like environment. The lighting is bright, indicating a sunny day. The tiger's expression is calm and focused.\", refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735906949, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=82, prompt_tokens=1843, total_tokens=1925, completion_tokens_details=None))\n"
|
22 |
+
]
|
23 |
+
}
|
24 |
+
],
|
25 |
+
"source": [
|
26 |
+
"from openai import OpenAI\n",
|
27 |
+
"\n",
|
28 |
+
"client = OpenAI(api_key=\"YOUR_API_KEY\", base_url=\"http://0.0.0.0:8081/v1\")\n",
|
29 |
+
"model_name = client.models.list().data[0].id\n",
|
30 |
+
"response = client.chat.completions.create(\n",
|
31 |
+
" model=model_name,\n",
|
32 |
+
" messages=[\n",
|
33 |
+
" {\n",
|
34 |
+
" \"role\": \"user\",\n",
|
35 |
+
" \"content\": [\n",
|
36 |
+
" {\n",
|
37 |
+
" \"type\": \"text\",\n",
|
38 |
+
" \"text\": \"describe this image\",\n",
|
39 |
+
" },\n",
|
40 |
+
" {\n",
|
41 |
+
" \"type\": \"image_url\",\n",
|
42 |
+
" \"image_url\": {\n",
|
43 |
+
" \"url\": \"https://modelscope.oss-cn-beijing.aliyuncs.com/resource/tiger.jpeg\",\n",
|
44 |
+
" },\n",
|
45 |
+
" },\n",
|
46 |
+
" ],\n",
|
47 |
+
" }\n",
|
48 |
+
" ],\n",
|
49 |
+
" temperature=0.5,\n",
|
50 |
+
" top_p=0.8,\n",
|
51 |
+
")\n",
|
52 |
+
"print(response)"
|
53 |
+
]
|
54 |
+
},
|
55 |
+
{
|
56 |
+
"cell_type": "code",
|
57 |
+
"execution_count": 2,
|
58 |
+
"id": "370fea1d",
|
59 |
+
"metadata": {},
|
60 |
+
"outputs": [],
|
61 |
+
"source": [
|
62 |
+
"# ChatCompletion(id='6', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"The image shows a tiger lying on a grassy area. The tiger has distinct orange fur with black stripes and is resting \n",
|
63 |
+
"text = response.choices[0].message.content"
|
64 |
+
]
|
65 |
+
},
|
66 |
+
{
|
67 |
+
"cell_type": "code",
|
68 |
+
"execution_count": 3,
|
69 |
+
"id": "46de478b",
|
70 |
+
"metadata": {},
|
71 |
+
"outputs": [
|
72 |
+
{
|
73 |
+
"data": {
|
74 |
+
"text/plain": [
|
75 |
+
"\"The image shows a tiger lying on a grassy surface. The tiger is relaxed, with its front legs stretched out and its head slightly raised, giving a clear view of its face and stripes. The background consists of lush green grass, and the tiger's distinctive orange, black, and white fur is prominently displayed. The lighting suggests a bright, sunny day.\""
|
76 |
+
]
|
77 |
+
},
|
78 |
+
"execution_count": 3,
|
79 |
+
"metadata": {},
|
80 |
+
"output_type": "execute_result"
|
81 |
+
}
|
82 |
+
],
|
83 |
+
"source": [
|
84 |
+
"text"
|
85 |
+
]
|
86 |
+
},
|
87 |
+
{
|
88 |
+
"cell_type": "code",
|
89 |
+
"execution_count": 2,
|
90 |
+
"id": "f60099ff-ca4c-46f1-9dcd-3a4fb776ea4d",
|
91 |
+
"metadata": {},
|
92 |
+
"outputs": [
|
93 |
+
{
|
94 |
+
"data": {
|
95 |
+
"text/plain": [
|
96 |
+
"5"
|
97 |
+
]
|
98 |
+
},
|
99 |
+
"execution_count": 2,
|
100 |
+
"metadata": {},
|
101 |
+
"output_type": "execute_result"
|
102 |
+
}
|
103 |
+
],
|
104 |
+
"source": [
|
105 |
+
"len(client.models.list().data)"
|
106 |
+
]
|
107 |
+
},
|
108 |
+
{
|
109 |
+
"cell_type": "code",
|
110 |
+
"execution_count": 23,
|
111 |
+
"id": "e51e6cd6-9ca3-4082-8a8c-f1668f0de5c9",
|
112 |
+
"metadata": {},
|
113 |
+
"outputs": [
|
114 |
+
{
|
115 |
+
"name": "stdout",
|
116 |
+
"output_type": "stream",
|
117 |
+
"text": [
|
118 |
+
"ChatCompletion(id='1', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"The image features a tiger lying down on a grassy surface. The tiger is positioned with its front legs stretched forward and its head slightly raised, giving it a relaxed posture. The background is lush and green, suggesting a natural, outdoor setting. The tiger's distinctive orange, black, and white stripes are clearly visible, making it a striking and recognizable subject. The lighting highlights the tiger's fur, creating a vivid and clear image of the animal.\", refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735640960, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=90, prompt_tokens=1843, total_tokens=1933, completion_tokens_details=None))\n",
|
119 |
+
"ChatCompletion(id='1', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"The image shows a tiger lying on a grassy surface. The tiger is relaxed, with its front paws stretched out and its head slightly tilted. The stripes on the tiger's fur are prominent and characteristic of the species. The background consists of lush green grass, and the lighting suggests a bright, sunny day. The tiger appears calm and comfortable in its environment.\", refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735640964, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=73, prompt_tokens=1843, total_tokens=1916, completion_tokens_details=None))\n",
|
120 |
+
"ChatCompletion(id='2', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The image shows a tiger lying down on green grass. The tiger has a striking orange coat with black stripes and a white underbelly. It is looking directly at the camera, giving a calm and composed expression. The background consists of lush, green foliage, providing a natural and serene setting for the animal.', refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735640967, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=62, prompt_tokens=1843, total_tokens=1905, completion_tokens_details=None))\n",
|
121 |
+
"ChatCompletion(id='1', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"The image features a tiger lying down on a lush, green grassy area. The tiger is relaxed, with its front legs stretched out, and its distinctive orange fur with black stripes is clearly visible. The background consists of well-maintained grass, creating a serene and natural setting. The lighting suggests a bright, sunny day, enhancing the vivid colors of the tiger's coat. The tiger's facial expression is calm, adding to the tranquil atmosphere of the scene.\", refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735640969, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=93, prompt_tokens=1843, total_tokens=1936, completion_tokens_details=None))\n",
|
122 |
+
"ChatCompletion(id='2', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"The image shows a tiger lying on green grass. The tiger is relaxed, with its front paws stretched out and its head turned slightly to the side, giving a direct and calm gaze towards the camera. The tiger's distinctive orange fur with black stripes is clearly visible, and the background is lush and green, suggesting a natural or well-maintained habitat. The lighting is bright, indicating a sunny day.\", refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735640973, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=82, prompt_tokens=1843, total_tokens=1925, completion_tokens_details=None))\n",
|
123 |
+
"ChatCompletion(id='2', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The image shows a tiger lying down on a lush green lawn. The tiger has striking orange fur with black stripes and a white underbelly. It is looking directly at the camera with a relaxed posture. The surrounding grass is vibrant and well-maintained, creating a peaceful and natural setting.', refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735640977, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=59, prompt_tokens=1843, total_tokens=1902, completion_tokens_details=None))\n",
|
124 |
+
"ChatCompletion(id='3', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"The image features a tiger lying on green grass. The tiger is in a relaxed position, with its front paws stretched out in front of it. The background consists of lush, green foliage, and the tiger's distinctive orange and black stripes are clearly visible. The lighting suggests it's a bright, sunny day. The tiger appears calm and at ease in its environment.\", refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735640979, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=74, prompt_tokens=1843, total_tokens=1917, completion_tokens_details=None))\n",
|
125 |
+
"ChatCompletion(id='3', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"The image shows a tiger lying on a grassy surface. The tiger has its front paws stretched forward, with the rest of its body relaxed. The background consists of lush green grass, and the tiger's distinctive orange, black, and white stripes are clearly visible. The animal's expression is calm, and it is looking directly at the camera. The lighting in the image is bright, highlighting the tiger's features and the vivid colors of its fur.\", refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735640981, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=91, prompt_tokens=1843, total_tokens=1934, completion_tokens_details=None))\n",
|
126 |
+
"2.86 s ± 846 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"response = client.chat.completions.create(\n",
" model=model_name,\n",
" messages=[{\n",
" 'role':\n",
" 'user',\n",
" 'content': [{\n",
" 'type': 'text',\n",
" 'text': 'describe this image',\n",
" }, {\n",
" 'type': 'image_url',\n",
" 'image_url': {\n",
" 'url':\n",
" 'https://modelscope.oss-cn-beijing.aliyuncs.com/resource/tiger.jpeg',\n",
" },\n",
" }],\n",
" }],\n",
" temperature=0.8,\n",
" top_p=0.8)\n",
"print(response)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "094bec32-0324-486a-809e-d919891c2167",
"metadata": {},
"outputs": [],
"source": [
"# !ps aux|grep lmdeploy |grep -v grep | awk '{print $2}'|xargs kill -9"
]
},
{
"cell_type": "markdown",
"id": "07a1fb36-e361-4d59-870e-0a8a3f15e5d5",
"metadata": {},
"source": [
"# PIL Image"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "e56e3874",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
}
],
"source": [
"\n",
"import datasets, huggingface_hub\n",
"\n",
"disk_path = \"/dscilab_dungvo/workspace/BA-PRE_THESIS/dataset_pretraining/SYNTH-PEDES/annotation_english_vietnamese_processed\"\n",
"dataset = datasets.load_from_disk(disk_path)\n",
"\n",
"image = dataset[110]['image']"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "c0c2b27d",
"metadata": {},
"outputs": [],
"source": [
"from PIL import Image\n",
"import io\n",
"import base64\n",
"import uuid\n",
"# {\"url\": 'data:image/jpeg;base64,' + img_str}}\n",
"\n",
"def pil_to_url(pil_image):\n",
" buffered = io.BytesIO()\n",
" pil_image.save(buffered, format=\"JPEG\")\n",
" img_str = base64.b64encode(buffered.getvalue()).decode()\n",
" return f\"data:image/jpeg;base64,{img_str}\"\n",
" \n",
" \n",
"\n",
"def generate_content(image, prompt):\n",
"\n",
" # image is a PIL image\n",
" messages = (\n",
" [\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": [\n",
" {\n",
" \"type\": \"text\",\n",
" \"text\": prompt,\n",
" },\n",
" \n",
" {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\n",
" \"url\": pil_to_url(image),\n",
" },\n",
" },\n",
" ],\n",
" }\n",
" ],\n",
" )\n",
"\n",
" # send message to the model\n",
" response = client.chat.completions.create(\n",
" model=model_name, messages=messages, temperature=0.5, top_p=0.8\n",
" )\n",
"\n",
" return response\n",
"\n",
"# print(generate_content(image=dataset[110][\"image\"], prompt=\"describe this image\"))"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "cbf16d3e",
"metadata": {},
"outputs": [
{
"data": {
"image/jpeg": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAD0AFcDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwDm/FehWvh2yjjt2kLMQpJHWsLSG1a0k3Wds8hc+nBr0zxTb2V3aRT3Um1c5U4zx61PY6poFno3nC6gKoOSCMk/SuZXZo7Hn2ua79vtGtrzTvIvFA+YelcjKo5rV8R60uravLdRJti+6vGCRWKzk1rFXJsaVrOF3KJHCH+Dcdv5Vo22oFZ45I40DxcodgOTjHNc5EWLda7rwhpEV3cKZeR71UtEOEbsrT+LNaWbzQY42IxlExTLbXNSuSy3L+ZGzbiWGea19Z8K6hJfyNZxq8WeBmp7DwhqCxZljRSe2ayvc2cLHKzRl3dlBA9NxqnDIEm2+prrZ/D95FK6NH+VZR8NXrT5ChRnvWhDZVvLK2kMeWILDkg0VpT6JcqmW29cUUhaCXfiD7d4RSynceeuFBJ5wO1Y/h7w3deI7oLFlIFb55COPoKf4X0SHWb7NzKwiX7wHU17Fp32LTbWO1tLfbGowAOKl6Eo5Hxh4esdI8Hhba2XzIQP3nc+teWumUDDvXsfj66aXw7LGEOGryTTYTczCDIyQcZpxK6lSEESDPrXs/gjSRLp4mzjcMdK86i8OzSMp29SOte/eGtITTvDsKkcqmSampK+htTjbU47VfEEXh+Z/tMTMd+BisxvijaH5VsZGP1pnjFhdTyb/mCuQo9BXCPbhHzjinBKxNWXY6O78czTSM6WYGTwCelV18Z3DZzap+dYpi3DoaRYMZ4qjE0rnxPdznAiVRn1orNMOT0ooHcPB+rJpWvxSz827nZIPb1r6IsLeynhSVEQqwypHcV8z6jp8+malJayrhkP5+9dhpPjzVNK0pLNCrhU2qx6j/GqUeYzPTvHCWI0CeJmjUFDgAjJNeH6cP7OkL7FZ+xPan32rXOoy+ZczM7e54FVfN962jTQrs6a08QvG43ouK9H0XxyJ7b7K5jXK4FeJiXnrT47qSNso7KfUGk6UWUqkj07VtL3iSdP9IVm3bVPIrh55zbXoDWcg9NycCrFn4kngiVGkY4HepZ/EAucb0VvcrR7KwOdyB3W4beYtgqMxAHipZdUjlkjjKbTgKDV6OFSucUnASkYbqQx+XNFbxtkx0FFZ8pVzd+IfhpLmxGqwp++iX95gdRXlQfK17x4vuhB4WuJOoMXT8K8DViRnGM9qumybDyaTNNzRnmtRDt1LuqPNGaaYiUvjvU0E+4gVTY5FOt8q4Ip8wWuaUwztYHkc11Glv5tpGWPPSudiTz4iO9bulkQQrGxHHepdmK1jXEYA6cUVYQBl4orKxVx3ijWrfVvDS2dq3mSbVV1Jx0HUV5hNY3NsAZYiAe46V3SabcWTDzoSorI13iFVHc1kpWZuoXicoaSpZIyrYNR4rpTuYNWENFLSUxAa19EsYb1nEjEbegFY/etbw/NtvtvrUT2NKe51kPhuOW0kNvIRMBkAnAqnFaTW0gSQDf1OK6LTbtYnOaoXZvJtRElvbSGNR1AyDXNGbub1Ka5blm2kCoA5waKzpxNDcLNMrrnIAxRXQcg46rdXTH7TMT6ADP9aytRU3FzGvQZ71GsdzbSMsgPBPJH69afcTAukhwBuycVyXuzuinaxj6vB5VyfQiszNbevDf5Uo6Fe1YmOK66exz1FZiZopKUVoZC4qxpjGHUIjngnmol5qSMBZVb0NKSuhwdmd3a3QwMDnNb1hb6pdqGN5HDbE8KMZ/+tXH2Mm6JSDUXiV7i3hjljuJUBwAqtgdK4npI7WnKI3xdrQ+3m1tZDL5BKmUnqc+lFceGZiSeT6nmiuuOxxSjqe0abp8V7ZSCaMMx7kcjiuI1W0NpPJCQcIcDNd/p10scuxAArVjeJrNJ1aVfvfzrD2dkdkXdnCXNwk2nRox/eIT+VZZxVq7jZHKgVUEbVtT2MKu4hopdho8s46VoY2YBtpp/mZWnR25cgYrSi0h5oDjg/SndWKUWXNEkDxFCelavia1+0WFqmeS3H5VgaWr2l0Y3GCOoNad1evPdQxFsheg9K4ai947YP3dTm9R0iXTJFMhBVhwc4orsvFGnpLo0UuBvBHJFFbxloc7Wps2U4+0IrdCat6hEJUKntWTZzQeeqsw3A8VrPJvJPrWn2QTszkNQ0XfLuU4Unmqo0Be8uK6m7XEbVml651OzOynSjNXZkjQI+8p69hViLRbRSNxZvarRcik8wjkiq5zZUIIt21lZxsNsK10dg9nGjf6OuSPSuWjnAYDNXY7kBcA0uZsPZxsUvEkEceoJcxIFDDaxFZToqalAx+6w7VoavI8kQDN8gOax7qUmWAoeQaTOeorbHYaxG0+joqDJAWirOmTLcWao+CQBndRSRlyM5WyV5JwMV1MKsQKwNLRjehx90dfautiUFBxzXXBXOaTKl1DvhJAycVzUsrxuflGAe9dr5WVII4rlbyDZPIpHRjXPUp2Z3YapdWMr+1ERyHXFOW/S4IWMgZ9arX1spBPTFZ1jlb0L2NZ2Ou50BiwM7sn2p8MTs4LHC+lEcLKPvVKFYnAOKpFDNSRTZSe1cxFP506D+62K6icFo9h5BqjPpEcd0kqDaG5IFU1ocVR2kW7bUBZXphZyAy5HpRXP+JJtl6iR53KuDRWVi+aJv2+uWumllkhdjnkrWlD4y0ogFlmQ+myuNvCHmf8A3jUCxgkV2RdjzG7nqNt4g0u5iDJcAcdG4rFv5Y5buR4mDI3ORXO6dDZglrjp6CtdzCT/AKOu
2MDpU1Hc6sL8RRvvukAdaraJpsup6mYIiocLu+arN1V7wEwj8ZRMf4lwAehrA7puxKyGCRon+8hKn6igeoq94hi8jXbpQMAtkVmgkUJlQleJJ5bysFjXc3pXTR+Ho59JW5lISVEz1rl4rkxzKR1B7V1omaXTi28gFOhrZK6POxE7SPJ9cVptWbYkjDHZSaK9ssI7S3tIyLWLfjJZlBJoqeUi9zxOTmQn3pAQKfdRmOVvTNV8nNXc51sWFcqeDWxYzb49vcVz8jMgyK19CbzN5f8ACom9DooO0i1d4APrWt4JtN+tx3POY6zL1ecjvXVeBYGE7NsOMZJrnudtSd0Q+LiW1xjtwMfnWE8oRctxXTeL5rca4sSgl9vzHHFc1fIrRkCmmTCdkZtvqLyXhVF6HjPeu10WK61O1kZvljThie4rzaW4ayvhIg5Wu48Ia/eapO8WxI4VXJCjk11QehxVdZHS398mmWiF1LDgYFFWjGsnDoCPcUVVi01Y8WuL9bqZht2sCeDTAcVBqkH2HX7qEdFkP86nU8is0cyHuu6M+wrR0YbImPrVL+E/Sr2nMFgOfWpqbG9HVlud9zKvcnivZvCumRWumRuVUEpzjv3rxOSQNLHjruFe56bMYvDQk6FUJ/8AHa5zebPJPEN+L7xnfmMnYjkL9M1DMSyfhWPbXBm1i9l6lpm/nWo7/KabLprQopo41SZ137SBya2/AVqbW6vY2O5kbbkVU0aXF6y56itTwp+71LUR6yf41vS1OaroztFI70VCGIPWiurlMOY8V8XSbvFV6c8mQ5/OmQkGJTUPiNH/ALeu3b+Jyc/jS2jZhHrWHUSLqHjmpIJdoKioQpPNV4Zv9IZfepnsa0XZmzbAvdxem8V7VeXa2vhCRjgYiPP/AAGvErWTZNG3vXpOp3Et54Slij+8YR39ua50tTpnseP2Nywv2OeXYn9a6IMWSuTtz5V2d38LEV1cHzRD3FORVLYn0z5NQX3rc0MbNZvAO+01gQt5Vwr/AN081saLcBtVmI48wDr7CtqTsc9VanYA5Wis+fUILP8A10ir7Zorb2hlyHlXia7ju7lZVUAkc1RsmLKRS6rbSRTtnoCcVFablJqGxJGqvC1lozC8xV0N8vWqYGbpD6mpbKgtTWSQrtr0XQbpbvR3hkbohFefm3Plg9sVraTqj2kTxgHkYyDWKWp0yehyWpRi31m5jByBIcfnXR2J3QL/ALtczqRMmqyOTnLZrrrGDbbJhSAVHX6UpoKDH20QebnpVDXZjbzRi3ZkYkcqaukvBNkdKzNXy80D+4qovQKi1IdRVnt4mcu5IBJJzzRVu6g862Reneii5CidlqfhuyvlYiLBb0FYo8CNuOxjjPetvUtbktmxDgY61iz+Jr5xjzNuPSqVyNBf+EBnJ/4+EUfWsnVfCraQi3LXSSAHgD1qZ9cvPmPnvz1way727muVwzsw9zVJE7EP26TdtzlfStHT0MwfA5rGSB2kHB610ulwiEZHUjmqsgu2crf2k6Xju0bAZ610Vjq/k2cSSIWKrjrV7UIvMgckc44rmlhuixUROfTAqGkyoya2Ne51S3mQjYyn61lTSNNNGQxKg9Kli0TU7ogR278nqRWvYeDNXaQMUC49anRA5yZUv7pTbx+UNhAANFdRF4BvJxiQc+1FToaRloZeqZ89uaymA20UVujGIxUG6nmNPSiigaFjjXeOK6PT4I9i8daKKmRSN6HTraZPnTNamm6XZozbYh+VFFZvYZvQWduBkRr+VXUhjC8ItFFYlvYUnavAH5UUUVSJP//Z",
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAFcAAAD0CAIAAABhIi17AAB85UlEQVR4AbXdWXdj2XUneMwgwSnmiJwlWUrL8rJ79epV7vL3f/SL+8FV1pDKlHKOgQxOAAiAQP/++wAMpjJtl6pXn4y8vLj3DHve+4y3+5tfvtjb2xtWevDg2P3V1cV0Or2aTW9vbzudzsnJyd/88tMXL14Mh+PJZDIejkej0enp6RdffHF+cfX69euzi/P5fH677mw2m26no1Ruut31cuW+Pxqq85OPf97r9d774P3FYvHVn7+qmjdq66036/V6dn15dXX1f/4f//jzn//8/fffl/PV9y/Pz8/39ke/+tWvTg4PgXG7Wv35z39eLJfeqvC3v/3tcrm8uHx7fX66XC46nX5n01uuN/3eQOPgXyxWMoAEtKPx/mAw6PcHnV7vtru+WS1vlzdavL6+Ho/HQBqsViuVHh8f7++PNTAD0eXlzc3N6nblNWRkkPvt27fHxw+kRw8fHh4eHRwc9Ab9s9NzReYBojObL+CDCveT4oh4cXHx2WefodTnf/pCQ511B/6gQtnNciXP7PoAWf18/vz5o0ePPOluOvhydX3x5ZdffrVeq3M+m3399de9fv9pJdBLEOysN999/c3p6RtIDsf7nU4PiVXQ7/fB1vihuJvNZu1Jv9ff7/WWvQ5o5ZTcDH7+ycfvvfcesPqDLlSvry6gvd6sNrfJ2e0Nlb68uFjc3Jydnl5dnq8++STY9ntPnz5Bi6vZ1Vff4u2y1+900GBNCjbuJE1q/ujwZIGkt7cyew7W8d4IZ7xdLlfzq6vAenuL4liN/4eTA/XPZjNP5tMrOUHrydXl5dnZGdKcv3376tUrrFIhLg8HkweP3uv1j1ar215vgP+97u1oNLhZLrq9Dey1udmoYdPvB77hIGK/XPaH/UGv08VgMA/+6Z/+6emzx5rBf0Cfn5/1uhuEgHxIuLolOciCtH7iBhXAjOOTh1Kn18cNgHrVUkO+FKK77ob2hNPD/f19Erdapx4/teWqqu7tmjyPh6nnu++++/bbb9ehWIiyul3cLpYKdrsdP4eDwjDSOht++61GHz9+vL9/uJxTwxFakN1bcPbG3c56tVxREIyAv3+lo6jQx5a90fjo5BiC6HWwP5kdzmA6+Kf/9n9BhNDiw/nZqZxaHfb7S3KIqrepfFpyi1J+np6+hs/Jg0dPnjwZ7u2Tcyh5VZIQZndV0CiInJsOoAcDKAyJHuqQi+64v+n0NuuNBke9/s3NfPTg+MWz59eXF5i8vJlFOojvcrVaLG+6dLmEvARMQ8iKHnRT5fMZ7e/OZii5Ds/DigghqNUg3XZid7p0gUyQ4E0f/oNev9vvzKez9WrR72729vcGh4cTbXk3v5lNZ9enb94ADqXDVcDGonRXy+AfSRoOqT5pnM5ucLI/imaig8zd0oJ2bRBstL7pEFSQBaDb28nBUbtxz9p1CEKfxGzOTy8gPOh1S1UwP1p1OxzN+4V/hPmdkiMKQycLZhJxVEDfwbDX68bGqR+6t2vmrNkCD1CfsvQgpdlU3u0Ouj1UxkI0wtTB7ZLg3QwJC3lCnAVjp/W98Xi/PxiwQDCnwCWBTycHB7PFzdn524vzK/Zss1iqYl0+oRf1o0Jd9C9zFhKUTG5I92ZDttbDEaMYgXUP5wVzoobhEFg4sz8eoml3M9iwR2uAk6t9urC+XbI4dFj14OkNBxzBoBc1xKgecaK4q5vbzXIwjtKjxWrTXxC9XjQ6ZCgQ6QXA2DvmJuLWiQf0ltEYUIdOZxTmNGs5GDQd3t8/YN4fPX58eHjItGj+2bMX+5PJ9Gb+/auXf/7TV7PZDbkYDkarmNuqLsx3r2L1R7Ogp21taVWKtq9WdCb3kUYIIMpmOO5N9kYYjhlMmOeqwGqZIb9cbFSfygZ9Vhx4LH1poMqXG9anT+A10R2NGK9wfjjqzVm5+JatRACrQebKImJqaLXBoQVrwAKvZCdiyIuok/GINT86OoIz+/f0yTNGaLQ3hs/e3gSz2PvDo5OL8+ne3rd0gQnrD0naan1LgfvaSMsdxrkHbqKJInCmHeSCiWFk0Yr/5YCQjF2UO9Zx0Nfo3miInXCjletepzfsr28Hm2VvWACwAWynJrgGlbrZG+8vbhFrwxihenmBTm/QI980n/D0Ov3bJQPnR7eJwpJS9Pvz6RSVUQHpgS7GqAiHCSlL/uzZs8lsXpx/xhc8f/aCHRIoeSujBtkCVhzEcL6ZL+EwuCVTSB7k74SieJAinjfSqIF0tIbZJBmIQXfTG48Gk73x4WT/0YOThw+OB5vu2ZvTy9vNanaDUSjIqqMgkDBwhCFD7nGE9KqFTzXKbPmVVK3gaBLpBlN74hoJbHa3xNUTKh+g+QgOyR+vSf6Dx48up9eb3qUMRP3o8PjRk6cQVrsaS1RX0/ni+nK6mC/wkskSH5D5OOIVKaCIYsZgp/a656madLTfaSiWw9V/fTreYw6ODw8enBw9f/Lw6ePH3dV6NZ+96rAGN8wGF6oVFiLa2+35FztZpg7Yqo4DWm1WkM2bYMVykqK98YR4yuMJGNy4Yvtefy+iQUyQBYipZR2KupEEIZiPmUdHV0o+evjIk3iF4TAYNMQWN9PXp7/73e/EgtRpuQxp4pZgvcEWtI9KV9ORDrKQsu23u13iuij/ZG//YDyiBQeTvaePHz19/BA5RJNHB4f0gkWA82J+Q3RJAKMxHI+VGvYS8PiHmURwnShHqyiFDMXbHh33doVYBD4IrktQN4mDmjkIU2O3Eu97MhBUy6qSvYPJA36x0zs4PCZHYmWmiHrHw8cox4KyiK5vXr06ffX6ZjqraIpFG+LJbTyiVO1taRF64Hk9j2QWF13UR2j7+3ujg/29o4M9GvHg8ODR8bHAcbVYHR8dPDo5ZrBmQujrq4Seqy6rJeacTPYmh5PR/gRvBn3tbnqDm+LBUv3CEteGOTXV+rCTWAvMFIQDUop+gQBpZFOnYEeGAZ/J2w0HiN2/upqen19eXU+rwLXc5Eq9aqaib968+dOf/vQv//Iv//7v/34zX6i3QmTs2PD2HSrKNW2pEPx3KbJQlVRFvFsxuTETWAK4vYqECQX818v19PL4448+fPL08duzc0EXHDCJ5z6gtMcnk8PDAeuIMf0h/K9n07cJ+a6ro1UCUtZnsphQdmTCajg3xwdHigwOTwJ/J/2uUOHqcjoe6YR0UOX07dmr16/fvj2Xr9P56mBy9N4HH37wwQdMMaNNEf7t3/7tD5/9XtQUPiSVtJfAl0neeojiPqGIDWwJIdzw0elrVAKEqImnYB2JUxLfsb7lGQ4PJoPnL2A+fy/hMS13r/RI8DIGy34vfoRx4S+j6g+uL96eXYZe6t/0yhrECuAc7SdTwGJc+8NxXEX0dMPqNJHRLMQHr1+fPnj0mF2Z3QgFluzWxdXlV19+c3l5jVHPn7/34YcfEmExkjhfUjVdhkPg6PRKzbSVTlfZ4/jIkghwJ2yBxIopqn/uPY1IxPfjREwSgQJK9XSYxWu2ZUxhx
HPD4aOTBzRHJo4ptoxV63RYB2qWjiyihQrLTY/DGszmkwKGRyxX1Q1fI63pB8QjbjjP0D0G8XblAYt2K4/r4M9ffvPehx/v7x2S08Pj429fvTl9e/HyzWkINp1fTW8++/wLwabmOV4FCMCTR091gWkN2SFRJVTe3LLJcYxpJXCUJih1y5aFJAKT7hAcoUXCxc0+IeT/eh12gUTczOaIOtk7WCxuEhqmks24H6fYHY4Go6HwSLG4oA1KxexJ4cXCwEJaQ02CEFQrdfdGwE6RYSQxssi6FnyMrLIETeXDzWjwh88+//Rv/47ww0HAoQf5+s2ZogP5oj76fEtUCNSQ9HwwMPLx61//2kiBrsRvfx9/MZud40ZPDJ0oroSixk9CFCnBk6LAYSHyP1D8VZVxFFYJFQ72x8E77jBBLoM22jsAM+Jy6+JKvjv+MmIWLaE/8cdrcI5uO1NwshTIJqzSYAWfqT/84L0SUkEmHiHglOMnOFBjFtPfOzs7/5//87ebXl+8JCapKJYgK6wftEqcPujjn6hTTfqRv/jFL/77//3fPvroI1r65Mkj/VADIUiftqUyv+49AYHkWa4kIKodjkHFQ64Q8iziHonYG/GaqBArW1CKtPt5s8ey98fUYYCQqYLkpKH0fDUAq83ihjMVxcC2eAa7SC6cBXshi6pCxHKZqLBObE4RXGk3vBAijuQPf/hDABmNgAh6YYLhFuUloYUa09hmo0/x93//97/5zW9+9rOfyaOKeJ7hUAirUt48jTUuFPKNEOU0CvmQgTj6HxM3JEdZRossSAzfiDxzfx3aQqiSMtozX9xy5gIMBQXUMaEBSVSsp+SNjt033319eX5B/2FocGV5s1imZ61bEe+OoBpqclEAbGSAOcQRCBaug7cXV0cnD6+vZl9/9e1iOf/+++8Xs7nIRCahC6CRDFPee/H817/69B//8R+Jw/5ohA/GBfwTH6tqRZPjq0UpJbQhmjelDlv7EEGBSyixNvYnaNmMhn0RwrOKl06OjnSOqfTF+TURxVsG//zi+uzy6nI21zGmjgovO+uF7mbZIBBKV7PLs/NzzCV1BYrYc4XReACkxj8yTnqAxAZz70tegJrvwkUQJR+pMMijGM7I2vgjH2hgInb66IOP/u7v/u5vf/mr9z94gW9bfreOjcE+Etv4HIVIuMY3qLYIb1SUFgeBKEIMVFnptd6UCKO7L146OWJiSMNIWI/Pm/73r94sL6+vXr25mM5enp6/PjtbbjrXgnYsWeu+RUjVzrKsEkNThJgtzYEZY0PtskftIZDIVIhf3fkMOFZ3u731PPQ6OXmgyNQAy/W1gT3kwF05GLDp1TV1ff/Fe7/59ad/++kv33/x4mByUGKfztpoPNhPMJKhNA2E82zgJv1i8avaW0DNv7ljaIEmrgIhddAEhpJgGdkdYw2hUZneA5HTwcHl9fTN29ffvXzz7avXXxNPQXC3J1ZGAeMOQYDNTMSstXSIBRxhdVRym4DkiR8expSCIRFJ8SQPkuRpGQaaxPMWCGDg2dnpZG9P3xZR6ZVg4R/+4R9+/emnXCOEY8QNBwwFbam9JbKAnMpKbJCHBEDSgCf+Iih5DnF14RMDpW8NfsxvcufKY0EhEWGnO9rbf/Dw8fXNsv/m7eIm8jtkhkLidIpr2CkxophY+IcKMc3F7ab81Whab1RwAxgQlluJv5ChPQzzCk59ys2Njvtq/c13pAB2nfObG01xxLTgn//7P/3yl798/OBhxB6fqw/DI5M9QDNpBEFFTQM9BJmfomP8LmdcvTXeSxdagTgIcrxkSbnD46PDhw9PEJcxJ1FahwvDNt7fO+72zy6vwWik9OlmM2X2Q4UQMuaV+Qw3w1PuORazn65BnGG8aIS/eS3AxC4hUyll5DJYhEvytBvEGnAHaOPRYmEMm13NoBXI3n/x/J//+Z//8R/+nrCI/LSJpApEHZlpg0X6T7HA8W1eJsYpt+QVeqMLBhhEjlUgu0pQixJI1gtKfOTh/t6JHtQR4dvTxybuqr+9uZ3F8Ka/jzqZ1BFQs5TzuYhPI1HxYoYemfiznEaeaTddmrIIDaS7KxEL/CUUwPdcNvlbZs8HM8gXSTOeF2kcv3j27OOPP/77v/vbv/mbvzGrQ1i40dSRJqJp+bPpEFRdTC4nPwJLJ4M2u7AEFbiYGwE7P0RqtVqjAwYUkF6MpBONAmRB2tfPozWVzDAJ4S8urq5msxpSGRlIVMteZ9QxoLrMgI7mQusifmiBuuGx53FO5MG/hFYN7bLZ8pd/qkksAN/9axpRTIuQkPn90dAE0d//5te/+tUvnz99qhvLPiifKvB5fUuT42PDky4KYBotaDDJ02RKVfL4OREKjMc8PDBZQ+aABFNoIFClhr8hHLkM5EFJqfWNgdauPsLF1TUlFVeKq0aCvJs1Cb1lGBJ/hRMZjk0Q2eHUg3WTU8izHYV2iFUYugYBkkKb3nHRbVLLo6+SOint3v7+48cP/+bTX/3qbz998ugRZDKIok9iTPp0qgBCCG2g7caYgu6DnrhuePXfVaehVIUWgMH/g8kkVna+GQsMjAMYBfEiI4c3J8dH7714ZnzNtCA2JEQmL/pms+X1/CaKoaoSY/0C9cojbGFYhgKne35ec9Do8DYQDgli19gPv4lcXjVCEP6AFmJVF28rC1U4BWv0rgw+p22syYirJq+mU/aXDdQwtPECUVQzJ+fDhJhCBpbo2rREzWjxJtPpPPQuqyPsZ03MODKEF3E6ukx9Q6yueEX2kZvQaYsBUnOrvLPsofiMiqldPWKkjAOJAiPnakYp5DTmgxBSngTZQqmwBWHpO7zeGQj4A0zyqmFf2ZRLne5dDXYyqjGuRA7oHulQudcplQPorjSaXTPCJ65hJuTJUF51tFIZb7wzP/erZmj3DvfYAdOJ6lCRcQTOUsUPHzxAAoMmqIDo5ESVIMn0j4pNyd3qCwRDBIIA7EyXJUc8I/eFlk28Na7LHGSkALZ9bran/dq+Illqo9ZFCbfvSKDgQLPYzMgRbzzWaoYRu+Y1+mw3Y2R6CZZAFfWnlkz2dfRzD8xcz2bHhidfvuLpZr0lTcQg+ie8uyAn8wXTcMzXTTJgxUckOipgzGsQn2HGKUaGTJiOgNwNheckb3U7XywpnYEJwKLRYD3gVjg3InhLdQrV0nEcDAVTf0liQ88TpIuhrJRXfjKsCWXz8I5kbiSRx5iD8VjgeHp29t4HL/jusAfra6COQrplp+KVSy1Lag5pynB/QtJfvzn/4k9fU3dFTGugAjP7+MnTvcnRaHJE4nvrW+aQHCD3SBzf6yBgxhr6IyFYp8dNjvjq7jBTcTOuR6S0NCHYS4gWBt6SQk43ckEdbswjqIpLG1G6GqDpG/7SLyKSrBKsOFwFS/6aSJRZ4EL6EWFUcJUh6lkJ5OlK6BaqxbjTbHpjcO/o6EBdQOc4FCCNWCd/4MC+oDTgJgejieHY4wcPMxYwumYWZdifxCI8evx0cvRA1jLlt6hNJQCO6mDY9A2LjparzXy11k/Sf6dxAsNbg87rzdU1AynK
CScOjg7X11MG31AN0hAWuYadcQu00yMd7MFGuyGEUdrSizIBWwwDNgsoRbE2t4ndog7kumRCiW5IIJEWYw6nBlhOTzHLTIxIRtUNeU/w3xUzEcUr8+qElgQRV3OwJvLfnl+a4iS+htf9EypeYevNam8gDF3SD7Zxb4/vM8s+2PTXby/n3cH54rYXZxnpMyJiNAY5dH7W8BVE6XsfEsL9wwQMq5Xurxse1HwU8UQC2NB/14xFiN+IA5NVQWB6LRSnhEKGok6MY5x2gqutxa3YopvxBblDj4Hx6Jvvvn355vWZEVciJ6vkOXLIgArBOuB22Q56qz8xCec51idffvk1nbegQRdhect93NwsbjmFeIUMVTFvBhNHBwLGvb3Dhdnb169Or46Pz48fnBwc7DPMar64vDK48vDx09v0gBOkSJNRXwRhOgj+eodYzH1aAmQUAeQmLxDITbsC1b2yrnjOkojY3Tcq1JM8lzwsV5K/MVj+mN9lC+BsfOHzzz//5GcfJapVl66bOLaMfLIjA7zSDMHu13iftSnmL/f2J0fD0Sp2rtefzvT3tz2WMJeP7ne4z/FKj8As4/JyurqYpbd2cDF9eH1z8sAQR8ZC5tOb/njy/sc/P3rwVB9PN/f16Rurp+bmJFarq5s56lfM3p3e6GelX9fpoHfNzBTyw3jZAFlE1HOFaabJA3z0goeqHkeUI4TIlUWr1yGVgu4vL68MPelK8mFZg1QChjoKFzlV4Rkp7DPo4iXzovP5Yjabi/c5I5DorVg+EEji1EivsrHjOGL0gyFgiK5mq6NFRi5MOa86F0hjPkOHgsEzhgu3k0f7G+NdzMn1/PuXp5aaSQkoF7en529VC5+aVD4QvopmB0R5PBhHDI3lN86DMyyGb5d2tJApUiJ83ZoMEXBkK1SIZ8byeDB4mrD66quv/vVf//X4wZF+BOujSdGLZqgAxeI15QSFWfvhYH9lwdF1bNzB0cPoT/pltI5tCwOy6GzACzBFylS3Thyttludjumq05/f3r69ut4/v2RlHj56YEkJXLhnhBiOD07GBwbh1xWnZrFPb4jthp4kxAobOPLFylqSqKclToMxemorYMIdqXBbiBEpKB3h6ZMSx5fSIA6kN81RB38OyTXKeXFhDuq3v33GRlJXmiaYk7WoYd3VFUKQArrz1TfffvbZ5yzC5UV6wdRKDd6qih6oVOvV+wkV8qJUMq3wq0AFyBWCrk1Yc3JmmbI8YZwuJt8kIQUz8d4HH+uzHD94/PL1K1R48PixMfvpfIZ5EpwODib4bQw82CZ+L5TBYjQj8EQeWo8rvzJEEk7R6PykI4nuM6qbNWtGnEO8CHX3anr9+99/9uTJMxM2bDgvV8FVT2T13cvTt2/NBZ1/8813f/zjF99/9yrDm4QxNUagCAKQwgoK2hV01ugCyVBFBhnSyXZHsXVn48NMmnn3snM02Y8IDMcPTzJRTCXVVayD3dC4y/7hEXe4f3Q8vTRPYkUjbQtKf/vrT81nnl2csUbGrBUheF5VhBK2B7aKD4ajrMIEQF/MTARiSjWS8ZGouqF68IVE8b0DDfzxj3/85JOf/fzN2Wq5WRxEV70SUPzut3/87ruX/mW53cV1TAwPnkFqVKBvIUQEMsqpjTIRASE1Z1xBwBMnJU9dk62ruXOzfrqHAtbRweXjue62+Fol2mVEkQ6grI2FpyfdvtnTSc2kCXNoRyRcXNUfLtvsXPXDtJC4MhhHXngqjbvRFwmvwREBiQeBdYyKR/wlTHgA99o24kL2zNP927/9T4s9wQSU69nNV19984fff04iptcmMyyNMvORMX9xAWeiIB6UULiQhWhJRtfi0RO05Vk6x6ERM1EtBj1CakC8e5N7M2/d/nixWmsO0J5IasuV5GSUg7sQ3BDBEJ97EqpzorFtNX9rZFbDmdFiFRPjZL6IAdgCgC8FHBj8pQacxG45lNHrmxsjDIBTe6awe0MKL4yjES9evE9jr6azb7+1pumsQofwDUFTN01siIWCQdSvRs0gq8ZQAWCR/EBQmaMJlSpnUx/USSTC7PN/JFROsqChQBwXFh4KdGFv4UarxxwfDfWEcuBueg+ZqQSeeQ1DbYQh8tBexRjGSkI9SQ0IbYx68MknH5lcqqWZMarpdQC705uw0HuHwH7z5u3L78+MgPASENaYt5JmK2eqC0Vi7sLzQi2X/EhLSLV9Dn1Ve1h5cvVezamCLREMRWOW/bkVCxmH8JQIlEYUyTLKIkvJQI2mkA7MQ9yqoSrUZ0sR9igaQS1CkjiKYBe0Q02mSGNREDAYMB6YbhKcEMiodLRaX4DmuLFCVYQS+PzM0PFtZBiLwo/qtKCx1+4Lq9S4Sw2g/Cr4chMwYjgjDn5GhOK9Wv2eBEJtAYCZoKGu3qGv1JiWUikaQsAcRNCoqCSr1RSR3NDf/slR4sHUidv+VPv5zQgm9mm/240ig/fe++BnP/uFxWrT2TetmFlc+mhhRQKEmkMALmpnknxp+tDDyD2PnU4QoFo3DliFiiuVd20pchA2QD6ZgVUjQ7lVJUSa9AAFgTzFSWqPsjEE6egrFZwNi2pT8VAhZrgE0nxelkvqfdKTNftgCpKjXa32DYEiUawCKmRdQEhepdHOj/AvzK2U1cSmHkWpzL5i4ZD6BcKWQ8SYZ/ElcN17ATKRWghcMWWrAogRhwx3B0nX4lh7mWvEHP8STATZlgEBCxs08UITHkdLm5SlWDURnpfAiJIUlwFwBb03MQQGZoxIp96qosYuV2b4rE5DwnA/RkEdkWF/qoYmodsiwBo8evSA4jGEX3/5jUgAtEGloungllmS0vB4j6TwLv9RnAhEq7cAShuZf9ulRhEW2IOEC/zxHXmisUEueVnYdkOHK1THUGWbeKdYOIqbDZWySqFXXFLeRmOF89EwRv16TbsXoqzhjVGcLJzwRnFNZ9wegPqUqGchZJxIyQJPR6KMiNuecfrm7e8/++zN6/OqnY8pghWEqBJwi9UKooX7e9dCpjI0nCMXTGJmJAFdqGt3+67kRYkCoNVzVx5YQWyX3IfwmW0RU0Nb49tiHsscdiWu3RVIdTrm6fVbnm3B1H5nP/6A9DQh8rpnoISnaMVTUvH0rCnSRx9/bCTVbPHUKqnsAyjnD1tSG3ErkxYzKYEysLlr13roknzb+1CKCKQ4SN/l3EpAdKPVU9q6LdSKlyfbkRizCWfVwfjI1zfjL8yLLJZpLIV3G4KwOLLWcBPHMev3Jjf760ysFtjRI1xJv+qOA1QsxcSOqqbtDx+dcJmvM8pyfnp2WRZBSykf4PyXVNWV58/DeylV/eBJQ35bw13GbR5/ykvVz8DxLkMEJ7p396SATOu45SGquELVE3KOwQHOk0oJDGMs47aYWKpBiMrK5rVs8jcbrXiVqIIoEBOQJflDw/Cffvrp2zN7ZT5HC+EZG4nwCjewSIT7n0qR3NbMT729V0NBzOrJDIwCpWxCWexWg7pKTFpNMIx3hT0/4RH6BPpUkCQz+Mtcpk9gTwUqqJYrlRnh0EKnTM5WXagWEkchPCkzy8R
Yhq9MNaz38t57L37xNz87PX1rwZIIQscmbk34WUKhrqqiVZhr+/kXD+9e//j5Hd9a2XcZivl+VhPNHW6r8dBLEMbRFNoQcwOfuIBIR1KYHFT9zaCGjqJ7m8TcpwISxk5RHO4ypmprS6pIYAm3i5zJUYR47/0X33/xxz8tZglFJOtO1KgANmxB+xH+slG5u7fbm7KjWzXaFUmb5VNJRGlaMNFEagiIFbhVTWl7S50id4APG2J1BQAmv1qbGROJQCWz4QxzCxnQSYUhQRxTRiKxMjMd1c8L8i1wqlKazPBee6iU/qUFOx99/MGT3z0y11TeqUi4Q1/VIUel+zjXw0hsGriX8vzeT+Xq17tn7cndczCgUti1KZa23BiRVLpQrVe8kJ9MXXJv4fEgKbslhMXlxP0KIQoo8sDe8Y8whrW6K7QP+TJDR5esBtGrpGK6X6ZSzCS/Hp8aaKZaogltccd6jTJHrMK0hm0wbzjEBZPPLZ4F/vayxXmH6jZXUJAqXoRz5U1tEkwyv+NV2UJoaJ7PUZFXQUBbkfIax+A4Ci+vAI/teK8882jBEPWxo1K5xJH+aTQuR3NhpwJqFherM9KusGvBANrck4usEC3GQ769LUwbxJV3R4Ltj//vf4ooIJD+orIowC5xAulYZcun7Y7w0T0CfKaIa8C8tAzEFCeyExwQQmk3eXwvNfQz84VyCRn9J37HfZtJE2KnMAvkSr34G3xWPDnfJVU32nn0rrF373P3Tl5+/DxiJd2TiILYJWlXNR5gbgx11AVKpMOw42oR51FRoCr4uvBaDG5WRI5M9kTmm9iH282CahDZ8LNogTJZW9D4LAf0SURGNrM5MOPg5VuJDJucYajMC0VeGhV+IBEQAfTdNYilrWDRnteDH1x2z1Oq5WyvE2Heo/MPqwWwSpGVtlvSobNDJiyAM7tHU403RU9ME1akBMKyGAnhSce2oVDgh0m7ZmsTlCGhsVsKlgUH19PLt+fXF5eTo+NWjfCE/BVvSpkD71aHG+h3V23s0PPsLvP9h9u8P8B8K/9N9ZPhrhI3xK9kINCD0FuEEORkWYh+un4BdosbIgoRnzEyHNhhNrC2OiTDwExtmbYSU4xiFWJZlJG56IF++N9a9aANFxhOs4tb4GX0UyWNMQWWJiDfpKA9DrdbUryB7mb3LI2o4w6lu+dudg+3LPqpV/ef5V5tLTUIYh1rz01BSTQoLwwhb09edmaWDSAP9DyeEfLRK3Ohle5qV2f6ESFWlq1kFJRdmF1PSYSBcCUSPu4mc8KDwLKVgqCxte2twrzdMrUeFLlzp+YdzvXiv7o0SgfhKosPubVjUC8gkT8+uGSJUNhktIUR02eKjgAWw/OgdR/KV4SRCf2MTd6YgGnBQUFbBYCT4WYg8ijcjIUk9ilJKrOXSb+qNyZUneXGZMZQZc3xKvYjEmx5e58K95H9awnxF2XzM1RJAptAhvU3GOIasxUXqKedRQ6j7F5OdGTmVxGjeJml66yXvfRELL057h1boqES6a4VmyuyEkfCeVLk+ub1a+Ighw56JrPpXC9GNKKU0ZfSoOLHfQ6z4vdSGmjOpHVf/ITA/fy7zJGsH9Bu5988LTC3Lz2WUSVIILyBJumGSPoPDPdSdLg22Niz5MQ6sAyTxrXK3t8MV8a+LC8bZBkZY6DTHLGqzXkBw/iC+b8GnAag6qdxCdOP4337jphfQmApdWYlK1X/Vu20ITL1nydYbNXnP8/3H729TzV8QsZGW/QhoiURfHlCm1o01VsvLLub2VICSvPHOLpe9C319j6yIfSqsFpV7vOzxEG9A/Og7RH+c49sqTV4/tlQnlGGzKGgiu4HlYhJQOGtXlRMC4EmBTHju8Qu5/nu51/7N8hXZa0G7Kka1JkbIYzlktnJwLFhqdlJnLIQdzgiHkavbZcwV1N+w14rW20yYFTg0aCQMfKdMChtpHLjC20k2wpUQ99vpq/NPqKFbtXrs1ML6g1Z6YRG90xuxb5k7sGQqQooxn8kEe8o9dcSYJe/gbglxhZWv9h6+8D2s1Tm5ND6QbqABrC0pPzk8Gi4P5rNb7797uXsq69tKjfVPMkKoiDKV9rMILfpDSKwrX+nihnt14/eHD8kESLHs7M3b05f2UahePlYC7t0QITlN+Si4NluQkJFlTRB2AG/o0umQ8hDU4fGybss/9VNbH/6CHf5NKGK4FKcs6vSsnlL5Bhu+zJqd+7AFixbLfi1q+n8+upy325zg27dPUuqLFK2+WmwN0l/MxsLFyM+Jt7GHBbXEw9kedLMSS+H4+zTtPHGHJTJKJqDOhwH3luwZvckQaKLw+5Q/xRQOm3wjJ38UfqPBORHGX/6QePSj95VF4hmlsVEf80Tb36c+UeeDBZtLM3tPnt8ghamJE7P3lqIR9LPrSqPyvTkYxZ40VZ54oCQNU5n8Pkfv4C/7aqu4gxPnj9/enm5F3pVl/Z6Oj87u7DujL01q13djpT+C3BDm3CseNgsZx781UnNKfNOFLY1aM4L1yzAMnU/n/UthrJi3JIx8g5dm28sRxz1Hx5PBsNnIkjaaxUCey+wpBgRJ+rTT0DRdCSbdmtXVdbEa4eumDE27rY/thJ4HwkYIQ7WmvRHjzoH+4evT0/tKqJcKMlf7RRqC+KP/9xFFj9+9b/3pFEBjfAQyk6uGff2s3JY7MR/EZIIIY+4tJTOhJLwmYNbZIO+BdQ1PcNO7o08xGq1laQnzIKpqCkLivyQ2kAlKEkUp0OKdEz4SDb55sAOaLt15lGI6GjcauxCbtPXalLQJCI9tko42jRcq/eRD0qN5/efbssUz1tx1SXbu7Isf6ar53PqLb5hedK1SBcCQLfpWRR0/MXByJJVu+e6jq1wFgj8+3uHo8mB5TV1FFVHp7Gt8nXGRHarZMOSfgYzaGKOULmxUjPzVBmuCIFjSKydzd5/Lkc4hAPgy7xdielfwLpF5x6GDec7WrSf997/9O29bMjabG06RwkE/Cv/j86qlfKznoAQmHGjXat8LD62Gm7eXwwmIqakqJVuswUhCOFe9DxAXRPRHGKsZcJNQUh6AI47YRsl3jXzx21FSi12L+uQRQIRrYK/VLnxP7DuePcO9IblPax+Gu2UrcLbev8il0lDHCcO4rmsY4G1/DgSChGDUAectuL3Y90BznpaNGuTeO9mwTrY+5c42ODMfGYBiOppgCs7muMVWI10yzs9myIty0RAkWgxe5moPAeOZE9johRWWeym5ZCg6UF69SosduVO0cLFvX87mnhxL91RBDfuPf7p28oTbkfubRXJGo6Msq87WaDPHvjfW1uuY+0MUkPWbvNsrLBdeUaDIMgbYr3Vt5m8YSgpQO3pVJh9iamAjSEE3jQSLsDMAF1QiOMmJTAxWergq+hfhZC6LA1pfkSfFq5ArFQUyeU/Ry6A79L9+x+XS7hA6jPlEepqh6mKE2cUM3rQiT2oWTwkgDZfRpstR3CuyNnbN27Qh0E3U7Oeza22Fm5yJZ3Ofi0E6yDo4GF2EhrxzaCvOMn0JnJrjFZphBPKgvwE0bELvA0Cps
psXkC1mIiSAXKzxSkL9RFIPU0i3iG7Q/p/428xR+gLVNofRZXKChggA4YVUwOrUDt18Bx69QZWHBkmOT97O7u6RKlgYK19tvjphscPmOnOki1damHHzz75OLJkOa2NsvPlenGzcQIazA09ZT2WG6YhA5I0MGvelRYtYYczp6KMOq+oshUECAJhh+Z/LRG7nO/+Novwk6QDAAQRn503Pn5nF7K8h/rTlZrXyPaf2lInSDg8PLDE/5bE7++v+2Pm0jw/mvCHoeOVs3qmODmwadrQ3cJE7dXl7XQ+vbyYX2V4xpJNGDWYUInkRUsy1JkrnqQWsXnmusxgpZfWQCdHHjUj8Q65H939UAvevUa5nzSN8ktaQXEprVeycSrPIw4W3WY1dDSC+dp0BcFHtqYdHppush/vhlUf7TluwYJqRbJ6qncq/AL4wArtxXzNJAoevHQVPt0IS0MBqGZIOoZY45vbdmSTUyAoYL8rAoXvQL0aZkXqyhyUL1LE26jVOwzv392PvsF0ZyzlQYj7he6IAnnNNBIowjJE78qRx8jX0Av4mX2QHEwwv/v0vReTBycGCOaO5bNFuTdcXE91Q6xVur6evXzpFMlLq3YG1vtGKhZLiuD0GZsZrbpDEocuaAlMrhp2owFb/rRn+i89NegSQoYicIvpoFLUSoktFnc39/H/i/uWx3VLCBW7R8pdiqGt1B4UZUsSYx4BtyU6GOI3kwhDNN/0o+Wrl8Jtpw1k6+PqyjZzkDpRcm4tYQKnOA6bYb/485cO00MF/S3kHMVDUPguiWhU55EaFdAYIURZBmvcgEbiNaUdwPn7X3qH+5l/fP8XtclQDI+kJUXkejTRHpWb/SxjsoCVUfKvpmAyGZn1a7c9IwPLzfrt1cWNhd6j8Y2tF6zCwXEn27n2LG6+uLJjN+vyX52eDpwtaXxFOD7JTtYeQoz7g4Ps8DEFbml1pmlQIRvHamkAGluax2WSBrRgDEKG0spyoSTjr06F3V+WunuY6smGHlBGSVA89sDwKMBIQ9bcY7WBBnOTpCahVOaqbam4nE2ZvyvjT5v1wcmj588e7x89sAPIIkprFV69+vLly9disG++/i5j0NiLCgzig8OjD54/U5mTfyI7cU3onlkequB4HIcd1SxY1gFRCoZBKVoYa5MExq0S/SVOP/69E/k7bNvN9nH98USl2+fp+FScyPBVmJAerMM9jTBmybPAbxlKSToUfXs3FroJ16uZw50+/uCDZ+87aejQKMPzjz6xfv3/+dd/+x//4384e4laE4escEI+XKUCin02nRGHmtdNQEnzVCsPEdDdPDywVH9AYWwcKSOU8XtlhZMIUS4zBvP/j6S5RCBkIt4fe5Deju2V3jB5rRg2KxkNvfFjINYzftjr/vrj3zx+/sw2i5Fhp/EBKsxn8y//9BUNOHv9BjUhSODDVQJWKBE5z/3jIR1Fmp6Cn0wNcnilJWsdrK1lM+bjjNz3O6tSAPhvE6IQh4q4fmgtdhnu/m598I5kCt69uruJdCWpD/6EIrzGowQz5J41d35hDgaJ82C+MDTrxyMyMZ/2PX340YeTkyOugS/P0vf+4Jtvv/3sd7837iCDStJzMECbVjIoJ9QwCpUxPM5Au+yCTOBAIIKgjHtPEkOnAxpLGdrFMDTTnJoCd9ns/PhrkoKNEO2aekoGq44QSIQQcWC8Iw9ueISsR9CzCm+qafwLzBCWEgkaHzrTqejYf+eEE0Nlg/GfPv/i97/93emFJfj7BEcMRq51sJZOS9OtRgIrq9uKddWItZmMhj8g7gihR2GdsZ+aBhkLbvVQjhG6z8vS50Jge2lYtR/32Y5Bdz/D71qE4kkhfPcKziTBITFbd4EP+B9PYQTM4N8YvNmOQo5xRqBT0p1H39nH8sXnBMEM041hw97g33/3R9sr5FgJQU3HZQdTZ8CILmZTB3yOR9k6S+BTzXpteBMhHUQKAjDRwIhQTn8Fa3qWIQSeIGZZbZC5+V9JKHKH+Y/z371qN3UNdeqGoioLqUwOkf6KafUdjUEDKeCovEjA2WXr/9n5m6+/+ebNxcXbCycSoVV/ltOnHIF9sNJLX29EQI6bGLw9e2OprH0B1oyyf3sH+6hm6M2OeiO8MQcEEKYhdZP86IV7bQpkWRLwYWOI8aP0HyH8Hz1XQbMX8A6xGYTIWFHBqGAWx/nBRVvzqA6nIDr407wZzA0BRRnBFoPHta/7wqUrexMd8n01NbBs6MTQG7+Zza/WhoayA1kj8j3jEukY5QxpXan0plYi5T0beTVeC8JDXfViPigROF2HCiVdQRlyJFz2S/KngL7n5NqL//L6H5GmCKFSTVTfLVyhI+k0EVDroG1exKMYMVnLwIGJ8ab2jgt8cHwCC93HqWG61dp6pWkNPDgJkyG1Aw9ugw/efyGc9u7Rw5Ob65uAkr1UzHGJQHovwa9AiV7ocxv2RhFHVkgL86DlWnRPK0/T7vj5tiEDbf9z/Bvyd9fWlp9uXIJXlQcQ6qs3vEkmr4yGZXc6QOQPp3hJolsJ2/y15xQJxuNL/QVbf0nvjUnKyHE/s4/OYef+YHxyMDnt3IqyzpD1ZoGETv91FiTjr1qQqEshzSCB3vg+jatT4TyUCsJGo3b77qpgQ+bdox/etbd32fx0L8u9Uq2G3XOYI4kAIe4sWwZKSBVq65e2ChsbKoSuno7IocbQcjaaYVuhQUaUSBXJIT6Ww9nyb7jgwdHhJx99aKNqzlA7mCAP3BWo83oTTJWAl75pc6oOkzPo27ZdxyTV3HFkMmjs8Gz45FqsdHMPt22mlmdX4if+3tHFu8ZbdsOt55I/gteMfsSLx4TEizJgceeZeQCb/7Pwi1CtczwMihiCZSD9LrEKpzMmYy+nIUvGIWZDOB4XXH3v9LKzdpbiSIVGDrEnR6Y+HEhEHZgPOmaABkxkFAUbGcDid7NzRYSfwPB//ZGmq/U77Uh4kDHjGLeNzccmZJxfmfPWEv7QItBaiLCcCBmHOUiJ2EO3SBchwBF8tKdVDXlFwSHhCDEUHQ54kmze0l1brK90sTb9rGgwYYG6BKRIjkDZ1JyoudbN0xG7WLp2CkdYd/8Hlp3B1O8MM6Li9zFP1h+lcDirVZIzl4rH/PXE1AkYsL8eN3I7m9zss2MoZ6hwMM4mnvwXeV8ZEqPxgju9RG1BTdlEoQEzshC7wfWbYbD+17YSkmP45MWLR/uTY4O7hh9ECn/8w+fpqChEqu7ArQ00xjPta4y21Aa3eEsSGRbE8DRj2JB8pyF3NfyVN2FgWBAQ3CKB+llHIpmwyS/KazohTIpqZFwUHTMqz/yNa0VSLGZGAoyELOukTMwWNSXkY2LseBY11XlNw1HnyGkSD57YOr6/j9u945O305mxtwQK0T3awznkFIGV+QxnItiMrjNawpLxmBiIXdqJQXiYV2HF7t1/9TeOoExRu6Zfg2sgaAMajZupsP4XQXqFOlvx8Tepggaavc70dMxjpifI642O+I3Az2reoYk2OGVGPgeTm6CMC+CJsmnM7P7BZHRzuHr27PmbN2IM6zqyy
r4hydGCyRCNSOOgBl0iDjlhtMZCKraPDys07uMLssD916SWP/OPyhItIlcWMHpX3SlCkh3gtTAPYBLHrwVGgaDAXEFgp39sk5kZVx1N8jtK7ETn6XScfhlRTp5gOzTA9IZ+NLJ1Tdcer7sn1zNnMZsWNY0hj5kQYhCT14wz71xmk14UHaMU93G8+/HX4g/tqPYuBf9IUgRN/6msRpR6S9PqZEI/px+MrODOaSmYn34OsHV/sTXzR5EF/c7RpHd0/MQA5KvXb23cNIRirMbalIGgKgdv951RoyUSoZwww+EYR9ArgwSMdGU1LF9CJCsaZjN6YTyimczAxI2roBC4h8UOm7/y7xbJoJ71rJQqdnObKElAwpJ4TQMOUdn4f8a+lkSHXGAIg8S1SFd7dH1jgTl7+PTps+H45Pj8+5y+cm1Xap2OkKM2JrVpnj7sldQJDUd1xsOJ4zAyJRjjh0Xxl+CCeTaimvwVnNfyt9Aoghq25VJ/E7xEeHJtWLVre56nP5UyqKaOKhtrGD4UTtGzSHhZv3ofb0lWzCETmEytZc4KqDHXXH5MlYTjwNNpzuTcvnNKRw+fPjuYnPRMaq++vb3IkjejLDGk2o0h5jRYU6cPjXr7e5PHDx77wghFMNUHf5Vxk+oFpRsm2CJRSwUabjDKTRDYmgA/i2pbXO+y/RTu757dZaubjGyrOOoWE5CmUykscSMkSgRobaoDYK9nHWeEOUdKNiOyOa8n2yC5kvgOUqNcrGB9q0E1TDuj5vhAQzU6VQHalD33wk1m2LVv9MkKOaeQGWE7mE4PSAErEPEqR6B8CO/UwvRegVG0R8doRMAruQgjpe0wVGRoS52GsUbdNDLFp9RPT9S1pV2Kuw91y9JlcBV+0fR2nGIMRUZbyb6rMWFj0iYdHc3hXi9LLGMtRyK9GhzVAh2BeaIJBtCAyq7TYVfpYyJKBUANJePvUaV8eiCHVk/2EWKqB+tsfNywqm5oYgdgeqkBb5simEnp3xPgUoIg61H+7NJf/NxiW2+35FCFMLjcXsuc+6hjDD2jjgQBPeaJR4gw8JRGT2C3Gaeikp3E9+ZsbPfRFfD5F07N+GFUlnXJB5l6vh72ZDBy2o08L1+fOg/8gaNAVEybrBS0fIEpEajJTrL29vaPjx6YFWQLqYbj4XOSG3WEXQL2xF5MQqqPTkWHtUYO8yfKgVj16IeC0PAM1JXiB9pNSkQG7nqifrD0rBXsvcrIITJkkMXe4sxPcwSGfLyNvKbHJBF8vM5XgDx0SLKlWZlJKt3lctWknsPR+NHjB8evDx0hPEBGYaiwKpM1N4tR9t8Mx85Ui8o5oN5B/YK01eHVFVjn8ykX5KTzmqKMvrWh5zKOAElglxSabKU9IVP82fZnvS5cI/pb29kebq9R/LLRjRxBPuMq6oAuIpgCINjOH1pA3ioGG51Qh1jU1pnObc4xCX+6nWmmoBemp6oTALYCoyKohBH5PolTTceiiYGcvI1/WLxkCxFkkOPrMpXvfdrN8VlcBjrTMbKXQZUCMRIgFYZYhDo4B2QPtig1TO5+7tBtgVXy7TLu3uRJ+JvxjXjCmCIpHI9K68Zlbsm4s2+y5VNM3evLhJdyI4yHfIP+sWPT49tjzXgAw0bNm2gvEahJFSEFaJ1fzPCp0KcC2gfkMFZdQq7bseU8uB31ZpAy0qcPbln0nrNfrq+zD2HdzvrVPWe8qSn5zYgw0HVhIg5lzBtiMe/3UmEe1D2NiN97mT5OHhbSJS46fTXWHbtMSa1OZ9adVeaINGrBvS90JH1NZH4Nklg64o4K+VBZ95YVwVeTV2aWWQVKKvSo74hpllhxruR64hsePop3bZlLRqyyuUDETTTidXVO0BD+IR6z7FB40xwWuh2trnyFRVt237UtqbECbFpGYcPAMueeEMuyCO7uiOAhQJsckQ83YGk/S6RcBGYhTuEf5fQzKxolJwAihC+K9A2hW+FHwvtZ4bheXfe7vrmQ9RcYGXQZgNLZGhaiDgbVojOENP8iMWkpxg070aIXjSBwDG6aZ3RWOQQDkWJ6uCGqnhA+SsBMxEpV0n9XW1iU6mCvypAjCkGPkrZ+Uc13VIgMyBVYwADVKpmsIVjzrJQwSCQGdGJ8bKHBYbrgY0x1jToc6v5UFNvVGR70rof9i7e906nNLct0DPSmIMVNiGhsgKrRc2XToMP/owAOlewPEgMWROKu7CNIo5TLQ8Nq+ZgAnUJUEEBBBiaQ9SAluhgExT9Hk6LUHgs0X9DAOr4DpUotQo/yF9ALt6OTgHNfLKBCIRdq1FqhcMVbNbixLc4J2wZv3OsgO5GIDpN/EYqE6JDx1kdY6jMsOjibm/n01FfAlqvZxRlIcp7+yiwMRQEOuzfe9Cyw6PhCQ0Sdz7p1Dq0jx4J3pGDAa+5V1LRVZE2H1W2eAVMxEVqkgMp4E3aBeJfck9+MXibSQUL548C1JG35L3/t5omnEAHiBYlTzgoj832iD/TQiD1wxn/8FrHSvH0H6eH/di0mtPct9B7tW4+JCsjhKCdWDVGIvGUHB7Z+kmBLFa4ugnyCF2v4x6E+XTDJJgAoA8NxeCVBk0eFG8qk3fRDwAqBjfAj62DtyItTQqsMv5LfWszENGWTbTYs17rTLKW2UN40f1ZkE+oc1an5oF9UeGcOPODdPNeQV6QCs9BF66xbNJPkOycSspLFNbQ9lit+AYW162Mb4jfWwFM/eW9UYJymJo75un5v/mR2+uqVT0loIm6y1BCZ5NeiTjcRUFulfA1SxBAVjEEDVY6xpvwkK2rv0CYlVaEwicz4BaZ6RDOdKXlwOIj4618zIkmmsggzwIi4npmsKoEt3U4bqkUgzLkNcSX3mtPTJ+cWRDCBhSk6BNyaJgqx0np8DwlUxZrTM9g22Jietm3SOgUr1Zab2xsz1qZRMpNi/YnvNhEhG0UwlUMoAU4XMX1qQeWlahn4uIOMIS64CJi5D6BmOjE8zaUg1juvaiQa5xI8ibMkQyUOlEofizNhk8l9GSsjMvnYDBHlRMz7wcU/tGfaonUxLIazgZvxS9nghi0l18EcZM1HyC2lXOQ5+Luml4D+RoMwFuC8/8jOBIMklrPlVAp50qHIFEnGok3A2EBjDBKh8Ult+KQB1FdNMkdN0yh4ooFIHCpHLjWRAq5ycKu0RYue5CElieUunwfMAZlfRKWHgz1fh6O68t84yk48h94694xAG+1JcR4LLbCJBCbwy9Cf6DwdlkyI1ocuiAOoWk554BPrHyMSSug78ohZj87Ym5e16sg00GpuAk1lbAxvgyBG1jhqK0wkvYFUXnohB7xCYFOQS12GKYpQuTzZfSCRXkdA1IHqzi2DAFKggvVOIqhYgSz2QwNSVlgpmtMN8oU8OOuR9w87i/6cnhkDx0pPi3Sxf+EmAGGFjjGLFqjGUkS0MhvsoGP8q9XmQZkgYgvfleYS44SRaMgjsznxfaEidtBHSKsjMheDvYgwG/hZ+p6dtfuXl70V4bNRholNloSJJDFLk/KZzFQPDA3CX5XQNATfZCHPiR9NJrG2VLrWmb/B
JcIQWYrYwpVqiEcztmH1/N44GxDnCwJHkOBQ5lHYFgCCTnayZsAb4Wh8+oHFHDbVjTXdGUyspAl1aiF/q//HaMSXmFpldQSjAuE4QtEdwtr9hQzp1qFIDDy2DfIZM+JghoI42+nTo0Faz5KMTK9qsT4WFk7nPgu1xsICHieTtkGRCoUOPQumIKD7KN+WneFhtEgGJam/gD3dG5N8AprNWlhihBtvU0UCrUSyKhFcyoMFnsOQjaStTIW6dP2TgchHOmqEQlnllI2gceZxPBV8ZiCDDsqvHBi13FCSn1gQBLAK/REQCYQW17ORJV/UCK15MYhQBJ9YxRKJn+EZcjqqCm996mqydM6XH8gvoRqiWyDvXtUIpowqYFAQZhpGu2bPIxOMSDQlJGPnLeXgcYJPAI9n2AaDmU9vFEwlKCmbv+Qk7A3TPSxJCznyPxFRZxhQkZtnSVFKFYX+4ZDpFRLJqUVYohFeo4jOHimQnZyrBAp6U8DHPM15ay0TfghAsmRPLOqFv1RGC/CWKVltIHCzPS4zeq0BEAX8yEKuMjgjq+2dCI0wP0MbMS65QARnCqNUnPvAKFySp1KUpXxBZC332+HMor4WFCDDTRILMJVArAm2IqrM/yFCTAQimCLh7ml+vmeRyYLmqtOzak2GUgaHTN+TK2t70NEpypnRU4nRlPLMYAYNSiBOzMko3/pTcZAMG9I0BYxUJx4gsRHa+gfDkMc9sGO1oRHhiS3BURh6rAr9OwY0GVv2FPMwDWi8/UgB6lJ56l1ssldEyzW9HMl8UiiVV9WoCmOadJlqjjXCJaXC0Cvwu4lMlF1Qe6Yzbbmd39QnGtNViq0TzdPLDN4mMFlaD2lcQRloqzfcIn9IoLZyBEihFlWDg44HHKbcrJVNKvKVU0YFuhcKRJ5KnAO02yJSxGaLRmqDaiEYciiOEKml8AdhwIuoCkxKHACDF3kSuBuPQIQwPAulDQ+NF6Z/RBiie02UKoI0aKzfMbu+Us5mmM4mXjN9hsSIvuGxWeo59pe3piU0ZltWDHzYWkEWRQBX7FrNWgYFv1m9pthwCyL0vP6CD6D5PxwpJIN9lkDBIBhTokToEYcUi6y0FFlIgfamhKfkNEKhytQKPP9HiKJNTiXKmvAQBSIlgblvcuJ9TTLLmYUpriiMWxwEirGfVoD4dANjQFaIE8aXqyQ3qTQfN7eDICwNJQIhQgTFWDV+Vv1N+AKKJhMOJE9BjxYNwxSgJZGHAjrK3mojDrK7b3LOk5UhwFzYqAUeasYJV5ITdwLz2BPHEImkZUgMRm2zkC+S4X8xRTrWPsxGP7L8rI60Jdcq5ATpP7uoCtsDeZ18NX2jrylKXiWoNr3FVSO0gUXlI3PdzmT/oGhcZ8dFcD0DOvfnyyZkCV4W2udsRXFJ1jrZmBDmIzIJCRrFLE4Ptn74GTeHplqvn7mJPoTPyV/rZ91GiPNHCkO0mms0wHPY1sQ0vDyLzuoKcDcRzeqHlqXLvAnRi8JWzFLRjha04cMmIUSNoew5udcIjADf7qJIf7hi3Nbm85lGSYSf8TfkrQDimpCgwgpi5RugGZmKUJGCHDLZosAgTkgSOoCgHcQN7EKyMMG0SKYWQ4hKEEZSNcfwxyQLK8hJCFokLQ/EbPlKR6U04ZzgWKVa3mc+TW2RhYgEzAVRkQUy4/NE5UqLtOlxiSk0QTpWvZvpdWIHXQ+1ZjV44GBLIovitHINlLpSmmxsKYHEGO6UM8lDYuAg3qhkvCyalwyBTqEAlKgtLlW1JRo7HOpVqr3Dyk2ji2sKh4CeyfIuT4bMKnlI5qpRjiOfaFImSlegxN8bpSaf9Tmk7CVC4uqLYI+wQXTAXV7MfdTeN3L2Dvcn2kvPWr5oAxSRspIXAUFAXY8jxxGWIhVg0gcQNKfvFEtopCJyQSCyNqTYzDBExrOFB2t2yFHlQiQX7VQTuYjPmjNLMzEyhCFqHwdWeXPJwwQISUaAMwyXWebUA776rw1UtI4DiaYvpPwH1ESrtJCDfUzqWQR7WaN5iefVnm5LYn6Rdo5LFW+DM2RI+xG/pET04ax88ti1puPvaE0aWSaQnJVqZwGhQtFt2ciCJsCaauva8HetuhPbTHNCVBYK4aRQRlOV0UBW9nryVWpo3ArNQBUDzb9lxKXGGQXRzcnmuUQMfYaUaUEcOpEZJo9AnwixaxjCHCa6i6C0C6rWm0KgMF0DrsH3LtGz1EXukNzaiEhN6WzshsgNi267c2YFl/I1LjM8xLkWfhBTqBoLh4CaGyGqt1/0rVYaLVqGhpWBo7RYS1cIFPyVbVdAtUrgqR755ZSKnGSPanhvDWjMNDsvCKohsy3U6knHLOTIYs8s3FquHEe/MZbX4IO6CqQiZcRM7VtSxOps26OPeUxfUsCszdDSDyOczMvagoiMFMS2K67a0tYgyCCjSz3JPgtN3BFZTvgo4qFXUpZh0KeSoDibYkzLH6hIaeEvP1jzFpnibGJS1SO1zAjnK0eU1nRrqU28XOqPOdO7Xflm2HUbRmVIGtqojITJZICqUpqhmFlJWwqSkCkihuJCtZBUrIPu4qjx7R6yz5aOOmJn0FvCvp1CRr7AhBBcj1dIfY8KOQO9pRjTGHCWHBkSG6TBECgsiR4Uk4xWZwaeQqkYTOKbCjrcVgIioFJBuhQMmO/ZZISiqEM8LWkaONuURt9Or2dvh2f5KqSkRm2EvQxSEcIYnnHENFZUcm1wAMbwEuqUgUjr0RMjaQZdruf94QwtvcvAX3weRdD21ihEqrOwjLnamlB14i7i0FCUR3dn3+vwIzy7WC029gIQIUI7DxsFIby73ZKUYUB4lCsBorWRR1AkNMjizAyCtbZQplXlp+Uob8fnoQL44gAynxP4GiEymooJiTxjbMCRTJvEZOkjMUrUcZPvJ/Ghvu7DEoraTaYKo6GDSzqnKix5jxy5LylIsJCfdcWFOJZbH6TKxHuzVekTpA9ND3eaGPyDbcIBlxg5j4K0IENcED8fnUiKTJvF6xzMfb/lNmdypa3IVMmgAVHjpHPRKO1d+5b561NeLuFK0p3nh7aoQ+87YcAw/Q38VldRPyNOZULJY8J/VDCbadJMeIqFgYS9MByU0TCcw58Ap7AmyAbyuy9oS5zLT0OpZdBQMjQSQyvcrVpSUwq1sqjmRmckFNGPxZ34qe1bUJuFPTw6Ics2W9/4XnqogKiGlnP2X/Anxo7o6/hCqVU5t8YRPY3XxxJjUSFHNQh5BFJjSUF6bMgZQsRRxAHlZ1hh1mM05DbznWD8zOZtTsWCYxSQ0FsZ/2LICiVKm+g2Q/41MFDuoyQjKCNXdu2pXTchizJJFvbGSYUMRQhuJ1QjRXAxwyXwM+aVnmXMEZ3esyfg5MjBLTben7/tG0pFBA4m9M33v3zzK7MxVjo5mN85FPERabA6KvF1re+xZX4oLlUMg3yBIy44oUd6t+Ep0aU4UI/sOOuNk6q19zLE25Hq8FNKRSVv1ka4xc9IR13bTd6bm/V9hnSZYhFRqmDb3rRK1KvVetvIG6iURcfWkfYTIQQ+k8lhbI2
10tFr48m+Nzx2zBe4RIP1LxiHcCESPGPW4hQ0rJaWqurCgbhUKvzDFyPqBJLwexFdqmlthMAg5VsPOyWyuyWJOGJaPvobVYmuMlqwlZK9jlRwr3VfffOWXklwSOGy2xHjeoEInuV98xTiPU2WZ00oXUJV5In3Yb0Pbw8x3oFGFkRPDvJpLDKY45fYDLYz4W6dfqtS3ojwW+nnUpBVS5Aushfpo5e4FT30eMuloABiEmENlDFfLA6U/g8iJObOsG+jJpUklNud6QyXoASLSrkHCTHLBE9S+JFALHraiJAn9Yq+UpVoChyrb+ENVFJfdDDbRZ1wpzyX48OX7B3wwWEgYWrZnpkXwhguFhWqcKa0983BjTNuKdXDXO/S9qFWtqyKm2459EJ94VbEs3S+VSkxVcgIVn33pMRbmb9M1U4uTG50DAFBRBDwjuSAoRWqdlpmchfBKp/lokZZdvXkBn1IPzJ5RRbog1ElcxEZQMkHkrP5waeBnVnnXPysc5E1AVHN3iIKX2Xi0zfiBGWhWRLb1GxaZBWoxRxtpLkQCqdjmln1xDIwj62ODMgSISE88sUgb41DCIGxzEHuCoHUDJ8y3Z402wGqO+mQAZmSW85EgPF5kJUzLheEbGPB28DKgJKgSHahbT5CgxAQgilHHvevIPRN+RkLBbhwPN1QdQsqjMMUIfJ1aU+5GXYoLqmQVidGBRSgR38Ybtjmw5qhqG4sKzEQQ8Ijygy4AjyIhiyuxaiqAGUQWiUmPsVHFms4VicSY1ZJc7H2tYc0HoolzsQM2GsRos50OhcVWIay9g4NRO6aK9JECIpVqs/YlPl3NZtUsCTKcwlsZE7/By3SnBkWFPWU5JBn03UJ4AQgKtRREedmbOueNkYyQojmLACaWAEt4etTiVkzlVX16f7W8KS6Zc8k5Q9TYK9qw0byVIvWVKp2LxhtYLgVULYAHkNCFGFJOhhlh0hElcVMXi3yFuxjtjJyVAftYJozaMxjG3U3qGiaSqyCfV5HibAwpfINK0qzpQISSDEnGVPM3BW6wJfeEAcFNUegkBIbLK4hosoLM9LPxruwLIIKvUhyzBmYNVCRebBLgnODvhFCnlC0EphiFsoKtrLhWrP50bCqwZ/c5NrqAQ1U8rqeaNhNxTkRCeI8Ppw8m4x8JuR2NZ9eXES46Uh8dQ18lP6pIvHCroqsDqPWXKyodDYzw7MXocA34hOLo34xLRZrPMeOxup6ZWZZmXzwOClZg7Y4JtN8ZUXiDiU/pdai2igsb8tckXcr14hANNEKlSzFiRVCGqzFY6W1GFeUawupCnM4RzGj+SF9HFAGLLLl08Vk7NiCzQedxw+t/bJ/9uri3NqXDIhk9TjuVUAWP1/W0eyVwt3+dQyI0VrLBHp2HR/4DqnlG4gakiNkhj+j91oHOrKALGY6uGWVWGBq2MZINiaFPTGPeS7jlo3Igb5aBzM1PHl4bK7VT18W5Ei1nsgwh3AlZdax4a6M+lxLIlCkxkU1VPmY8o21BPnsLbYD38pdV1AOu2vbe779umOlqznisDGxQcQ+/Q6u2+rQ1dJHX7MHAIX0hoANhXwU1dGilljVF1W1DMOjyeGJr9MfHzvPxilpkQiDhLerfTpB/kuGK3YJaTRVilx887qSH/IJ6ZkAiyUsFDveO/7k5z/3WXljHl9/+WdbxCOq7KBoilxUNIVzqQU3FPdH9UWIPGzJqyxeHjL1/Gy+n2aNlLWJJydj9opN6fVOT09p90yrOrLOBnSTz2HW0RpEUjhtVteney1ksxQKLaLcQQq/0v3GpfrRNXr59Mkj32eyssgAbg7HxC8Gxs77zALW7twCKJIcMY2/Bafi7eoGQFr0k6/RWzBy4ovBvneH6s8/fP/b778zo8h3xdplU/jSSKnQg6iVIIbcamz/0jiD7KfctT7OoqfuZGw5iI+oZtDG/g+rcBgG3fpO1+jepXXRGfnxfSr+QRAwiizQTP9QPz41wWY6/Y303BOg8TNrSTIcOpg6WHV5Y5OdL0n6Bm2+3i5m2KyZdf1KV8cQ67xBOoSslZJ3yKvhLqGeSNYSa/1fPMFYAdt4Mh454CI+OWtWeR+0gC4go/2FuxpUGJwR2a2H8cKehmkMGfbYC0MWWgQoOqLsom03UJeDEzh5+NgyFNXm/uRhxQUlcgxVVaWJ3MRVkzot1IMmCwk4aADVyQlf0+klK8Ih+iZtz7d8bsc+q5xPHwPR0mx/1CMBgnaEKEEjUw9xOlmzYaVKiMspXl1dHB6D/0Cn/uHTxy+/+Q4hK8CJ3WFyhWaxXHhEqeMXARbYVELP46JiWGOV8DnrF2mFD21ezzZXVyJoS4fNaF/Pb5yoYDFdloJlMV3unNeTEZgYrJiv9LrBFqzTRiNwo0KeAddjtlP2ph8EQVVjR7MPuyd7E19/NtljuC+4xiyoa1uPnzH78dNinqhoIzqgoT1bzGAMomfPnzs+bHJ4gKWZTym49MECG/qXOLgnZmABrWrVADDAuUcmnWz7e3qWMIkNZrN1Lz03rdn84puKxP7o8OT502e+sogcSEAkrSrNBGBRMuGlajJeDVW031av8m1LnjMyiZIqUMNSIcbDkyOnaqLCQ1Zzf29hTwLTkAB5C2LqKRFIRZUAHbgzEhdW1n1eWOCji0uMrfxMIMbpiJes4qFwUbqQZJsynEBl6QLBLEkwZMI96DkudZxuNvMZ/2AWQKipIUW9on64y+I/ffb8g/c/AjwkYxBufEe3ak8DwZINwK4MV0Ddr9ZqQwMDw0O6l0WEPLFo5PjEJ3fZNJ+4sqfZSFFlIb2NOa3gDvZQM08CVcQF/rIb/0JGBsBBhGjoSZZLmdTKopQUIaoJzKSSMaDmIZqwCnoEujFpzNVGIFEAcb41YqXLTDCpK92QPZ6Ww+rq91oEb9whxwOvLODKTpusWRP4IGh5hbg1TQX5IMwol/9rELe+bgyDCNtGBd0O6yxrXBFYjG62VxjVKdpRB+5Jvymo7FLCPFqNDPShVMY0TghR51uQZ1GU2GGk1vGg1rHCNSjAu84mVyg0aPUVgWiXECi+CBUSMpkmYtoXNz0fHeUFoysJaxOEFUbZA7dYnV9cwqx1XVeLyEJ6eKjAMhcJ0kSVLGLUfXuiFqkExvuojOluW2hsqDImEaM4vdZ3MeqWbOFe1VNGUeYGuqv7ljSQnDGfKyfS4p97j+haIy43GQWVcphO5lQaJKyCWnbxV+wa0hJ4nWcT5zcG/abd1ehydDixIjYDqNUrzSnJem7UPZ0Uhpa0x0Rb++Qz2KYJM+gIMijn/7I6jeQN+OLuFnr3+IzewccSspzIfmv2FhUEfYeJgWl31CpUrbTDOoQonAoZ4hlbm4QJDlAiTW4CMWD3snjXXKtfIOMv0lygKdUImDGXGggXQ8cs2beIUgHGkPiLIHNMh61/6fPToDAbawwLEg3agXSixJhUo69qQcWoRNrwLhYBrNEGTRUad5fgU9SI1BH3ZU/0he2jnrMH1nafc4mCoazYxytSQFXVdC+xKZ6XsECsoZQaOUutyEgKSAI35sZCLI+isw
bFQoeYOlF13RKHcCf0YOS0BM8of+ass9J6knhcW2q2xhDd7ARpTVBHFNlCQoh8O4OJ4MTYA2inheKW3JCIvBYKfkrCelC2aEqbwimzEuZB+aTYa4YqZp0TS3czApxp3Wg1mkRz0yeOUfDQcTo2pYKDsPLaKEodYq5ER0Muc8RZivmsrQDIuAbRdEEaPUlqKQNCp1FuM8G1RQd2vOBfRi1GzsRmsplwY3Irw19FP/bP15ePxiacmFQisAK5K9Ac0McmhSeRhJj2uotehABF7qJBkdxd+FD3cAIIUqCN+ywHC+nj14oo6XiR+tCuRCruHSohb1Kqou+SlrtZmAkBsb3hG/16mBQphbGmYIJsCtg+mG1tbqvLr1FETsThxKauYFENwDDipjbSzlpnZY5QpZwR/kM4Ts7P4IEvqTWjGHlXANXVRQrPmyYq4nd+JuyJjmSGIFyI/wafl576l7FuOuJpZEQTeRfKFlXTOIaVtVCbDMENXeJwjGXYx5tPyQMGRbkMqamqEtUKTPCoSqm2YrwoHIUXa9tik32dhnoYEavFZ1aor6Y3Vj/qRzj2VMMEzRJhZ54TEGFIQOZQcDDB0o4K0LufAKpJKSiXsQwh6hE0JXT01v+YXdDQNFY6XsoSYcN+Cb1JUlFBvuCNHv7fMiNSphaY63QcnOj5HTMHUGrICyjV1KjfiqatcEap6GrardrTNVw4xFPoYznFeJMVXZ0r66AnCcb1FZgYfCb5ewfDw6MjQZlyEUY8S7iwzLqmO+F0UwZHnnApf/zDWgiVLBQc4UORJ5cShuhlxndt1Vh0HalpxmdUY5oKRWyaLKRqnOQ28i+aofMboWMF9o4enOAYeUYCzZmwzclhtDdjqGmExDCQaRJEqmJHyhgwhznN8vrq3GIE9nRiuRlakZrsoaDU2Se9Nzo5PPbhjaPJQXbcV1gcA8ReLhYXjkq+nm7npkiy+pEh7e5SI8Tu1/Yv8xkmVFPwRKQCE4iGsaihlVWDxHChY6TGBHTmqnlytI/K5F/hEIaUMRsaCBnZMca+rmw7GJFehr0IWCKZGCshIfzScoSB/EepjTteT6fn11cX09moN+lUd7SGDIXiAwtkxAi6CS+eOefzialxRjirY8h3RGHtm0q+IHL6+g1rXSmin4Fp9ItfCk22b9rrJg4lkoEkOioVovkjEXYeK+G61hjdGAE9GvZNTfjZpB0+/rlnWBQTKaa7jkWi5vSwSgEZSCtTh8N5t1blEGB2MU48MCEGm+ZABItVMPxyPm3DAiyhkNHIqVFgEwp8rOxIYCjk/fffe/r4MWwpj1AvDMp4U6Ieg2xClfQp79CFquRnNRfU2s39qwg/JjCSXpwJVNDJMoSgrZtcgzRQzGhFbAqcRTW1qyey00xF0RsFYFuJX8wEfEKmWuCvt5cNShVC8QusJspXs6CPWtlSa+mS7ZKLGwOCw+5YH255u9jrH40i96OMjPR6es360TwikcTdchOmquylLJryQNWCt3HuRQoyh0KaiiyEHK3ZIOtf7BIcRLuxl3mmplBKSsZcy9/0Ro3b4gE3TTOCAnGrfxGkmE4BA+Va6zWM921nSW8C4eiIf4rxdhEQUwp2PRuF0K0JbCAQoYlGIkvhvzmoPQfuEASdUmNdN8fWb9l8eHgk15PHT1XL0DjiFZ88SV9jZpVrTJOe1YcffvzsyfN0kRomYa1cQWdLAlRvGAaE3BXWIUjLSFfdsBIMPtGAs8NI6TrPY5IQMjJGYvLIlTA38nkc1lawUAuxU2H9hDl4dPt1eyMapCkVGzCJwkXsIoJWG9Pb9Bcp1Lg/zrSvDL3OomP2bWq198HRof7DyaPHdmGqnBe4Wl5ZB08o0haISwYQSHDFQtC+9DHxB3ZESkO4V4wNunmaP8X00gBkghCTGHrlkr/MP2LUmHmHqQ7utEGTiFISlDKJIyNl6qIf+aM3aVdqRjYytAUNLXthsNcYpG67oy8MZQKB2OpKQLZayyi6FhDOP8X0tLgFN+pn9n1T0bE5e/uX6Phy+pIWPDp6aFuQ/VFGnAAeD6h1NwnYQVmGidOO8fbGWAQmFeI/eUnbJQdFmWBF1VKWoqRycXh/5dst4llxP4MGbG+pQO5Bae6IRlkDGQgkTYNeQg0/1WcvT35mNYQT2ym/tULZahk1YnEYiFIulSErwhmRqgqE7hn41USdnLM21tb96kujbGKjhyePfvGLXzjlUHABAdSOvYpCEzQkwQzEEMRayhuSxwJ5mdpgkLvQpCSB3IeGYPVEXQpX6BaXQyoqDtdIvZUpppKAqSmniqFJ1ugTrWhHygMkWeRBJ5TxXgOBLpsc0hcoVU35jKNZQBfv5q1U1ceOMILm35izqoTFlHnD5q9MQ718rWxmoPf3DydHMiuoKjABLhh5XVXFWzdCuobMhQOsit8N97R6P6niTlHkRxf5VBguoJIaGq+iI15B3J7INKuJvNUwOyeTuyhNxu807ZpXlRgzn0D7wx//+O3XX1sO4LDO5GnMwZdQUt64YtS1fX5h56Maio76uhheE3BRbpxlKXkKo4zqViRd8VLUMIhrK9vHjmUWg05rCOMj4PVCmTRV1zIasesQIU6eby8lGoG8frdmXNUQuHiMIqhJG+2GU+oLAYoc2fXq1Pk4iIw9Hxxgl4Ik/Pzy4vWpE2tzqszwIHGdqkq9yhL7vUsAiT+MxKK1rlTgVTs1A2x0plJcxv5+4ASbN8USFErZUIZGOEDA0elldwIpFlbSgBqKZcFaKjnKu/YzT9JcOuDqxvb7BUMDMVTmeFvVKs6NPFp1Z+CH8huzEzs/fPzo5NFDVDBTZDyItYa/bNy+/CmZ9ZNRPRTULnumj0F75NEXcI1WdxeRdp2YsCTnx5tLEgjoRKlZBrW5Rn2rhtBJPmO2cspk5BxbfFokHW8uXPToOKtYeNmKtoVfQ/4OEzWgE+CK3EFPtJH1oC1fKaFAkTXkneTJxcMSNNJgUmRvcnBw8uDJ8+eHvqR3cggZR4GIERhP7EAIkm16XORrXFc5yMNcgyAsfGrNPUY44nI8dCq+AWDxoJPqGBGbKQl85KISCpYA+xlVCJCBSVnWyFhWWBr+aMMoRehuOiTWTA31El5b/jfVLXkOqkWbiEPQi1KE+XnqinskCRXkVmkkI7k8b9a0o+/w6MkTh5gfPDjm3g2Qg9aZQPtHR6JaE4rGo40RL6y4NH/bpKARmJeGErlVYRkdrAZ/rSvOhy1ufC8mC4vyoUb2MWJQYUKDDYoBgzg0PxGw62iVEt6MoKWqKuAeuUIzKOW/d6mhvv2dgCYin0Gm6EWo2TIoG1gzyRMqpEKpXuaHsZDBQFDw5NlTQ4zOVsjaPGNlqa8Pfw5fLnJKh6WUbUQsG5EaoiV5WHM3lg449Hg/9pk5Fkv1TQLHmoCH3tEFzeFXLOOOe0hY3IVhtgglmcxs7FJ7cIo1qjCrCNEgaAKxu29yggzmcsqWEL8igWtaCoShu8pottsQtMwSEpBELkwk42gwvtqVN2XR5aO7Qt3pFa9/fTg+cC9ZsZcD6ln8qniHSX6ycAjFt
ImRLIQwB8s+HW863758c3MlqmQ1ou9NXuAFjJLcCggJZhFl8N133yG89fKBOR8SoiZxxUU1zUWAgwCVbqFOrEUGsyUylVeFdn7v1CFVVSxFFjIapKxHYWOIrBqDo1HNGp5qNhKYpEATWCI6cLCYf0e+HMekZWQzDhX4oWCz7QLHvu/RptYyk06wtKLTeOO++QyB1mRy/Nmf/owzcJH02V1TSa3iATRgw6xcewNfJ5PqZJC2wiihL35qLe0WtgonhRn1MEFbMbyokHOHi89g1VJ7laxhbVLuKglv/RARHoxHz9978ey9FycPHhyeHOs2wF/d9sB+8fmfBQuUAi0UDCShKdqj5zaiaVzx1qsSumZ1Vwfjo2fvPe+PJ3T74PDYR1VMh8d71ISwT0TowoYn4CltiNmOgGZ9ODfZZZC0Zz0CzkS5rGXPwRKkINiKSMAYeWjlY1cqlWNWUAIiKtA/IlomJdhjlD8NaBDTOvcIIr8BVkeCIac1WOrWIpUyLsIu4goHIWdCIYwqa4UUcVdF08hEE8yaW0dWI0wEzTiqhRv0gguhLEAsCoYZ7sPkBIfKgjY2AQlKyrOrrPAERUUm2uZfmwRuMS0kcrmLkdqPqtptQmW1F99abWmVec1SiGRKnpqGcIPnEYuaQcDgjCmV8/NEC3rK1PPi7VtGoZlZXAEMQ5j2pUb8uiK3d1nGWzbYjgXhATYZfBmOJo4h1GiLOGRv9eQgCgMNkYdtWNSqzNRoaLmLZNGCRalWthcoJQQuZGSTlFQEa6S83aX2k1LoCRWmsYhAoV2ySWoR3aZQKUtqqAw10EfyfVw9hweSBZmVUtxNgyPglo9Us5tqC+Vz1lxQqggio3uO+re0sVoMRSqmQguZoKYSQzSlRkTSM6ROvNF7+vSpeklgKivH7qYkedu8n1KA2F3v3+RdJRnuJ4win1sDs8vjb4hRyX3oExEIqhGKmM/6DmktFPdcHiCBu+XJ/S4Jx6qad0DWAFocihuvFMUdTcBWtelmJDJOs962ajxuN4PDg2Oj9FclnBgjB4C0mu9bAlqpXRJWbW8pfjQ2KEVK/Al5xXZthiWvGg7lImLHW7fPTSvlxgSCx36SHT4e9IaEKcKlVUfXiRotZCBTjf9lmavaqkGNqZ9ElHSoLXgZTKwTBgTJJBf+as60WREXtm40WTYLyFpGYn9iIRJU6GkIWnZUzt+ClbwlpYkqpOG0XenuJq/igKVQUGr1pDlxTQlXKizWVbZcZHOFeauuyjkWbTG78sWP7K1URBOemzGQLaykIlWqZZbh/s96mJVBjb1+NjVNJWkj5jCLKuLjG7RbvNSiqt4HH30obqHnxjBjEmoyugGaAkVv1y2mXtxPGUbbJu1JEYL4tK3Uobayjee7jO/+xnEGzwRskQiz5yZWK4FMgpXU+qftiasifGYLQwJh2mraHoXypKVM0mfMb8u29lDb29cVcIm5EChyo+NtTSD5kSOKWixq1UVadqkVLqwDh9TQvrtPDX7s1JVXJH9bzpREpM5SQ7nUGvzmGR1CJva2GbbSCEuCss9Wha0J17RS+Af9Sg2ulsE9dyNpQorklSFsSLUieUgYWrC0qzcFiayngDECSS+aFCiJ3u63VL9XQHYPq3lXElvKWb/luvPPBU74l0rC5i17i8cmEKrbWqLVoFS28AoOCZwvLslIKtwGZgQsHGrwtJx+umFrPKzBAH/TuyJJfpLrslqho5ScO/MhG2EtWjT8a2jCb937NghhYkQeVhVFkEbhlhtAhelWlhrQnrjJvZeowTImoqF94XNsQlLGF4FVElByDfISIkrrGy4Si5tSOQQhs5KhmfCpkjlYs2+mK7d1FuY6kcm/S5mjlard4rZW/AhgNGVrGlrBusrbKicC6mlJ/QOeMrO6tGt1a+BHNTVikxX0vG/GC0KFIkR6ym4IaqJO9I5x1QcHJ5tyY4VqhtB0Z/LFBaMN1mNnFEPD/o9vb63iTwYDa5UdTw5tsAGdUdCHMNESfe6JI53mPCE5aMEpsfzCUu2qBjhKaSgk3yX9MQM+wAZt09fjybFTN0LXbBFLwCovGKgae9AYEpBgKQua8e0I4V4+ph0SRWIZwMN/hgplifIcHWRupEmOiupUAlV1EkgtG8MLl0thQ4XSapWoOWJ6LyEECJ3VK0/grf0OippxZiBmw7lzTc1SgKB4HDsdHjReRg7LfNAaQIbDBSFeoUoFeAWFS8aqEKDKQTMUIVnF17WetQnS6nAU5sG2BKDBDejAXXEEDNDZ80IkalBUD4XkIQ4O9YlOsu2hFJuH3uEyatMLgEgAbRQAoptUnpSDUaihJbr2PoNAThTxxHOSsLRgTisRvvjlet+qS4WtkibrSqWVUCGq7VWDuXKHerFmqSMSirvKe5rZKyWLTHGqhWairvvcK2UvVBulqspgfi9FZu1FKjsEENCKTeHv/8a2LRwFlrItFaqEeHPtSxWZsM+EiqZRWt1k/l2CGAQMTYSQ7qNjLakKzo0tDWw/yaURB6+25AfKNgYBXUlAGXQ10u2sGEZyjbUacZFGoEL7mSyhrJ/+RpAacXcobTUNx2OLtWQVO/Jm4UwMEOOVbWwhQyQmxN+lBq524xftac4hOaGCq9xMTfB0HFWGf6K22mYtsAnKEf0SQM17nipd6iY5d1GDGMSbQFX62IBQK4RSQ4lR6qn41SckVB6eyd2Qd1XmXWrCUA/bq7RbPzPbHwn1Mxgq0pqERgKbqiJtVn7G0w0SS97Uw1Cv3Jyv+vhOw7XinqcSg8qS7V8xHPnZKgvU9zBHErNDqq81StVlCBVkzrb5vAgXk8LRIpbS2BFmM4vZ7JbRV1sV4p+RUMZk37ZXjbYneSqFq3JK7XdsC/DKXLpXOf03dRx2QYSS6MZmWUTKNqFVlhWoqC4kqkpbmLw22uhfiZtS4QdLGTkt42om05NqGgxl2xqwjSJaLLKi150NQmuNUqKWWnMBOGSLw2C4SoSdtl7nNaWwPg191pevqAkht2QAa8MjN1v8GyFkoHUe0VMfXMF8C5vgn+UA2X2RYd54mV0NSiWppiDDm5JfVjCzrCxCvqZc0tsQNowPlrSC02x2ItstAFvMwpl3D9V9H2GOz5Psb9u9EGlHPdUaTsS/+QHEfF4ByZs4hJ0RoTD8jgS5uZOqaqS9LXDDsWBcGYSoHJ2FpZoN4xqI5dtjiSQYlulWpcU/GhWfwdy+92JG7AKLKAQ00qX+cMKRGjra+ZbUTgQynRTljQmMMr6DjoghqxgsEpQeTDjaJIJUDayBS7U5qUYwVmODlNGqsFrhoIzcgAZnAV9iEux3wlRP7y7teXvfKIJpaJdKyJT/szg+6hpFKKOwY1LctIJgJeuozzQakwIuGKyy8sSrqrlc61bao60lFD68mk1dDZJY/xIq+SPhmF5DKSrg6co4pK0w1QgtIWKsHebNDASv2HqfngAzrdjKQr1I5SGEHwVotb21l/VsK3vu5XTd6mdEPpJGIeOfWNmslronnGI1wrB7puBW4SsiuN3MbFPzhEagRQEQPIHX5i/ApEh+ajH8
j5LJRrRKwrwP2DFFZu4AXXTM6zIuFeugomDGxh7mE7BlToWK8eYGHTvZN7VFfkf4QjCttoa1vcMoillWUPGkbYnAZuoxzlURtNCjI69GyD0N90BetG2KBUrhEXdgGc5tb7xYzZ1fIX4n+dCkC/M6mkD9EaZwLXXSlDInuJKGvAn3+diEdAVIZVWqUdkxd2hldsOEpVUsOTbauNHaorfaFJZ1JCoqyHlKZWlOcyoNrdBWg5XeEUK+NO1fOBTipz137HDgKBGMOAB46GwMIFgMnINgohNRX9TknchQVsGt3r59i/njW/Ou6YV7UoNDAIn3ogOlz2QrHVyLE7k1wtAAox1aL28SMLiCGtIsoGqdWcRqsTEVjFK+J3E9vdRX6h9EUQyqqtKCkEzx1yyRGkIFDQs2FYjVhFsQTgpjK7EffoIhmBQFqkEh8lZFGYKW05XAQcxJ6jEJxbeqbHtRgzzw0gwitqeEHQCW5ElXl1N50LIYE/9PTSxYclB6MA7yAeP+vSf1vJxxJyZmeWHzuwOksq/eVAsyG8dar0/Ge8yxECQDxOJ1C2KZUf8HPf0Zndz0avLtnhDhfhualDwsQWjt1a9dtrwtVbeoBOG2dqWKhLIFdytQNW2pgFxMowXsej1ZvGRarZK+JvzVo4j8CBSqxdiXXQhksUepNn9QWuvJGfjqactak48zFdjmF7KLUHkHtESVoS6PlY7XWSBgLZivOgPUQKsZU5vhFuV4HKWvCQKZOiul9tw3/SdR0UkAxOZ5HlgzjiOF/9W1jKNyxrapGot38yZK43/gwiqIVeycBQyjHEpsqQGhNXNp6hY/2AjLn/jJWMMihyKNEEG64CpzmG9PRgkTs5ckhzbep1GdSupCMpxy2uk8ZEFIhGerm9vzi7OX333v877SlVWfosgXL16Yp/TR1vOzHJIb/4+lsaD64XoUd32K8DJmwJNEYHQmax8FJyQ43TVWIHOtTpjMaSCGBhx2l1kXmSLC1fWOGYonhpXo4OjkIcNCKkeGNkxSW4GQmVXBF3oFf0k/1Zk6mXKo9fHRtGJP4MuYWGY7Q4AEfKyJNnHI8HMMP/4AhjH+9ps/z67Pc+rxaHBhz/l1BMHEh+2xRDKdcOOO86troujr3g64Drw7QYh0lSgWPKWTRL5wYqxCd7BGSghwUIsBi8fKRJhtpZRTxKcEonkshsh0X5FAtCZhuYlAzZGu1m4GoHZ9uWo0HsLitgYGg5Bh9kppumSMidn6YTZYPaEAdEPWRe2S89ny65eX33ytBeDRithBNKIPAkaWJ6dcapWENxuZLaHFBRe1RHRzeENS6giSZA17twISQEKCJG8V8SciKv4Jz2sqPKziKlK65VEtDmAIYz6cmEzyrdHMrx8e5iMa1SWXMfoQhat/FU5tjYVXdxbHvabB1hI4M8NI+8dxEFP7S2cJKDVUiPji4thxDFrBg7hq+4FJjkVUsKMg6qo1mJnHgAwJCGbVSfNDjdD0e9ta/c37kGAba8qmPcwucmW0ItXEyMRFujZ/2bKhvsWI5uP2jg4sgrBUnTqYv5YwSsXQQ9FoeSWlPJHiXKrB9hwEBViygQ2civsiE85m3rJFWWFAnBcZ4QaQQBb6KD+qyTN4+fI7QpgAvsadG4hpbNu8B+FAQ95NSa6fGLUlSkHTSBbQ05VgOur0ODYyYMdsoEikMUCLt7MoK50XEuGmmtsiYFUeKM0v56O0CQy0FUMryebJjsapLIO3IGO2QQsbw54oWsOucloSKliySDyDHYrWmDWo/aPOLWRSBhV7zEPW1N1My3Fu53Zak4Xe9tKoEPksCSyQ3Eqai9JuybTTDsCma3EnrlsxKI8XkUGtbSzASscjFC3AxGnQTYkiMH1FxJA4LcWk5L765oDf8iZ02N3jPQ6JZJBO1CwipdMaVBDa1LpV5WcrpV1SmRE3bTc4Qj/V+F9EvJN/xVoq0Yyh25qGQhybU12ipMCkYA0EbKU3P5PAFDEoJOARWtIdSevXV9ORaGZ/T75QVEqlhN7PGr1NKA2eTJSjtwP4qtpWMx6nVvKW3nz8RJr0hL1MVWLSxbK+BZExlHQZcvp5nFY0xk8bERhT+ANJn5bzRBVZVQEa1kxq+HuyIwWaxxD4GY1AgRrvS9ZkLiKiNxgqATd/2zBnHpZk1ytDzLa5GXOlFI0HWgzcFRfIgg3IkS4qOx1j5QfqoXVRASkrpV0PE0OHEIGq5D/fKa/+baQ/a/BZCQzartdQlEWIARErOvECCa7NEadznUF7nXP1bqFXvKEDBk4wXiJtR1tDLCs6E3qGP7XLWSnyIa9MGs6uwiJolpbqzyaGZyOz1j8udIczElh3E6NkqabvnsxnOQamhuRUaNxRdGbTo6Af0I0f4BNVIA+GYwpgNBoKBV+zX3pzuswQCaghSxLTkPjQAai22dRuw2xoz0eZlzfRQILgBUFILZU0VngG+dolGfqEBAkrSwVKvFFA9sqUDIxQ9A7/Y5BTkSIwadeIAbnGH0WiE3ibVQJsAkK0bx4IY9hpD9WmighV6lFHVRJ73lxujE7VEWkMYIDgnjKXYQVQeIPmab7A4K4aDHfAEITkrwQSZ/0N367S62AnwsLUHmkrw1wU5uVyDhqZS5VxHfVavcx92lNN9DB00oxdmrZ9t/rzU+AqcABq3H/0R+a8tXLFsth8O9FMnD2QAhxfXZ1fO0CkCAaddDYjz1rMpksL3B1xQFAIi0pFiUAgQhnbRISmAZEJcJlJyJcDkC2WIuY0+tjkXX0RcaqVNUFZ/VsHyHvk4Ra63NdtkSOVxkxI6qpWtzfJVzmTeecaMocWQ4hO4UQTnLoR8yFsGJcK/c8FZLhRADfnu40rsALNRlTFVVyLFS9rmgJneVbpc9WbyLp6xuTFY6y1zjODSrHrhtEyC1c2oRBQhElwhUgArkQiYhf030Tu/hkR4Va9D5RZehQgwkbsdVv6xvAQScJaDwvRWLsop2pLvLbGL2VixpJonMa4KJhoXNbAEJrW5mafe7nus8m2hcoRk1kWOpjEkdfwWJHC0hYdUCTMnrv0ClWSiJwJVO+OQ2FYiXpcQJxNjF3sHUiyV62sY8SimAvFfEXC/0ImAZulBOqlo1YOxBIX/+P7in6qaOU8Tw0/Sg2zHYqFfzCIOY9ZJnaAgn0tC8AMgMams6q3OfFW16mb3cBDxtPIR2hWLVY7Qdw/gaakaidNtW9GBLzCX9VKaMjPEN//jdm8anaSOEYoqgHIZgtUHoC54xpbsWdP031xtSMgmbQ53S0bEIqWkmMqLjbXmo5ALf9HiEI7ShTcAmYeARrw6FYhoyx5gkuwIhBpu6IBew29ykAlUctHIaMh5QoTVUoRZiqP0eBRcfsnHLU7uM7a0oROWs5xLF1HO0HWXpco+TRbwabKtpMohxAkgYGhlwCf8CkBSLyh545hp2nrvdFAd9rhUwaOlxFbmY1UZrB3S4IY5xKCaGIIKUdwruHNoIsQgT00wOoidov0Qh8/46xDnrY6M1olVFUXthE9Gpw8Ue51Th8rUd6RNaoEVssdjpxltGfgaIl
yabqGNaO2MRZZHZLGuFzAqLRMADFB2sYM9fC1rTYVNs0BAW0ZPDw5tgRB1U67QRG1M6/8qAK8EUpE2BKmp29TtA/9NFhwZyQiCgoSfsEgn9UHkXWeLIudwwQ9Ah5STXWMs+ciGL0DXMZDMxhErUVojCVKo5YJCaCHQvGAaQGV/SE7RmVYBUfT5TseHlgcxyUa3Yv01PyNVVwInEGz7E7lOllf7i/FyWYu+Z630pQF1t4CcvD+s6fnJG04mB4e6Nzo7dqArR/KxTJUyiBeY1SYb+tdguuQ++6hPGmjmnEP5GSrsMxGC+KfHgp04/yJMolJlIftgKIbQgxkBROeEQIpCl+q6z7sNdFgT9j19PTtuZGlAzt7cryKOsL9RE2qcWCYIVXCiAIb5zHoQcXYo6DZ8BabqpOYasiIljhoNjXAWVOhTiN/8vhRDqzb30MV/W34n+3ti16MDzQqApGPDkdQoQ5ratRRaXQ43q8Mj61XbnY0IkIstFhouE7Ap9UsjnGm/H4GUfBKhWQBnPFp+eaLo9aiufLeCTASMByl/l4GHjFFzkxTMP4rksj8OuyN5lI6H+VzNLG+aMygw+mc9WtBxNzJZQ78zal11EfX2vG24nbjCyBBiNn13GrwYceO/NoooYyN+v7p3kLVigJXuAELdV0ZOUBezcwsh+GhQhmYoJnjpXEhM04BoewzFY/ChMl4FHclPmWORLmBQAyh8piqJMVlaJ28oEcGQ5DohQQRB7Hpf8Ztb/IdRpQDXluoycYkKLMU2tDC/q1tiBxKhnoSt+cfXiFB2jISlzVH4SoTglUGR3Pou0+DDoWQg94ip3g6cHK4dDhmhoOCEigU0R5QBmNGpSPAc4UGoAmYxRIyFIeNOcq4tHh1amCv6K0e8hxRr2E420S3qwqArYJmEkh2OC4SCFkjDvnNaEZDWqSndbJwOeUx5+QKnJrWYSx+oh8SpNtC4qyDQnF0iYLobmAzXlAbTmRzezmbDt6eW65VZ/oZmNWZzHdlbjSsDa2jbscpDaXVnti1UNVHFrATczTMlqCftsO9aINdalkyIsG/ZlyWr16++fLLL1+9epUdn3UINsK7wUWcjBakLxTsCIA6qIZ6WrIzmvXKqSqZKpApSTXCDsH16PKaV4u41eFOqA8koEtYHYtbh4dqIy6rtRFC6ODkRFHwCNJvN2/ytYTajsC6ZCygRrGDucRZYgqfwELFjZY5VFhqaJPf3GejQfUmy1Py3MpqmEIpAbLL63yoqA7fBn9CfVDFVlVsCjN0jMa7aa4HKD7hqsXu+uD4iPLbYYfzbVkol2aiBPnZD2yJKBt9r809YJtnxDeaEyvDOJZDqSfpFjG7UQcoCFjYrDr9ioxzQgwtKkFHfRBopAye6THWkdrBu4biwwipxA+mKpS4Q/N+Hifksv2xFmYBWkoY1O2afTN+FaLoFFfXrQQBFSBupUcEt2dTfsw4AgUH9eRIaVtNHzxyz4xnhHqWb0rzXO4FSSyiXh1/E+joKc7HwsZ6BCy6jaw7VyWPmtPRkofmtoMeco6PQTlep3S8PtCViewgQyKIq1PIhjXgY894i5FKFtRlCiVWL1PNLF88sMkveXI+QoL/rMFwMprnaE6SX758HcGrL5/G60eESiO4aRpA57PfTu2JspkA/xCdcyULvij7cPRoUj7bxJGahzNs23aEM79rWqWZza3EO/qJZME/oxaagzak/A6lmMGyPWIShHBCK8IzVC28dfSYFqEfYVYEMqiA5Lmvg4EhgyeFlSMNpkwdWyqPmyRThxiItF3TgemV+4iLx2F1jiO7UhV6xbMKGMKfLc/VgKw0sDXauloqkTyR8rakA2wZhk1UGB8JQjVn0AURK6KQzRMF1Bmhgx6rUHJxV10Jd/Sd+KuQURNUNefOhFhJGefS4yMShyJldzZfXJ5n0J5dCc8X8ZGK2+7rzEEYNuC8QiBjMvlZ0bSQLdTVRyxJcQ2VYtvATzlbZ7ZQbJcEqVv/oqcDOM1EVOofwQ5WJaFZGdTtIgGVNHqoFa+CUnVT5UFOEVlRAc2jFB42WiAJ+JGLqCEg5cNTFMtxdpqXjD6Ia52TYMNVyKlL48yei6vTs9ObWRrL/qJdDOMnnGhMtRoQA4tYMU9iFMOOkCROzvNUWMLFAgHC06BXyoWqbgJBjflZhL5n3FU3OdFLalaZSnaZy4FEn2KzEFeeVrncnGXL5tqkgTuWN78K8ixHbhqXAQEnIXtDEVY2uCUaqPB+L/tPE+8P5qKdXnd2NXt7cU4c5jWBGcIm9okhA5Zr6q4uoVYrKav1GuooJAuIxFE0mRSQTQVhyJpFYohsOtfBsEmmaM2cVHaD6shFxuNFoFBESOCcHmW6McQmX9wzJ6pwRizRMSeFxX/DgMzlVaKiEFxF7GgYXfjBkhZwvc73MB8DHqf2ZO7Kzj3fSKhdt8ksxsCEC5HzdM78JuavU3w0V0TAxvAeGcoHgTOMBW9oS7eK8HlS3qH9jKRUEZkjk+Fn/K1MEipwiyUBBJ5hjNjLSTJTcULKoghEIwRJMriqU4plrDyBRByB4cKPfCeF5Gf8BboOaebv1UrfDNM51oCJkWCuHpAgMcpmsr4ODxR+xeCRCPgLkxl8MBQ8uTb0IxQVNcHWw0CMJl6yzGSgSWQbbbon+fIhXsxSPER8G1z8ryNAEPCl3F3GV8FUVj9Kh+EBsz0QbkeIHNMwdqqKVoHKTyOjimR27F3iht5GT8g5DnoK+odOtjl54AgW08D2Kw+cEBhUVVHbQBpNCZIWSATSEIXs3KqPUtLufAYhhono7oxCwMr4fyKfNmBARgu7MBhKvLKOtfCfKTAqIX+xsFjMKERkCjecktJBzXD4eJT+Czji2VIekfIvFVYcmbzZ9YHziWiFk/qy5Dn8MA1MiOuTmweH+4l6uhunTGTtg1Ejp/aMx0cHkwASPHPAubMetKtUhlrLLWkIYKCQ2NoBy4f9Iu+pnlfWmQZhSp0TeYszriyIYKrGx9hGiIVXnosBDXtjlh4+wRSJ5KFScKsEEz8T2Ia68cce6xImzBCGZ/UVdvX2DXll9f+2b5IiJV/a9fXuYXdi54YaFIKkzt6Rg2bqOw9KGY5MOOIVhMlGTK2pSkC5izfxKoysznsYDJ1SCplB517Nvn6RZSXl1ULogOtvdRYVKFxdVRWMinUhw12K35Ec71mS3DhPXe+oQIHcR+6LHO4RLFfi6js2Zbn03wqwUBdwrawruoEnYGWYI0BAjAyfHCGCr12Igpj/hFxecbgeBJgdNWEoYEk9pYlI4j6eK4bGJY5J7vTO3f1F0rgnjds7KuSvAlBxR9sLxGRLZbXoQn4/XYJt7EpQbTXnpipqVyKgKhpdlpvthk5IGdMv5WVeS/jZGlKbZt0zaZTcSfBl1Uz4O7JBATgFLVRoAFQVuTQqxPIUD1wpkJzpwzF14k3IxMT+VEp5fuudLASJ0LSsPeTcwDCVlr65QZJ6W7eEZpdne1NUoKpeowI48Lwl960SCHnCfXqin8qes/xuG4BF/Kxfp8kTZ9TTZ9KR5UaRhUaFir
/StAobeO3aRNq9FGEHrKkmZiwhIu++/n8BM287HyfWqUIAAAAASUVORK5CYII=",
"text/plain": [
"<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=87x244>"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataset[110][\"image\"]"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "bb86837f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAD0AFcDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwDm/FehWvh2yjjt2kLMQpJHWsLSG1a0k3Wds8hc+nBr0zxTb2V3aRT3Um1c5U4zx61PY6poFno3nC6gKoOSCMk/SuZXZo7Hn2ua79vtGtrzTvIvFA+YelcjKo5rV8R60uravLdRJti+6vGCRWKzk1rFXJsaVrOF3KJHCH+Dcdv5Vo22oFZ45I40DxcodgOTjHNc5EWLda7rwhpEV3cKZeR71UtEOEbsrT+LNaWbzQY42IxlExTLbXNSuSy3L+ZGzbiWGea19Z8K6hJfyNZxq8WeBmp7DwhqCxZljRSe2ayvc2cLHKzRl3dlBA9NxqnDIEm2+prrZ/D95FK6NH+VZR8NXrT5ChRnvWhDZVvLK2kMeWILDkg0VpT6JcqmW29cUUhaCXfiD7d4RSynceeuFBJ5wO1Y/h7w3deI7oLFlIFb55COPoKf4X0SHWb7NzKwiX7wHU17Fp32LTbWO1tLfbGowAOKl6Eo5Hxh4esdI8Hhba2XzIQP3nc+teWumUDDvXsfj66aXw7LGEOGryTTYTczCDIyQcZpxK6lSEESDPrXs/gjSRLp4mzjcMdK86i8OzSMp29SOte/eGtITTvDsKkcqmSampK+htTjbU47VfEEXh+Z/tMTMd+BisxvijaH5VsZGP1pnjFhdTyb/mCuQo9BXCPbhHzjinBKxNWXY6O78czTSM6WYGTwCelV18Z3DZzap+dYpi3DoaRYMZ4qjE0rnxPdznAiVRn1orNMOT0ooHcPB+rJpWvxSz827nZIPb1r6IsLeynhSVEQqwypHcV8z6jp8+malJayrhkP5+9dhpPjzVNK0pLNCrhU2qx6j/GqUeYzPTvHCWI0CeJmjUFDgAjJNeH6cP7OkL7FZ+xPan32rXOoy+ZczM7e54FVfN962jTQrs6a08QvG43ouK9H0XxyJ7b7K5jXK4FeJiXnrT47qSNso7KfUGk6UWUqkj07VtL3iSdP9IVm3bVPIrh55zbXoDWcg9NycCrFn4kngiVGkY4HepZ/EAucb0VvcrR7KwOdyB3W4beYtgqMxAHipZdUjlkjjKbTgKDV6OFSucUnASkYbqQx+XNFbxtkx0FFZ8pVzd+IfhpLmxGqwp++iX95gdRXlQfK17x4vuhB4WuJOoMXT8K8DViRnGM9qumybDyaTNNzRnmtRDt1LuqPNGaaYiUvjvU0E+4gVTY5FOt8q4Ip8wWuaUwztYHkc11Glv5tpGWPPSudiTz4iO9bulkQQrGxHHepdmK1jXEYA6cUVYQBl4orKxVx3ijWrfVvDS2dq3mSbVV1Jx0HUV5hNY3NsAZYiAe46V3SabcWTDzoSorI13iFVHc1kpWZuoXicoaSpZIyrYNR4rpTuYNWENFLSUxAa19EsYb1nEjEbegFY/etbw/NtvtvrUT2NKe51kPhuOW0kNvIRMBkAnAqnFaTW0gSQDf1OK6LTbtYnOaoXZvJtRElvbSGNR1AyDXNGbub1Ka5blm2kCoA5waKzpxNDcLNMrrnIAxRXQcg46rdXTH7TMT6ADP9aytRU3FzGvQZ71GsdzbSMsgPBPJH69afcTAukhwBuycVyXuzuinaxj6vB5VyfQiszNbevDf5Uo6Fe1YmOK66exz1FZiZopKUVoZC4qxpjGHUIjngnmol5qSMBZVb0NKSuhwdmd3a3QwMDnNb1hb6pdqGN5HDbE8KMZ/+tXH2Mm6JSDUXiV7i3hjljuJUBwAqtgdK4npI7WnKI3xdrQ+3m1tZDL5BKmUnqc+lFceGZiSeT6nmiuuOxxSjqe0abp8V7ZSCaMMx7kcjiuI1W0NpPJCQcIcDNd/p10scuxAArVjeJrNJ1aVfvfzrD2dkdkXdnCXNwk2nRox/eIT+VZZxVq7jZHKgVUEbVtT2MKu4hopdho8s46VoY2YBtpp/mZWnR25cgYrSi0h5oDjg/SndWKUWXNEkDxFCelavia1+0WFqmeS3H5VgaWr2l0Y3GCOoNad1evPdQxFsheg9K4ai947YP3dTm9R0iXTJFMhBVhwc4orsvFGnpLo0UuBvBHJFFbxloc7Wps2U4+0IrdCat6hEJUKntWTZzQeeqsw3A8VrPJvJPrWn2QTszkNQ0XfLuU4Unmqo0Be8uK6m7XEbVml651OzOynSjNXZkjQI+8p69hViLRbRSNxZvarRcik8wjkiq5zZUIIt21lZxsNsK10dg9nGjf6OuSPSuWjnAYDNXY7kBcA0uZsPZxsUvEkEceoJcxIFDDaxFZToqalAx+6w7VoavI8kQDN8gOax7qUmWAoeQaTOeorbHYaxG0+joqDJAWirOmTLcWao+CQBndRSRlyM5WyV5JwMV1MKsQKwNLRjehx90dfautiUFBxzXXBXOaTKl1DvhJAycVzUsrxuflGAe9dr5WVII4rlbyDZPIpHRjXPUp2Z3YapdWMr+1ERyHXFOW/S4IWMgZ9arX1spBPTFZ1jlb0L2NZ2Ou50BiwM7sn2p8MTs4LHC+lEcLKPvVKFYnAOKpFDNSRTZSe1cxFP506D+62K6icFo9h5BqjPpEcd0kqDaG5IFU1ocVR2kW7bUBZXphZyAy5HpRXP+JJtl6iR53KuDRWVi+aJv2+uWumllkhdjnkrWlD4y0ogFlmQ+myuNvCHmf8A3jUCxgkV2RdjzG7nqNt4g0u5iDJcAcdG4rFv5Y5buR4mDI3ORXO6dDZglrjp6C
tdzCT/AKOu2MDpU1Hc6sL8RRvvukAdaraJpsup6mYIiocLu+arN1V7wEwj8ZRMf4lwAehrA7puxKyGCRon+8hKn6igeoq94hi8jXbpQMAtkVmgkUJlQleJJ5bysFjXc3pXTR+Ho59JW5lISVEz1rl4rkxzKR1B7V1omaXTi28gFOhrZK6POxE7SPJ9cVptWbYkjDHZSaK9ssI7S3tIyLWLfjJZlBJoqeUi9zxOTmQn3pAQKfdRmOVvTNV8nNXc51sWFcqeDWxYzb49vcVz8jMgyK19CbzN5f8ACom9DooO0i1d4APrWt4JtN+tx3POY6zL1ecjvXVeBYGE7NsOMZJrnudtSd0Q+LiW1xjtwMfnWE8oRctxXTeL5rca4sSgl9vzHHFc1fIrRkCmmTCdkZtvqLyXhVF6HjPeu10WK61O1kZvljThie4rzaW4ayvhIg5Wu48Ia/eapO8WxI4VXJCjk11QehxVdZHS398mmWiF1LDgYFFWjGsnDoCPcUVVi01Y8WuL9bqZht2sCeDTAcVBqkH2HX7qEdFkP86nU8is0cyHuu6M+wrR0YbImPrVL+E/Sr2nMFgOfWpqbG9HVlud9zKvcnivZvCumRWumRuVUEpzjv3rxOSQNLHjruFe56bMYvDQk6FUJ/8AHa5zebPJPEN+L7xnfmMnYjkL9M1DMSyfhWPbXBm1i9l6lpm/nWo7/KabLprQopo41SZ137SBya2/AVqbW6vY2O5kbbkVU0aXF6y56itTwp+71LUR6yf41vS1OaroztFI70VCGIPWiurlMOY8V8XSbvFV6c8mQ5/OmQkGJTUPiNH/ALeu3b+Jyc/jS2jZhHrWHUSLqHjmpIJdoKioQpPNV4Zv9IZfepnsa0XZmzbAvdxem8V7VeXa2vhCRjgYiPP/AAGvErWTZNG3vXpOp3Et54Slij+8YR39ua50tTpnseP2Nywv2OeXYn9a6IMWSuTtz5V2d38LEV1cHzRD3FORVLYn0z5NQX3rc0MbNZvAO+01gQt5Vwr/AN081saLcBtVmI48wDr7CtqTsc9VanYA5Wis+fUILP8A10ir7Zorb2hlyHlXia7ju7lZVUAkc1RsmLKRS6rbSRTtnoCcVFablJqGxJGqvC1lozC8xV0N8vWqYGbpD6mpbKgtTWSQrtr0XQbpbvR3hkbohFefm3Plg9sVraTqj2kTxgHkYyDWKWp0yehyWpRi31m5jByBIcfnXR2J3QL/ALtczqRMmqyOTnLZrrrGDbbJhSAVHX6UpoKDH20QebnpVDXZjbzRi3ZkYkcqaukvBNkdKzNXy80D+4qovQKi1IdRVnt4mcu5IBJJzzRVu6g862Reneii5CidlqfhuyvlYiLBb0FYo8CNuOxjjPetvUtbktmxDgY61iz+Jr5xjzNuPSqVyNBf+EBnJ/4+EUfWsnVfCraQi3LXSSAHgD1qZ9cvPmPnvz1way727muVwzsw9zVJE7EP26TdtzlfStHT0MwfA5rGSB2kHB610ulwiEZHUjmqsgu2crf2k6Xju0bAZ610Vjq/k2cSSIWKrjrV7UIvMgckc44rmlhuixUROfTAqGkyoya2Ne51S3mQjYyn61lTSNNNGQxKg9Kli0TU7ogR278nqRWvYeDNXaQMUC49anRA5yZUv7pTbx+UNhAANFdRF4BvJxiQc+1FToaRloZeqZ89uaymA20UVujGIxUG6nmNPSiigaFjjXeOK6PT4I9i8daKKmRSN6HTraZPnTNamm6XZozbYh+VFFZvYZvQWduBkRr+VXUhjC8ItFFYlvYUnavAH5UUUVSJP//Z'"
|
284 |
+
]
|
285 |
+
},
|
286 |
+
"execution_count": 36,
|
287 |
+
"metadata": {},
|
288 |
+
"output_type": "execute_result"
|
289 |
+
}
|
290 |
+
],
|
291 |
+
"source": [
|
292 |
+
"pil_to_url(dataset[110]['image'])"
|
293 |
+
]
|
294 |
+
},
|
295 |
+
{
|
296 |
+
"cell_type": "code",
|
297 |
+
"execution_count": null,
|
298 |
+
"id": "ce9be966",
|
299 |
+
"metadata": {},
|
300 |
+
"outputs": [
|
301 |
+
{
|
302 |
+
"data": {
|
303 |
+
"text/plain": [
|
304 |
+
"'The image shows a person from behind, wearing a dark blue t-shirt and pink shorts. They are standing among a group of people, and the setting appears to be outdoors.'"
|
305 |
+
]
|
306 |
+
},
|
307 |
+
"execution_count": 38,
|
308 |
+
"metadata": {},
|
309 |
+
"output_type": "execute_result"
|
310 |
+
}
|
311 |
+
],
|
312 |
+
"source": [
|
313 |
+
"from openai import OpenAI\n",
|
314 |
+
"\n",
|
315 |
+
"client = OpenAI(api_key=\"YOUR_API_KEY\", base_url=\"http://0.0.0.0:8082/v1\")\n",
|
316 |
+
"model_name = client.models.list().data[0].id\n",
|
317 |
+
"\n",
|
318 |
+
"def generate_content(image, prompt):\n",
|
319 |
+
" \n",
|
320 |
+
" url_of_pil_image = pil_to_url(image)\n",
|
321 |
+
" \n",
|
322 |
+
" response = client.chat.completions.create(\n",
|
323 |
+
" model=model_name,\n",
|
324 |
+
" messages=[\n",
|
325 |
+
" {\n",
|
326 |
+
" \"role\": \"user\",\n",
|
327 |
+
" \"content\": [\n",
|
328 |
+
" {\n",
|
329 |
+
" \"type\": \"text\",\n",
|
330 |
+
" \"text\": prompt,\n",
|
331 |
+
" },\n",
|
332 |
+
" {\n",
|
333 |
+
" \"type\": \"image_url\",\n",
|
334 |
+
" \"image_url\": {\n",
|
335 |
+
" \"url\": url_of_pil_image,\n",
|
336 |
+
" },\n",
|
337 |
+
" },\n",
|
338 |
+
" ],\n",
|
339 |
+
" }\n",
|
340 |
+
" ],\n",
|
341 |
+
" temperature=0.5,\n",
|
342 |
+
" top_p=0.8,\n",
|
343 |
+
" )\n",
|
344 |
+
" return response.choices[0].message.content\n",
|
345 |
+
"\n",
|
346 |
+
"generate_content(image=dataset[110][\"image\"], prompt=\"describe this image\")"
|
347 |
+
]
|
348 |
+
},
|
349 |
+
{
|
350 |
+
"cell_type": "code",
|
351 |
+
"execution_count": null,
|
352 |
+
"id": "8ebeb3b6",
|
353 |
+
"metadata": {},
|
354 |
+
"outputs": [],
|
355 |
+
"source": [
|
356 |
+
"PROMPT = '''\n",
|
357 |
+
"You are an AI assistant that helps users describe a given person from an image in detail. The image is taken from a surveillance camera and focuses on one person. Your caption must focus on the person and cover the following aspects:\n",
|
358 |
+
"\n",
|
359 |
+
"- Gender, age, and pose of the person\n",
|
360 |
+
"- Upper body clothing such as shirt, jacket, etc.\n",
|
361 |
+
"- Lower body clothing such as pants, skirt, etc.\n",
|
362 |
+
"- Accessories on head/face such as hat, glasses, etc.\n",
|
363 |
+
"- Accessories on body such as bag, watch, book, etc.\n",
|
364 |
+
"- Accessories on feet such as shoes, sandals, etc.\n",
|
365 |
+
"- Activities and interactions with other objects such as holding a phone, sitting on a bench, etc.\n",
|
366 |
+
"- Transportation such as car, bicycle, etc.\n",
|
367 |
+
"\n",
|
368 |
+
"Here are two example captions. \n",
|
369 |
+
"{EXAMPLE}\n",
|
370 |
+
"Please mimic the style, expression, and sentence structure of the examples without copying the specific details. If the example is unusual, please ignore it. \n",
|
371 |
+
"You must describe the person in your input image truthfully and in detail.\n",
|
372 |
+
"'''\n",
|
373 |
+
"\n",
|
374 |
+
"def make_prompt(prompt, example):\n",
|
375 |
+
" return prompt.format(EXAMPLE=example)\n"
|
376 |
+
]
|
377 |
+
},
|
378 |
+
{
|
379 |
+
"cell_type": "code",
|
380 |
+
"execution_count": null,
|
381 |
+
"id": "76cd677f",
|
382 |
+
"metadata": {},
|
383 |
+
"outputs": [],
|
384 |
+
"source": []
|
385 |
+
}
|
386 |
+
],
|
387 |
+
"metadata": {
|
388 |
+
"kernelspec": {
|
389 |
+
"display_name": "lmdeploy",
|
390 |
+
"language": "python",
|
391 |
+
"name": "lmdeploy"
|
392 |
+
},
|
393 |
+
"language_info": {
|
394 |
+
"codemirror_mode": {
|
395 |
+
"name": "ipython",
|
396 |
+
"version": 3
|
397 |
+
},
|
398 |
+
"file_extension": ".py",
|
399 |
+
"mimetype": "text/x-python",
|
400 |
+
"name": "python",
|
401 |
+
"nbconvert_exporter": "python",
|
402 |
+
"pygments_lexer": "ipython3",
|
403 |
+
"version": "3.8.19"
|
404 |
+
}
|
405 |
+
},
|
406 |
+
"nbformat": 4,
|
407 |
+
"nbformat_minor": 5
|
408 |
+
}
|
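The cells above rely on a pil_to_url helper that is defined earlier in the notebook and not shown in this diff; its output (a data:image/jpeg;base64,... string) is what the image_url field of the chat request receives. A minimal sketch of such a helper, assuming only the standard Pillow and base64 APIs and written purely for illustration, could look like this:

import base64
import io
from PIL import Image

def pil_to_url(image: Image.Image) -> str:
    # Serialize the PIL image to JPEG bytes in memory.
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format="JPEG")
    # Base64-encode the bytes and wrap them in a data URL, the shape
    # accepted by the OpenAI-compatible image_url content part.
    encoded = base64.b64encode(buffer.getvalue()).decode("utf-8")
    return f"data:image/jpeg;base64,{encoded}"

The helper actually used in the notebook may differ in image format or quality settings; only the returned data-URL shape matters to the server.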
a_mllm_notebooks/openai/.ipynb_checkpoints/ping_server-checkpoint.ipynb
ADDED
@@ -0,0 +1,292 @@
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "code",
|
5 |
+
"execution_count": 6,
|
6 |
+
"metadata": {},
|
7 |
+
"outputs": [],
|
8 |
+
"source": [
|
9 |
+
"# !pip install openai\n",
|
10 |
+
"from openai import OpenAI\n",
|
11 |
+
"\n",
|
12 |
+
"client = OpenAI(api_key=\"YOUR_API_KEY\", base_url=\"http://0.0.0.0:8082/v1\")\n",
|
13 |
+
"model_name = client.models.list().data[0].id\n",
|
14 |
+
"# response = client.chat.completions.create(\n",
|
15 |
+
"# model=model_name,\n",
|
16 |
+
"# messages=[\n",
|
17 |
+
"# {\n",
|
18 |
+
"# \"role\": \"system\",\n",
|
19 |
+
"# \"content\": \"You are a helpful assistant who is proficient in translating English to Chinese.\",\n",
|
20 |
+
"# },\n",
|
21 |
+
"# {\n",
|
22 |
+
"# \"role\": \"user\",\n",
|
23 |
+
"# \"content\": \"Please translate and paraphrase the following sentence into natural, fluent Chinese: \",\n",
|
24 |
+
"# },\n",
|
25 |
+
"# ],\n",
|
26 |
+
"# temperature=0.8,\n",
|
27 |
+
"# top_p=0.9,\n",
|
28 |
+
"# )\n",
|
29 |
+
"# print(response)"
|
30 |
+
]
|
31 |
+
},
|
32 |
+
{
|
33 |
+
"cell_type": "code",
|
34 |
+
"execution_count": 8,
|
35 |
+
"metadata": {},
|
36 |
+
"outputs": [
|
37 |
+
{
|
38 |
+
"data": {
|
39 |
+
"text/plain": [
|
40 |
+
"24"
|
41 |
+
]
|
42 |
+
},
|
43 |
+
"execution_count": 8,
|
44 |
+
"metadata": {},
|
45 |
+
"output_type": "execute_result"
|
46 |
+
}
|
47 |
+
],
|
48 |
+
"source": [
|
49 |
+
"len(client.models.list().data)"
|
50 |
+
]
|
51 |
+
},
|
52 |
+
{
|
53 |
+
"cell_type": "code",
|
54 |
+
"execution_count": 2,
|
55 |
+
"metadata": {},
|
56 |
+
"outputs": [
|
57 |
+
{
|
58 |
+
"data": {
|
59 |
+
"text/plain": [
|
60 |
+
"'这个男人穿着红色的衬衫和蓝色的牛仔裤。'"
|
61 |
+
]
|
62 |
+
},
|
63 |
+
"execution_count": 2,
|
64 |
+
"metadata": {},
|
65 |
+
"output_type": "execute_result"
|
66 |
+
}
|
67 |
+
],
|
68 |
+
"source": [
|
69 |
+
"def get_output(english_text):\n",
|
70 |
+
" response = client.chat.completions.create(\n",
|
71 |
+
" model=model_name,\n",
|
72 |
+
" messages=[\n",
|
73 |
+
" {\n",
|
74 |
+
" \"role\": \"system\",\n",
|
75 |
+
" \"content\": \"You are a helpful assistant who is proficient in translating English to Chinese.\",\n",
|
76 |
+
" },\n",
|
77 |
+
" {\n",
|
78 |
+
" \"role\": \"user\",\n",
|
79 |
+
" \"content\": \"Please translate and paraphrase the following sentence into natural, fluent Chinese: \" + english_text,\n",
|
80 |
+
" },\n",
|
81 |
+
" ],\n",
|
82 |
+
" temperature=0.7,\n",
|
83 |
+
" top_p=0.9,\n",
|
84 |
+
" )\n",
|
85 |
+
" return response.choices[0].message.content\n",
|
86 |
+
"\n",
|
87 |
+
"o = get_output(\"The man is wearing a red shirt and blue jeans.\" * 5)\n",
|
88 |
+
"o"
|
89 |
+
]
|
90 |
+
},
|
91 |
+
{
|
92 |
+
"cell_type": "code",
|
93 |
+
"execution_count": 5,
|
94 |
+
"metadata": {},
|
95 |
+
"outputs": [
|
96 |
+
{
|
97 |
+
"data": {
|
98 |
+
"text/plain": [
|
99 |
+
"21"
|
100 |
+
]
|
101 |
+
},
|
102 |
+
"execution_count": 5,
|
103 |
+
"metadata": {},
|
104 |
+
"output_type": "execute_result"
|
105 |
+
}
|
106 |
+
],
|
107 |
+
"source": []
|
108 |
+
},
|
109 |
+
{
|
110 |
+
"cell_type": "code",
|
111 |
+
"execution_count": 1,
|
112 |
+
"metadata": {},
|
113 |
+
"outputs": [],
|
114 |
+
"source": [
|
115 |
+
"# !ps aux|grep infer|grep -v grep | awk '{print $2}'|xargs kill -9"
|
116 |
+
]
|
117 |
+
},
|
118 |
+
{
|
119 |
+
"cell_type": "code",
|
120 |
+
"execution_count": 5,
|
121 |
+
"metadata": {},
|
122 |
+
"outputs": [
|
123 |
+
{
|
124 |
+
"ename": "APIConnectionError",
|
125 |
+
"evalue": "Connection error.",
|
126 |
+
"output_type": "error",
|
127 |
+
"traceback": [
|
128 |
+
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
129 |
+
"\u001b[0;31mConnectError\u001b[0m Traceback (most recent call last)",
|
130 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/httpx/_transports/default.py:101\u001b[0m, in \u001b[0;36mmap_httpcore_exceptions\u001b[0;34m()\u001b[0m\n\u001b[1;32m 100\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 101\u001b[0m \u001b[38;5;28;01myield\u001b[39;00m\n\u001b[1;32m 102\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m exc:\n",
|
131 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/httpx/_transports/default.py:250\u001b[0m, in \u001b[0;36mHTTPTransport.handle_request\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 249\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m map_httpcore_exceptions():\n\u001b[0;32m--> 250\u001b[0m resp \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_pool\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mhandle_request\u001b[49m\u001b[43m(\u001b[49m\u001b[43mreq\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 252\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(resp\u001b[38;5;241m.\u001b[39mstream, typing\u001b[38;5;241m.\u001b[39mIterable)\n",
|
132 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/httpcore/_sync/connection_pool.py:256\u001b[0m, in \u001b[0;36mConnectionPool.handle_request\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 255\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_close_connections(closing)\n\u001b[0;32m--> 256\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m exc \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m 258\u001b[0m \u001b[38;5;66;03m# Return the response. Note that in this case we still have to manage\u001b[39;00m\n\u001b[1;32m 259\u001b[0m \u001b[38;5;66;03m# the point at which the response is closed.\u001b[39;00m\n",
|
133 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/httpcore/_sync/connection_pool.py:236\u001b[0m, in \u001b[0;36mConnectionPool.handle_request\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 234\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 235\u001b[0m \u001b[38;5;66;03m# Send the request on the assigned connection.\u001b[39;00m\n\u001b[0;32m--> 236\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[43mconnection\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mhandle_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 237\u001b[0m \u001b[43m \u001b[49m\u001b[43mpool_request\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrequest\u001b[49m\n\u001b[1;32m 238\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 239\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m ConnectionNotAvailable:\n\u001b[1;32m 240\u001b[0m \u001b[38;5;66;03m# In some cases a connection may initially be available to\u001b[39;00m\n\u001b[1;32m 241\u001b[0m \u001b[38;5;66;03m# handle a request, but then become unavailable.\u001b[39;00m\n\u001b[1;32m 242\u001b[0m \u001b[38;5;66;03m#\u001b[39;00m\n\u001b[1;32m 243\u001b[0m \u001b[38;5;66;03m# In this case we clear the connection and try again.\u001b[39;00m\n",
|
134 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/httpcore/_sync/connection.py:101\u001b[0m, in \u001b[0;36mHTTPConnection.handle_request\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 100\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_connect_failed \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mTrue\u001b[39;00m\n\u001b[0;32m--> 101\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m exc\n\u001b[1;32m 103\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_connection\u001b[38;5;241m.\u001b[39mhandle_request(request)\n",
|
135 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/httpcore/_sync/connection.py:78\u001b[0m, in \u001b[0;36mHTTPConnection.handle_request\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 77\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_connection \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m---> 78\u001b[0m stream \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_connect\u001b[49m\u001b[43m(\u001b[49m\u001b[43mrequest\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 80\u001b[0m ssl_object \u001b[38;5;241m=\u001b[39m stream\u001b[38;5;241m.\u001b[39mget_extra_info(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mssl_object\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n",
|
136 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/httpcore/_sync/connection.py:124\u001b[0m, in \u001b[0;36mHTTPConnection._connect\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 123\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m Trace(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mconnect_tcp\u001b[39m\u001b[38;5;124m\"\u001b[39m, logger, request, kwargs) \u001b[38;5;28;01mas\u001b[39;00m trace:\n\u001b[0;32m--> 124\u001b[0m stream \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_network_backend\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mconnect_tcp\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 125\u001b[0m trace\u001b[38;5;241m.\u001b[39mreturn_value \u001b[38;5;241m=\u001b[39m stream\n",
|
137 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/httpcore/_backends/sync.py:207\u001b[0m, in \u001b[0;36mSyncBackend.connect_tcp\u001b[0;34m(self, host, port, timeout, local_address, socket_options)\u001b[0m\n\u001b[1;32m 202\u001b[0m exc_map: ExceptionMapping \u001b[38;5;241m=\u001b[39m {\n\u001b[1;32m 203\u001b[0m socket\u001b[38;5;241m.\u001b[39mtimeout: ConnectTimeout,\n\u001b[1;32m 204\u001b[0m \u001b[38;5;167;01mOSError\u001b[39;00m: ConnectError,\n\u001b[1;32m 205\u001b[0m }\n\u001b[0;32m--> 207\u001b[0m \u001b[43m\u001b[49m\u001b[38;5;28;43;01mwith\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mmap_exceptions\u001b[49m\u001b[43m(\u001b[49m\u001b[43mexc_map\u001b[49m\u001b[43m)\u001b[49m\u001b[43m:\u001b[49m\n\u001b[1;32m 208\u001b[0m \u001b[43m \u001b[49m\u001b[43msock\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m \u001b[49m\u001b[43msocket\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcreate_connection\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 209\u001b[0m \u001b[43m \u001b[49m\u001b[43maddress\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 210\u001b[0m \u001b[43m \u001b[49m\u001b[43mtimeout\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 211\u001b[0m \u001b[43m \u001b[49m\u001b[43msource_address\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43msource_address\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 212\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n",
|
138 |
+
"File \u001b[0;32m/usr/lib/python3.11/contextlib.py:155\u001b[0m, in \u001b[0;36m_GeneratorContextManager.__exit__\u001b[0;34m(self, typ, value, traceback)\u001b[0m\n\u001b[1;32m 154\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 155\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mgen\u001b[38;5;241m.\u001b[39mthrow(typ, value, traceback)\n\u001b[1;32m 156\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mStopIteration\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m exc:\n\u001b[1;32m 157\u001b[0m \u001b[38;5;66;03m# Suppress StopIteration *unless* it's the same exception that\u001b[39;00m\n\u001b[1;32m 158\u001b[0m \u001b[38;5;66;03m# was passed to throw(). This prevents a StopIteration\u001b[39;00m\n\u001b[1;32m 159\u001b[0m \u001b[38;5;66;03m# raised inside the \"with\" statement from being suppressed.\u001b[39;00m\n",
|
139 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/httpcore/_exceptions.py:14\u001b[0m, in \u001b[0;36mmap_exceptions\u001b[0;34m(map)\u001b[0m\n\u001b[1;32m 13\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(exc, from_exc):\n\u001b[0;32m---> 14\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m to_exc(exc) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mexc\u001b[39;00m\n\u001b[1;32m 15\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m\n",
|
140 |
+
"\u001b[0;31mConnectError\u001b[0m: [Errno 111] Connection refused",
|
141 |
+
"\nThe above exception was the direct cause of the following exception:\n",
|
142 |
+
"\u001b[0;31mConnectError\u001b[0m Traceback (most recent call last)",
|
143 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/openai/_base_client.py:993\u001b[0m, in \u001b[0;36mSyncAPIClient._request\u001b[0;34m(self, cast_to, options, retries_taken, stream, stream_cls)\u001b[0m\n\u001b[1;32m 992\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 993\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_client\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msend\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 994\u001b[0m \u001b[43m \u001b[49m\u001b[43mrequest\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 995\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;129;43;01mor\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_should_stream_response_body\u001b[49m\u001b[43m(\u001b[49m\u001b[43mrequest\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrequest\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 996\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 997\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 998\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m httpx\u001b[38;5;241m.\u001b[39mTimeoutException \u001b[38;5;28;01mas\u001b[39;00m err:\n",
|
144 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/httpx/_client.py:914\u001b[0m, in \u001b[0;36mClient.send\u001b[0;34m(self, request, stream, auth, follow_redirects)\u001b[0m\n\u001b[1;32m 912\u001b[0m auth \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_build_request_auth(request, auth)\n\u001b[0;32m--> 914\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_send_handling_auth\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 915\u001b[0m \u001b[43m \u001b[49m\u001b[43mrequest\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 916\u001b[0m \u001b[43m \u001b[49m\u001b[43mauth\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mauth\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 917\u001b[0m \u001b[43m \u001b[49m\u001b[43mfollow_redirects\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mfollow_redirects\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 918\u001b[0m \u001b[43m \u001b[49m\u001b[43mhistory\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m[\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 919\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 920\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n",
|
145 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/httpx/_client.py:942\u001b[0m, in \u001b[0;36mClient._send_handling_auth\u001b[0;34m(self, request, auth, follow_redirects, history)\u001b[0m\n\u001b[1;32m 941\u001b[0m \u001b[38;5;28;01mwhile\u001b[39;00m \u001b[38;5;28;01mTrue\u001b[39;00m:\n\u001b[0;32m--> 942\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_send_handling_redirects\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 943\u001b[0m \u001b[43m \u001b[49m\u001b[43mrequest\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 944\u001b[0m \u001b[43m \u001b[49m\u001b[43mfollow_redirects\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mfollow_redirects\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 945\u001b[0m \u001b[43m \u001b[49m\u001b[43mhistory\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mhistory\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 946\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 947\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n",
|
146 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/httpx/_client.py:979\u001b[0m, in \u001b[0;36mClient._send_handling_redirects\u001b[0;34m(self, request, follow_redirects, history)\u001b[0m\n\u001b[1;32m 977\u001b[0m hook(request)\n\u001b[0;32m--> 979\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_send_single_request\u001b[49m\u001b[43m(\u001b[49m\u001b[43mrequest\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 980\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n",
|
147 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/httpx/_client.py:1014\u001b[0m, in \u001b[0;36mClient._send_single_request\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 1013\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m request_context(request\u001b[38;5;241m=\u001b[39mrequest):\n\u001b[0;32m-> 1014\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[43mtransport\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mhandle_request\u001b[49m\u001b[43m(\u001b[49m\u001b[43mrequest\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1016\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(response\u001b[38;5;241m.\u001b[39mstream, SyncByteStream)\n",
|
148 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/httpx/_transports/default.py:249\u001b[0m, in \u001b[0;36mHTTPTransport.handle_request\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 237\u001b[0m req \u001b[38;5;241m=\u001b[39m httpcore\u001b[38;5;241m.\u001b[39mRequest(\n\u001b[1;32m 238\u001b[0m method\u001b[38;5;241m=\u001b[39mrequest\u001b[38;5;241m.\u001b[39mmethod,\n\u001b[1;32m 239\u001b[0m url\u001b[38;5;241m=\u001b[39mhttpcore\u001b[38;5;241m.\u001b[39mURL(\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 247\u001b[0m extensions\u001b[38;5;241m=\u001b[39mrequest\u001b[38;5;241m.\u001b[39mextensions,\n\u001b[1;32m 248\u001b[0m )\n\u001b[0;32m--> 249\u001b[0m \u001b[43m\u001b[49m\u001b[38;5;28;43;01mwith\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mmap_httpcore_exceptions\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m:\u001b[49m\n\u001b[1;32m 250\u001b[0m \u001b[43m \u001b[49m\u001b[43mresp\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_pool\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mhandle_request\u001b[49m\u001b[43m(\u001b[49m\u001b[43mreq\u001b[49m\u001b[43m)\u001b[49m\n",
|
149 |
+
"File \u001b[0;32m/usr/lib/python3.11/contextlib.py:155\u001b[0m, in \u001b[0;36m_GeneratorContextManager.__exit__\u001b[0;34m(self, typ, value, traceback)\u001b[0m\n\u001b[1;32m 154\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 155\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mgen\u001b[38;5;241m.\u001b[39mthrow(typ, value, traceback)\n\u001b[1;32m 156\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mStopIteration\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m exc:\n\u001b[1;32m 157\u001b[0m \u001b[38;5;66;03m# Suppress StopIteration *unless* it's the same exception that\u001b[39;00m\n\u001b[1;32m 158\u001b[0m \u001b[38;5;66;03m# was passed to throw(). This prevents a StopIteration\u001b[39;00m\n\u001b[1;32m 159\u001b[0m \u001b[38;5;66;03m# raised inside the \"with\" statement from being suppressed.\u001b[39;00m\n",
|
150 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/httpx/_transports/default.py:118\u001b[0m, in \u001b[0;36mmap_httpcore_exceptions\u001b[0;34m()\u001b[0m\n\u001b[1;32m 117\u001b[0m message \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mstr\u001b[39m(exc)\n\u001b[0;32m--> 118\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m mapped_exc(message) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mexc\u001b[39;00m\n",
|
151 |
+
"\u001b[0;31mConnectError\u001b[0m: [Errno 111] Connection refused",
|
152 |
+
"\nThe above exception was the direct cause of the following exception:\n",
|
153 |
+
"\u001b[0;31mAPIConnectionError\u001b[0m Traceback (most recent call last)",
|
154 |
+
"Cell \u001b[0;32mIn[5], line 6\u001b[0m\n\u001b[1;32m 3\u001b[0m port \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m2000\u001b[39m\n\u001b[1;32m 5\u001b[0m client \u001b[38;5;241m=\u001b[39m OpenAI(api_key\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mYOUR_API_KEY\u001b[39m\u001b[38;5;124m\"\u001b[39m, base_url\u001b[38;5;241m=\u001b[39m\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mhttp://0.0.0.0:\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mport\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m/v1\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m----> 6\u001b[0m model_name \u001b[38;5;241m=\u001b[39m \u001b[43mclient\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mmodels\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlist\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241m.\u001b[39mdata[\u001b[38;5;241m0\u001b[39m]\u001b[38;5;241m.\u001b[39mid\n\u001b[1;32m 7\u001b[0m response \u001b[38;5;241m=\u001b[39m client\u001b[38;5;241m.\u001b[39mchat\u001b[38;5;241m.\u001b[39mcompletions\u001b[38;5;241m.\u001b[39mcreate(\n\u001b[1;32m 8\u001b[0m model\u001b[38;5;241m=\u001b[39mmodel_name,\n\u001b[1;32m 9\u001b[0m messages\u001b[38;5;241m=\u001b[39m[\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 27\u001b[0m top_p\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m0.8\u001b[39m,\n\u001b[1;32m 28\u001b[0m )\n\u001b[1;32m 29\u001b[0m \u001b[38;5;28mprint\u001b[39m(response)\n",
|
155 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/openai/resources/models.py:91\u001b[0m, in \u001b[0;36mModels.list\u001b[0;34m(self, extra_headers, extra_query, extra_body, timeout)\u001b[0m\n\u001b[1;32m 77\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mlist\u001b[39m(\n\u001b[1;32m 78\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 79\u001b[0m \u001b[38;5;241m*\u001b[39m,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 85\u001b[0m timeout: \u001b[38;5;28mfloat\u001b[39m \u001b[38;5;241m|\u001b[39m httpx\u001b[38;5;241m.\u001b[39mTimeout \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m|\u001b[39m NotGiven \u001b[38;5;241m=\u001b[39m NOT_GIVEN,\n\u001b[1;32m 86\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m SyncPage[Model]:\n\u001b[1;32m 87\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 88\u001b[0m \u001b[38;5;124;03m Lists the currently available models, and provides basic information about each\u001b[39;00m\n\u001b[1;32m 89\u001b[0m \u001b[38;5;124;03m one such as the owner and availability.\u001b[39;00m\n\u001b[1;32m 90\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[0;32m---> 91\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_get_api_list\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 92\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43m/models\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 93\u001b[0m \u001b[43m \u001b[49m\u001b[43mpage\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mSyncPage\u001b[49m\u001b[43m[\u001b[49m\u001b[43mModel\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 94\u001b[0m \u001b[43m \u001b[49m\u001b[43moptions\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmake_request_options\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 95\u001b[0m \u001b[43m \u001b[49m\u001b[43mextra_headers\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mextra_headers\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mextra_query\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mextra_query\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mextra_body\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mextra_body\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtimeout\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mtimeout\u001b[49m\n\u001b[1;32m 96\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 97\u001b[0m \u001b[43m \u001b[49m\u001b[43mmodel\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mModel\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 98\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n",
|
156 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/openai/_base_client.py:1329\u001b[0m, in \u001b[0;36mSyncAPIClient.get_api_list\u001b[0;34m(self, path, model, page, body, options, method)\u001b[0m\n\u001b[1;32m 1318\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mget_api_list\u001b[39m(\n\u001b[1;32m 1319\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 1320\u001b[0m path: \u001b[38;5;28mstr\u001b[39m,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1326\u001b[0m method: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mget\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 1327\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m SyncPageT:\n\u001b[1;32m 1328\u001b[0m opts \u001b[38;5;241m=\u001b[39m FinalRequestOptions\u001b[38;5;241m.\u001b[39mconstruct(method\u001b[38;5;241m=\u001b[39mmethod, url\u001b[38;5;241m=\u001b[39mpath, json_data\u001b[38;5;241m=\u001b[39mbody, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39moptions)\n\u001b[0;32m-> 1329\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_request_api_list\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mpage\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mopts\u001b[49m\u001b[43m)\u001b[49m\n",
|
157 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/openai/_base_client.py:1180\u001b[0m, in \u001b[0;36mSyncAPIClient._request_api_list\u001b[0;34m(self, model, page, options)\u001b[0m\n\u001b[1;32m 1176\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m resp\n\u001b[1;32m 1178\u001b[0m options\u001b[38;5;241m.\u001b[39mpost_parser \u001b[38;5;241m=\u001b[39m _parser\n\u001b[0;32m-> 1180\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrequest\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpage\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43moptions\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstream\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mFalse\u001b[39;49;00m\u001b[43m)\u001b[49m\n",
|
158 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/openai/_base_client.py:957\u001b[0m, in \u001b[0;36mSyncAPIClient.request\u001b[0;34m(self, cast_to, options, remaining_retries, stream, stream_cls)\u001b[0m\n\u001b[1;32m 954\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 955\u001b[0m retries_taken \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m\n\u001b[0;32m--> 957\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 958\u001b[0m \u001b[43m \u001b[49m\u001b[43mcast_to\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcast_to\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 959\u001b[0m \u001b[43m \u001b[49m\u001b[43moptions\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43moptions\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 960\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 961\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream_cls\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream_cls\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 962\u001b[0m \u001b[43m \u001b[49m\u001b[43mretries_taken\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mretries_taken\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 963\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n",
|
159 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/openai/_base_client.py:1017\u001b[0m, in \u001b[0;36mSyncAPIClient._request\u001b[0;34m(self, cast_to, options, retries_taken, stream, stream_cls)\u001b[0m\n\u001b[1;32m 1014\u001b[0m log\u001b[38;5;241m.\u001b[39mdebug(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mEncountered Exception\u001b[39m\u001b[38;5;124m\"\u001b[39m, exc_info\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[1;32m 1016\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m remaining_retries \u001b[38;5;241m>\u001b[39m \u001b[38;5;241m0\u001b[39m:\n\u001b[0;32m-> 1017\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_retry_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 1018\u001b[0m \u001b[43m \u001b[49m\u001b[43minput_options\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1019\u001b[0m \u001b[43m \u001b[49m\u001b[43mcast_to\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1020\u001b[0m \u001b[43m \u001b[49m\u001b[43mretries_taken\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mretries_taken\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1021\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1022\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream_cls\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream_cls\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1023\u001b[0m \u001b[43m \u001b[49m\u001b[43mresponse_headers\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mNone\u001b[39;49;00m\u001b[43m,\u001b[49m\n\u001b[1;32m 1024\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1026\u001b[0m log\u001b[38;5;241m.\u001b[39mdebug(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mRaising connection error\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 1027\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m APIConnectionError(request\u001b[38;5;241m=\u001b[39mrequest) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01merr\u001b[39;00m\n",
|
160 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/openai/_base_client.py:1095\u001b[0m, in \u001b[0;36mSyncAPIClient._retry_request\u001b[0;34m(self, options, cast_to, retries_taken, response_headers, stream, stream_cls)\u001b[0m\n\u001b[1;32m 1091\u001b[0m \u001b[38;5;66;03m# In a synchronous context we are blocking the entire thread. Up to the library user to run the client in a\u001b[39;00m\n\u001b[1;32m 1092\u001b[0m \u001b[38;5;66;03m# different thread if necessary.\u001b[39;00m\n\u001b[1;32m 1093\u001b[0m time\u001b[38;5;241m.\u001b[39msleep(timeout)\n\u001b[0;32m-> 1095\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 1096\u001b[0m \u001b[43m \u001b[49m\u001b[43moptions\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43moptions\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1097\u001b[0m \u001b[43m \u001b[49m\u001b[43mcast_to\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcast_to\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1098\u001b[0m \u001b[43m \u001b[49m\u001b[43mretries_taken\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mretries_taken\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1099\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1100\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream_cls\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream_cls\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1101\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n",
|
161 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/openai/_base_client.py:1017\u001b[0m, in \u001b[0;36mSyncAPIClient._request\u001b[0;34m(self, cast_to, options, retries_taken, stream, stream_cls)\u001b[0m\n\u001b[1;32m 1014\u001b[0m log\u001b[38;5;241m.\u001b[39mdebug(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mEncountered Exception\u001b[39m\u001b[38;5;124m\"\u001b[39m, exc_info\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[1;32m 1016\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m remaining_retries \u001b[38;5;241m>\u001b[39m \u001b[38;5;241m0\u001b[39m:\n\u001b[0;32m-> 1017\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_retry_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 1018\u001b[0m \u001b[43m \u001b[49m\u001b[43minput_options\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1019\u001b[0m \u001b[43m \u001b[49m\u001b[43mcast_to\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1020\u001b[0m \u001b[43m \u001b[49m\u001b[43mretries_taken\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mretries_taken\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1021\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1022\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream_cls\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream_cls\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1023\u001b[0m \u001b[43m \u001b[49m\u001b[43mresponse_headers\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mNone\u001b[39;49;00m\u001b[43m,\u001b[49m\n\u001b[1;32m 1024\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1026\u001b[0m log\u001b[38;5;241m.\u001b[39mdebug(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mRaising connection error\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 1027\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m APIConnectionError(request\u001b[38;5;241m=\u001b[39mrequest) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01merr\u001b[39;00m\n",
|
162 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/openai/_base_client.py:1095\u001b[0m, in \u001b[0;36mSyncAPIClient._retry_request\u001b[0;34m(self, options, cast_to, retries_taken, response_headers, stream, stream_cls)\u001b[0m\n\u001b[1;32m 1091\u001b[0m \u001b[38;5;66;03m# In a synchronous context we are blocking the entire thread. Up to the library user to run the client in a\u001b[39;00m\n\u001b[1;32m 1092\u001b[0m \u001b[38;5;66;03m# different thread if necessary.\u001b[39;00m\n\u001b[1;32m 1093\u001b[0m time\u001b[38;5;241m.\u001b[39msleep(timeout)\n\u001b[0;32m-> 1095\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 1096\u001b[0m \u001b[43m \u001b[49m\u001b[43moptions\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43moptions\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1097\u001b[0m \u001b[43m \u001b[49m\u001b[43mcast_to\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcast_to\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1098\u001b[0m \u001b[43m \u001b[49m\u001b[43mretries_taken\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mretries_taken\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1099\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1100\u001b[0m \u001b[43m \u001b[49m\u001b[43mstream_cls\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstream_cls\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1101\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n",
|
163 |
+
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/openai/_base_client.py:1027\u001b[0m, in \u001b[0;36mSyncAPIClient._request\u001b[0;34m(self, cast_to, options, retries_taken, stream, stream_cls)\u001b[0m\n\u001b[1;32m 1017\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_retry_request(\n\u001b[1;32m 1018\u001b[0m input_options,\n\u001b[1;32m 1019\u001b[0m cast_to,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1023\u001b[0m response_headers\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 1024\u001b[0m )\n\u001b[1;32m 1026\u001b[0m log\u001b[38;5;241m.\u001b[39mdebug(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mRaising connection error\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m-> 1027\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m APIConnectionError(request\u001b[38;5;241m=\u001b[39mrequest) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01merr\u001b[39;00m\n\u001b[1;32m 1029\u001b[0m log\u001b[38;5;241m.\u001b[39mdebug(\n\u001b[1;32m 1030\u001b[0m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mHTTP Response: \u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m \u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m \u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m%i\u001b[39;00m\u001b[38;5;124m \u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m \u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m'\u001b[39m,\n\u001b[1;32m 1031\u001b[0m request\u001b[38;5;241m.\u001b[39mmethod,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1035\u001b[0m response\u001b[38;5;241m.\u001b[39mheaders,\n\u001b[1;32m 1036\u001b[0m )\n\u001b[1;32m 1037\u001b[0m log\u001b[38;5;241m.\u001b[39mdebug(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrequest_id: \u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m\"\u001b[39m, response\u001b[38;5;241m.\u001b[39mheaders\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mx-request-id\u001b[39m\u001b[38;5;124m\"\u001b[39m))\n",
|
164 |
+
"\u001b[0;31mAPIConnectionError\u001b[0m: Connection error."
|
165 |
+
]
|
166 |
+
}
|
167 |
+
],
|
168 |
+
"source": [
|
169 |
+
"from openai import OpenAI\n",
|
170 |
+
"\n",
|
171 |
+
"port = 2000\n",
|
172 |
+
"\n",
|
173 |
+
"client = OpenAI(api_key=\"YOUR_API_KEY\", base_url=f\"http://0.0.0.0:{port}/v1\")\n",
|
174 |
+
"model_name = client.models.list().data[0].id\n",
|
175 |
+
"response = client.chat.completions.create(\n",
|
176 |
+
" model=model_name,\n",
|
177 |
+
" messages=[\n",
|
178 |
+
" {\n",
|
179 |
+
" \"role\": \"user\",\n",
|
180 |
+
" \"content\": [\n",
|
181 |
+
" {\n",
|
182 |
+
" \"type\": \"text\",\n",
|
183 |
+
" \"text\": \"Miêu tả bức tranh giùm coi\",\n",
|
184 |
+
" },\n",
|
185 |
+
" {\n",
|
186 |
+
" \"type\": \"image_url\",\n",
|
187 |
+
" \"image_url\": {\n",
|
188 |
+
" \"url\": \"https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg\",\n",
|
189 |
+
" },\n",
|
190 |
+
" },\n",
|
191 |
+
" ],\n",
|
192 |
+
" }\n",
|
193 |
+
" ],\n",
|
194 |
+
" temperature=0.8,\n",
|
195 |
+
" top_p=0.8,\n",
|
196 |
+
")\n",
|
197 |
+
"print(response)"
|
198 |
+
]
|
199 |
+
},
|
200 |
+
{
|
201 |
+
"cell_type": "code",
|
202 |
+
"execution_count": null,
|
203 |
+
"metadata": {},
|
204 |
+
"outputs": [],
|
205 |
+
"source": [
|
206 |
+
"model_name"
|
207 |
+
]
|
208 |
+
},
|
209 |
+
{
|
210 |
+
"cell_type": "code",
|
211 |
+
"execution_count": null,
|
212 |
+
"metadata": {},
|
213 |
+
"outputs": [],
|
214 |
+
"source": [
|
215 |
+
"response.choices[0].message.content"
|
216 |
+
]
|
217 |
+
},
|
218 |
+
{
|
219 |
+
"cell_type": "code",
|
220 |
+
"execution_count": null,
|
221 |
+
"metadata": {},
|
222 |
+
"outputs": [],
|
223 |
+
"source": []
|
224 |
+
},
|
225 |
+
{
|
226 |
+
"cell_type": "code",
|
227 |
+
"execution_count": 12,
|
228 |
+
"metadata": {},
|
229 |
+
"outputs": [
|
230 |
+
{
|
231 |
+
"name": "stderr",
|
232 |
+
"output_type": "stream",
|
233 |
+
"text": [
|
234 |
+
" % Total % Received % Xferd Average Speed Time Time Time Current\n",
|
235 |
+
" Dload Upload Total Spent Left Speed\n",
|
236 |
+
"100 617 100 404 100 213 5970 3147 --:--:-- --:--:-- --:--:-- 9208\n"
|
237 |
+
]
|
238 |
+
},
|
239 |
+
{
|
240 |
+
"name": "stdout",
|
241 |
+
"output_type": "stream",
|
242 |
+
"text": [
|
243 |
+
"{\"id\":\"chatcmpl-8b3b1360415d4805a44f33bd81fc3447\",\"object\":\"chat.completion\",\"created\":1734879441,\"model\":\"Qwen/Qwen2.5-1.5B-Instruct\",\"choices\":[{\"index\":0,\"message\":{\"role\":\"assistant\",\"content\":\"巴黎\",\"tool_calls\":[]},\"logprobs\":null,\"finish_reason\":\"stop\",\"stop_reason\":null}],\"usage\":{\"prompt_tokens\":48,\"total_tokens\":50,\"completion_tokens\":2,\"prompt_tokens_details\":null},\"prompt_logprobs\":null}"
|
244 |
+
]
|
245 |
+
}
|
246 |
+
],
|
247 |
+
"source": [
|
248 |
+
"%%bash\n",
|
249 |
+
"# Call the server using curl:\n",
|
250 |
+
"curl -X POST \"http://localhost:8000/v1/chat/completions\" \\\n",
|
251 |
+
"\t-H \"Content-Type: application/json\" \\\n",
|
252 |
+
"\t--data '{\n",
|
253 |
+
"\t\t\"model\": \"Qwen/Qwen2.5-1.5B-Instruct\",\n",
|
254 |
+
"\t\t\"messages\": [\n",
|
255 |
+
"\t\t\t{\n",
|
256 |
+
"\t\t\t\t\"role\": \"user\",\n",
|
257 |
+
"\t\t\t\t\"content\": \"What is the capital of France? You must answer in Chinese without adding any comment or explanation.\"\n",
|
258 |
+
"\t\t\t}\n",
|
259 |
+
"\t\t]\n",
|
260 |
+
"\t}'"
|
261 |
+
]
|
262 |
+
},
|
263 |
+
{
|
264 |
+
"cell_type": "code",
|
265 |
+
"execution_count": null,
|
266 |
+
"metadata": {},
|
267 |
+
"outputs": [],
|
268 |
+
"source": []
|
269 |
+
}
|
270 |
+
],
|
271 |
+
"metadata": {
|
272 |
+
"kernelspec": {
|
273 |
+
"display_name": "lmdeploy",
|
274 |
+
"language": "python",
|
275 |
+
"name": "lmdeploy"
|
276 |
+
},
|
277 |
+
"language_info": {
|
278 |
+
"codemirror_mode": {
|
279 |
+
"name": "ipython",
|
280 |
+
"version": 3
|
281 |
+
},
|
282 |
+
"file_extension": ".py",
|
283 |
+
"mimetype": "text/x-python",
|
284 |
+
"name": "python",
|
285 |
+
"nbconvert_exporter": "python",
|
286 |
+
"pygments_lexer": "ipython3",
|
287 |
+
"version": "3.8.19"
|
288 |
+
}
|
289 |
+
},
|
290 |
+
"nbformat": 4,
|
291 |
+
"nbformat_minor": 4
|
292 |
+
}
|
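The failing cell in this checkpoint shows the openai client retrying a port with nothing listening on it and finally raising APIConnectionError, while the later curl cell reaches a live server. A small, hypothetical helper (names and ports here are illustrative, not part of the notebook) that turns that failure mode into a simple reachability check before any work is submitted:

from openai import OpenAI
import openai

def server_is_up(base_url: str, api_key: str = "YOUR_API_KEY") -> bool:
    # Listing models is a cheap request; a refused connection surfaces
    # as openai.APIConnectionError after the client's internal retries.
    client = OpenAI(api_key=api_key, base_url=base_url)
    try:
        client.models.list()
        return True
    except openai.APIConnectionError:
        return False

# Example: skip the chat request when the endpoint is not reachable.
if server_is_up("http://0.0.0.0:8082/v1"):
    print("server reachable")
else:
    print("server down, start it first")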
a_mllm_notebooks/openai/.ipynb_checkpoints/serve-checkpoint.sh
ADDED
@@ -0,0 +1,60 @@
1 |
+
eval "$(conda shell.bash hook)"
|
2 |
+
conda activate lmdeploy
|
3 |
+
# # MODEL_NAME=OpenGVLab/InternVL2_5-8B-AWQ
|
4 |
+
MODEL_NAME=OpenGVLab/InternVL2_5-4B-MPO-AWQ
|
5 |
+
|
6 |
+
|
7 |
+
PORT_LIST=( $(seq 5911 1 5911) )
|
8 |
+
for PORT in "${PORT_LIST[@]}"; do
|
9 |
+
# get random device id from 0 to 3
|
10 |
+
# RANDOM_DEVICE_ID=$((RANDOM % 3))
|
11 |
+
# RANDOM_DEVICE_ID=3
|
12 |
+
# CUDA_VISIBLE_DEVICES=0,1 \
|
13 |
+
# CUDA_VISIBLE_DEVICES=2,3 \
|
14 |
+
CUDA_VISIBLE_DEVICES=1 \
|
15 |
+
lmdeploy serve api_server $MODEL_NAME \
|
16 |
+
--server-port $PORT \
|
17 |
+
--backend turbomind \
|
18 |
+
--dtype float16 --proxy-url http://0.0.0.0:7089 \
|
19 |
+
--vision-max-batch-size 64 &
|
20 |
+
# --cache-max-entry-count 0.4 &
|
21 |
+
# --tp 1 &
|
22 |
+
# &
|
23 |
+
done
|
24 |
+
|
25 |
+
PORT_LIST=( $(seq 5972 1 5972) )
|
26 |
+
for PORT in "${PORT_LIST[@]}"; do
|
27 |
+
# get random device id from 0 to 3
|
28 |
+
# RANDOM_DEVICE_ID=$((RANDOM % 3))
|
29 |
+
# RANDOM_DEVICE_ID=3
|
30 |
+
# CUDA_VISIBLE_DEVICES=0,1 \
|
31 |
+
# CUDA_VISIBLE_DEVICES=2,3 \
|
32 |
+
CUDA_VISIBLE_DEVICES=2 \
|
33 |
+
lmdeploy serve api_server $MODEL_NAME \
|
34 |
+
--server-port $PORT \
|
35 |
+
--backend turbomind \
|
36 |
+
--dtype float16 --proxy-url http://0.0.0.0:7089 \
|
37 |
+
--vision-max-batch-size 64 &
|
38 |
+
# --cache-max-entry-count 0.4 &
|
39 |
+
# --tp 1 &
|
40 |
+
# &
|
41 |
+
done
|
42 |
+
|
43 |
+
PORT_LIST=( $(seq 5171 1 5171) )
|
44 |
+
for PORT in "${PORT_LIST[@]}"; do
|
45 |
+
# get random device id from 0 to 3
|
46 |
+
# RANDOM_DEVICE_ID=$((RANDOM % 3))
|
47 |
+
# RANDOM_DEVICE_ID=3
|
48 |
+
# CUDA_VISIBLE_DEVICES=0,1 \
|
49 |
+
# CUDA_VISIBLE_DEVICES=2,3 \
|
50 |
+
CUDA_VISIBLE_DEVICES=1 \
|
51 |
+
lmdeploy serve api_server $MODEL_NAME \
|
52 |
+
--server-port $PORT \
|
53 |
+
--backend turbomind \
|
54 |
+
--dtype float16 --proxy-url http://0.0.0.0:7089 \
|
55 |
+
--vision-max-batch-size 64 &
|
56 |
+
# --cache-max-entry-count 0.4 &
|
57 |
+
# --tp 1 &
|
58 |
+
# &
|
59 |
+
done
|
60 |
+
|
a_mllm_notebooks/openai/.ipynb_checkpoints/temp-checkpoint.sh
ADDED
@@ -0,0 +1,25 @@
1 |
+
eval "$(conda shell.bash hook)"
|
2 |
+
conda activate lmdeploy
|
3 |
+
MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct-AWQ
|
4 |
+
PORT_LIST=( $(seq 3162 1 3162) )
|
5 |
+
for PORT in "${PORT_LIST[@]}"; do
|
6 |
+
CUDA_VISIBLE_DEVICES=0,1,2,3 \
|
7 |
+
lmdeploy serve api_server $MODEL_NAME \
|
8 |
+
--server-port $PORT \
|
9 |
+
--backend turbomind \
|
10 |
+
--dtype float16 --proxy-url http://0.0.0.0:8082 \
|
11 |
+
--cache-max-entry-count 0.0075 --tp 3 &
|
12 |
+
done
|
13 |
+
|
14 |
+
|
15 |
+
# # PORT_LIST from 3063 to 3099
|
16 |
+
# PORT_LIST=( $(seq 9000 1 9000) )
|
17 |
+
# # PORT_LIST=(9898)
|
18 |
+
# for PORT in "${PORT_LIST[@]}"; do
|
19 |
+
# CUDA_VISIBLE_DEVICES=3 \
|
20 |
+
# lmdeploy serve api_server $MODEL_NAME \
|
21 |
+
# --server-port $PORT \
|
22 |
+
# --backend turbomind \
|
23 |
+
# --dtype float16 --proxy-url http://0.0.0.0:8082 \
|
24 |
+
# --cache-max-entry-count 0.025 --tp 1 &
|
25 |
+
# done
|
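Both shell scripts above start lmdeploy api_server instances and register them with a proxy through --proxy-url (http://0.0.0.0:7089 for the InternVL replicas, http://0.0.0.0:8082 for the Qwen instance). Mirroring what the ping notebook does with len(client.models.list().data), a quick, hypothetical way to confirm the registrations from Python is to list models through the proxy's OpenAI-compatible endpoint; the proxy port below is the one used in serve-checkpoint.sh and is an assumption about the local setup:

from openai import OpenAI

# Point the client at the proxy, not at an individual api_server port.
client = OpenAI(api_key="YOUR_API_KEY", base_url="http://0.0.0.0:7089/v1")
models = client.models.list().data
# Each registered server instance typically shows up as one entry.
print(len(models))
print([m.id for m in models])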
a_mllm_notebooks/openai/combine_chinese_output.ipynb
ADDED
@@ -0,0 +1,526 @@
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "code",
|
5 |
+
"execution_count": null,
|
6 |
+
"metadata": {},
|
7 |
+
"outputs": [],
|
8 |
+
"source": []
|
9 |
+
},
|
10 |
+
{
|
11 |
+
"cell_type": "code",
|
12 |
+
"execution_count": 2,
|
13 |
+
"metadata": {},
|
14 |
+
"outputs": [
|
15 |
+
{
|
16 |
+
"name": "stdout",
|
17 |
+
"output_type": "stream",
|
18 |
+
"text": [
|
19 |
+
"thread_0 thread_18 thread_27 thread_36 thread_45 thread_54 thread_63\n",
|
20 |
+
"thread_1 thread_19 thread_28 thread_37 thread_46 thread_55 thread_7\n",
|
21 |
+
"thread_10 thread_2 thread_29 thread_38 thread_47 thread_56 thread_8\n",
|
22 |
+
"thread_11 thread_20 thread_3\t thread_39 thread_48 thread_57 thread_9\n",
|
23 |
+
"thread_12 thread_21 thread_30 thread_4 thread_49 thread_58 thread_92\n",
|
24 |
+
"thread_13 thread_22 thread_31 thread_40 thread_5 thread_59\n",
|
25 |
+
"thread_14 thread_23 thread_32 thread_41 thread_50 thread_6\n",
|
26 |
+
"thread_15 thread_24 thread_33 thread_42 thread_51 thread_60\n",
|
27 |
+
"thread_16 thread_25 thread_34 thread_43 thread_52 thread_61\n",
|
28 |
+
"thread_17 thread_26 thread_35 thread_44 thread_53 thread_62\n"
|
29 |
+
]
|
30 |
+
}
|
31 |
+
],
|
32 |
+
"source": [
|
33 |
+
"!ls output_chinese\n",
|
34 |
+
"# thread_0 thread_18 thread_27 thread_36 thread_45 thread_54 thread_63\n",
|
35 |
+
"# thread_1 thread_19 thread_28 thread_37 thread_46 thread_55 thread_7\n",
|
36 |
+
"# thread_10 thread_2 thread_29 thread_38 thread_47 thread_56 thread_8\n",
|
37 |
+
"# thread_11 thread_20 thread_3\t thread_39 thread_48 thread_57 thread_9\n",
|
38 |
+
"# thread_12 thread_21 thread_30 thread_4 thread_49 thread_58 thread_92\n",
|
39 |
+
"# thread_13 thread_22 thread_31 thread_40 thread_5 thread_59\n",
|
40 |
+
"# thread_14 thread_23 thread_32 thread_41 thread_50 thread_6\n",
|
41 |
+
"# thread_15 thread_24 thread_33 thread_42 thread_51 thread_60\n",
|
42 |
+
"# thread_16 thread_25 thread_34 thread_43 thread_52 thread_61\n",
|
43 |
+
"# thread_17 thread_26 thread_35 thread_44 thread_53 thread_62\n",
|
44 |
+
"\n",
|
45 |
+
"# json files in thread_x are i.txt, which i the id of instance"
|
46 |
+
]
|
47 |
+
},
|
48 |
+
{
|
49 |
+
"cell_type": "code",
|
50 |
+
"execution_count": null,
|
51 |
+
"metadata": {},
|
52 |
+
"outputs": [],
|
53 |
+
"source": [
|
54 |
+
"# /dscilab_dungvo/workspace/vlm_clone/a_mllm_notebooks/openai/output_chinese/thread_63/4752443.json"
|
55 |
+
]
|
56 |
+
},
|
57 |
+
{
|
58 |
+
"cell_type": "code",
|
59 |
+
"execution_count": 2,
|
60 |
+
"metadata": {},
|
61 |
+
"outputs": [],
|
62 |
+
"source": [
|
63 |
+
"# help me write a function to get all the json files in the output_chinese folder. sorted by the id of instance\n",
|
64 |
+
"\n",
|
65 |
+
"import os\n",
|
66 |
+
"import re\n",
|
67 |
+
"\n",
|
68 |
+
"def get_all_json_files(folder):\n",
|
69 |
+
" json_files = []\n",
|
70 |
+
" for root, dirs, files in os.walk(folder):\n",
|
71 |
+
" for file in files:\n",
|
72 |
+
" if file.endswith(\".json\") and 'checkpoint' not in file:\n",
|
73 |
+
" json_files.append(os.path.join(root, file))\n",
|
74 |
+
" # json_files.sort(key=lambda x: int(re.search(r'\\d+', x).group()))\n",
|
75 |
+
" return json_files\n",
|
76 |
+
"\n",
|
77 |
+
"json_files = get_all_json_files('output_chinese')"
|
78 |
+
]
|
79 |
+
},
|
80 |
+
{
|
81 |
+
"cell_type": "code",
|
82 |
+
"execution_count": 4,
|
83 |
+
"metadata": {},
|
84 |
+
"outputs": [],
|
85 |
+
"source": [
|
86 |
+
"# json_files[0]\n",
|
87 |
+
"# 'output_chinese/thread_0/59266.json'\n"
|
88 |
+
]
|
89 |
+
},
|
90 |
+
{
|
91 |
+
"cell_type": "code",
|
92 |
+
"execution_count": 5,
|
93 |
+
"metadata": {},
|
94 |
+
"outputs": [
|
95 |
+
{
|
96 |
+
"data": {
|
97 |
+
"text/plain": [
|
98 |
+
"'output_chinese/thread_0/59266.json'"
|
99 |
+
]
|
100 |
+
},
|
101 |
+
"execution_count": 5,
|
102 |
+
"metadata": {},
|
103 |
+
"output_type": "execute_result"
|
104 |
+
}
|
105 |
+
],
|
106 |
+
"source": [
|
107 |
+
"json_files[0]"
|
108 |
+
]
|
109 |
+
},
|
110 |
+
{
|
111 |
+
"cell_type": "code",
|
112 |
+
"execution_count": 6,
|
113 |
+
"metadata": {},
|
114 |
+
"outputs": [
|
115 |
+
{
|
116 |
+
"name": "stdout",
|
117 |
+
"output_type": "stream",
|
118 |
+
"text": [
|
119 |
+
"{\"prompt_caption_chinese\": \"\\u4e00\\u4f4d\\u7559\\u7740\\u9ed1\\u8272\\u5934\\u53d1\\u7684\\u7537\\u5b50\\u7a7f\\u7740\\u4e00\\u4ef6\\u7070\\u8272\\u7684\\u6c57\\u886b\\u548c\\u9ed1\\u8272\\u7684\\u88e4\\u5b50\\u3002\\u4ed6\\u8fd8\\u5728\\u80a9\\u8180\\u4e0a\\u80cc\\u7740\\u4e00\\u4e2a\\u9ed1\\u8272\\u7684\\u80cc\\u5305\\u3002\", \"caption_0_chinese\": \"\\u4e00\\u4e2a\\u7559\\u7740\\u77ed\\u9ed1\\u53d1\\u7684\\u7537\\u5b50\\u7a7f\\u7740\\u4e00\\u4ef6\\u7070\\u8272\\u957f\\u8896\\u4e0a\\u8863\\uff0c\\u9ed1\\u8272\\u957f\\u88e4\\u548c\\u4e00\\u53cc\\u7070\\u8272\\u7684\\u978b\\u5b50\\u3002\\u4ed6\\u80cc\\u7740\\u4e00\\u4e2a\\u9ed1\\u8272\\u80cc\\u5305\\u3002\", \"caption_1_chinese\": \"\\u8fd9\\u4e2a\\u7537\\u4eba\\u7a7f\\u7740\\u4e00\\u4ef6\\u9ed1\\u8272\\u957f\\u8896\\u886c\\u886b\\uff0c\\u9ed1\\u8272\\u88e4\\u5b50\\u548c\\u7070\\u8272\\u7684\\u978b\\u5b50\\u3002\\u4ed6\\u8fd8\\u80cc\\u7740\\u4e00\\u4e2a\\u7070\\u8272\\u7684\\u80cc\\u5305\\u3002\"}"
|
120 |
+
]
|
121 |
+
}
|
122 |
+
],
|
123 |
+
"source": [
|
124 |
+
"!cat output_chinese/thread_0/59266.json"
|
125 |
+
]
|
126 |
+
},
|
127 |
+
{
|
128 |
+
"cell_type": "code",
|
129 |
+
"execution_count": 7,
|
130 |
+
"metadata": {},
|
131 |
+
"outputs": [
|
132 |
+
{
|
133 |
+
"data": {
|
134 |
+
"text/plain": [
|
135 |
+
"4791127"
|
136 |
+
]
|
137 |
+
},
|
138 |
+
"execution_count": 7,
|
139 |
+
"metadata": {},
|
140 |
+
"output_type": "execute_result"
|
141 |
+
}
|
142 |
+
],
|
143 |
+
"source": [
|
144 |
+
"len(json_files)"
|
145 |
+
]
|
146 |
+
},
|
147 |
+
{
|
148 |
+
"cell_type": "code",
|
149 |
+
"execution_count": 8,
|
150 |
+
"metadata": {},
|
151 |
+
"outputs": [],
|
152 |
+
"source": [
|
153 |
+
"# sort by the id of instance\n",
|
154 |
+
"\n",
|
155 |
+
"json_files = sorted(json_files, key=lambda x: int(x.split('/')[-1].split('.')[0]))"
|
156 |
+
]
|
157 |
+
},
|
158 |
+
{
|
159 |
+
"cell_type": "code",
|
160 |
+
"execution_count": 9,
|
161 |
+
"metadata": {},
|
162 |
+
"outputs": [
|
163 |
+
{
|
164 |
+
"data": {
|
165 |
+
"text/plain": [
|
166 |
+
"'output_chinese/thread_31/4791126.json'"
|
167 |
+
]
|
168 |
+
},
|
169 |
+
"execution_count": 9,
|
170 |
+
"metadata": {},
|
171 |
+
"output_type": "execute_result"
|
172 |
+
}
|
173 |
+
],
|
174 |
+
"source": [
|
175 |
+
"json_files[-1]"
|
176 |
+
]
|
177 |
+
},
|
178 |
+
{
|
179 |
+
"cell_type": "code",
|
180 |
+
"execution_count": 10,
|
181 |
+
"metadata": {},
|
182 |
+
"outputs": [],
|
183 |
+
"source": [
|
184 |
+
"# check if the json files are sorted by the id of instance\n"
|
185 |
+
]
|
186 |
+
},
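A minimal verification sketch for the check described above, assuming `json_files` from the sorting cell:

```python
# Derive the numeric id from each path and confirm the sequence is non-decreasing.
ids = [int(p.split("/")[-1].split(".")[0]) for p in json_files]
assert all(a <= b for a, b in zip(ids, ids[1:])), "json_files is not sorted by instance id"
print(f"{len(ids)} files, ids from {ids[0]} to {ids[-1]}")
```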
|
187 |
+
{
|
188 |
+
"cell_type": "code",
|
189 |
+
"execution_count": 11,
|
190 |
+
"metadata": {},
|
191 |
+
"outputs": [],
|
192 |
+
"source": [
|
193 |
+
"import json\n",
|
194 |
+
"\n",
|
195 |
+
"data_list = []\n",
|
196 |
+
"error_files = []\n",
|
197 |
+
"for json_file in json_files:\n",
|
198 |
+
" with open(json_file) as f:\n",
|
199 |
+
" try:\n",
|
200 |
+
" data = json.load(f)\n",
|
201 |
+
" data_list.append(data)\n",
|
202 |
+
" except:\n",
|
203 |
+
" print(json_file)\n",
|
204 |
+
" data_list.append({})\n",
|
205 |
+
" error_files.append(json_file)\n",
|
206 |
+
" \n",
|
207 |
+
"for error_file in error_files:\n",
|
208 |
+
" os.remove(error_file)"
|
209 |
+
]
|
210 |
+
},
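The cell above deletes unreadable files as it goes. A slightly more defensive variant, shown here only as a sketch, catches just the decoding errors and prints a summary before anything is removed:

```python
import json
import os

data_list, error_files = [], []
for json_file in json_files:
    try:
        with open(json_file, encoding="utf-8") as f:
            data_list.append(json.load(f))
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        print(f"failed to parse {json_file}: {exc}")
        data_list.append({})          # keep positions aligned with json_files
        error_files.append(json_file)

print(f"{len(error_files)} corrupted files")
# Delete only after the summary has been reviewed.
for error_file in error_files:
    os.remove(error_file)
```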
|
211 |
+
{
|
212 |
+
"cell_type": "code",
|
213 |
+
"execution_count": 22,
|
214 |
+
"metadata": {},
|
215 |
+
"outputs": [
|
216 |
+
{
|
217 |
+
"data": {
|
218 |
+
"text/plain": [
|
219 |
+
"'output_chinese/thread_31/4791126.json'"
|
220 |
+
]
|
221 |
+
},
|
222 |
+
"execution_count": 22,
|
223 |
+
"metadata": {},
|
224 |
+
"output_type": "execute_result"
|
225 |
+
}
|
226 |
+
],
|
227 |
+
"source": [
|
228 |
+
"json_files[-1]"
|
229 |
+
]
|
230 |
+
},
|
231 |
+
{
|
232 |
+
"cell_type": "code",
|
233 |
+
"execution_count": 14,
|
234 |
+
"metadata": {},
|
235 |
+
"outputs": [
|
236 |
+
{
|
237 |
+
"data": {
|
238 |
+
"text/plain": [
|
239 |
+
"0"
|
240 |
+
]
|
241 |
+
},
|
242 |
+
"execution_count": 14,
|
243 |
+
"metadata": {},
|
244 |
+
"output_type": "execute_result"
|
245 |
+
}
|
246 |
+
],
|
247 |
+
"source": [
|
248 |
+
"len(error_files)"
|
249 |
+
]
|
250 |
+
},
|
251 |
+
{
|
252 |
+
"cell_type": "code",
|
253 |
+
"execution_count": 16,
|
254 |
+
"metadata": {},
|
255 |
+
"outputs": [],
|
256 |
+
"source": [
|
257 |
+
"import datasets\n",
|
258 |
+
"\n",
|
259 |
+
"dataset = datasets.Dataset.from_list(data_list)"
|
260 |
+
]
|
261 |
+
},
|
262 |
+
{
|
263 |
+
"cell_type": "code",
|
264 |
+
"execution_count": 19,
|
265 |
+
"metadata": {},
|
266 |
+
"outputs": [
|
267 |
+
{
|
268 |
+
"data": {
|
269 |
+
"text/plain": [
|
270 |
+
"{'prompt_caption_chinese': '穿着黑色夹克和蓝色牛仔裤的黑色鞋子的女孩。',\n",
|
271 |
+
" 'caption_0_chinese': '她穿着一件黑色夹克,外面罩着一件灰色上衣,搭配一条蓝色牛仔裤和一双黑色皮鞋。',\n",
|
272 |
+
" 'caption_1_chinese': ''}"
|
273 |
+
]
|
274 |
+
},
|
275 |
+
"execution_count": 19,
|
276 |
+
"metadata": {},
|
277 |
+
"output_type": "execute_result"
|
278 |
+
}
|
279 |
+
],
|
280 |
+
"source": [
|
281 |
+
"dataset[1]"
|
282 |
+
]
|
283 |
+
},
|
284 |
+
{
|
285 |
+
"cell_type": "code",
|
286 |
+
"execution_count": 23,
|
287 |
+
"metadata": {},
|
288 |
+
"outputs": [],
|
289 |
+
"source": [
|
290 |
+
"list_ids = [int(json_file.split('/')[-1].split('.')[0]) for json_file in json_files]"
|
291 |
+
]
|
292 |
+
},
|
293 |
+
{
|
294 |
+
"cell_type": "code",
|
295 |
+
"execution_count": 26,
|
296 |
+
"metadata": {},
|
297 |
+
"outputs": [],
|
298 |
+
"source": [
|
299 |
+
"list_ids[0]\n",
|
300 |
+
"# append the id of instance to the dataset\n",
|
301 |
+
"\n",
|
302 |
+
"dataset = dataset.add_column('id', list_ids)"
|
303 |
+
]
|
304 |
+
},
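A quick sanity check, given here only as a sketch, that the appended `id` column lines up with the file order:

```python
import random

# Spot-check a few random rows: the id column must equal the filename-derived id.
for i in random.sample(range(len(dataset)), k=5):
    assert dataset[i]["id"] == list_ids[i]
print("id column is aligned with json_files order")
```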
|
305 |
+
{
|
306 |
+
"cell_type": "code",
|
307 |
+
"execution_count": 35,
|
308 |
+
"metadata": {},
|
309 |
+
"outputs": [
|
310 |
+
{
|
311 |
+
"data": {
|
312 |
+
"text/plain": [
|
313 |
+
"{'prompt_caption_chinese': '这人穿着一件蓝色上衣和一条黑色裤子,脚上穿着白色鞋子。他的头发是黑色的。',\n",
|
314 |
+
" 'caption_0_chinese': '一位穿着蓝色衬衫、黑色裤子和白色鞋子的男士。',\n",
|
315 |
+
" 'caption_1_chinese': '',\n",
|
316 |
+
" 'id': 4791118}"
|
317 |
+
]
|
318 |
+
},
|
319 |
+
"execution_count": 35,
|
320 |
+
"metadata": {},
|
321 |
+
"output_type": "execute_result"
|
322 |
+
}
|
323 |
+
],
|
324 |
+
"source": [
|
325 |
+
"dataset[-9]"
|
326 |
+
]
|
327 |
+
},
|
328 |
+
{
|
329 |
+
"cell_type": "code",
|
330 |
+
"execution_count": null,
|
331 |
+
"metadata": {},
|
332 |
+
"outputs": [],
|
333 |
+
"source": [
|
334 |
+
"# dump the data_list to a json file\n",
|
335 |
+
"# 一位穿着蓝色衬衫、黑色裤子和白色鞋子的男士。 nghĩa là"
|
336 |
+
]
|
337 |
+
},
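The dump itself is not written out in the cell above; a minimal sketch, assuming `data_list` and `list_ids` from the earlier cells and a hypothetical output filename:

```python
import json

# One JSON object per line (JSONL) so the ~4.79M records can be streamed back later.
with open("output_chinese_combined.jsonl", "w", encoding="utf-8") as f:
    for instance_id, record in zip(list_ids, data_list):
        f.write(json.dumps({**record, "id": instance_id}, ensure_ascii=False) + "\n")
```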
|
338 |
+
{
|
339 |
+
"cell_type": "code",
|
340 |
+
"execution_count": null,
|
341 |
+
"metadata": {},
|
342 |
+
"outputs": [],
|
343 |
+
"source": [
|
344 |
+
"# /dscilab_dungvo/workspace/BA-PRE_THESIS/dataset_pretraining/SYNTH-PEDES"
|
345 |
+
]
|
346 |
+
},
|
347 |
+
{
|
348 |
+
"cell_type": "code",
|
349 |
+
"execution_count": 36,
|
350 |
+
"metadata": {},
|
351 |
+
"outputs": [
|
352 |
+
{
|
353 |
+
"data": {
|
354 |
+
"application/vnd.jupyter.widget-view+json": {
|
355 |
+
"model_id": "1bd1611110ff416e8556f83bcbf014dd",
|
356 |
+
"version_major": 2,
|
357 |
+
"version_minor": 0
|
358 |
+
},
|
359 |
+
"text/plain": [
|
360 |
+
"Saving the dataset (0/4 shards): 0%| | 0/4791127 [00:00<?, ? examples/s]"
|
361 |
+
]
|
362 |
+
},
|
363 |
+
"metadata": {},
|
364 |
+
"output_type": "display_data"
|
365 |
+
}
|
366 |
+
],
|
367 |
+
"source": [
|
368 |
+
"# dump dataset to above path\n",
|
369 |
+
"\n",
|
370 |
+
"dataset.save_to_disk('/dscilab_dungvo/workspace/BA-PRE_THESIS/dataset_pretraining/SYNTH-PEDES/chinese_translated_annotations')"
|
371 |
+
]
|
372 |
+
},
|
373 |
+
{
|
374 |
+
"cell_type": "code",
|
375 |
+
"execution_count": 42,
|
376 |
+
"metadata": {},
|
377 |
+
"outputs": [],
|
378 |
+
"source": [
|
379 |
+
"import datasets\n",
|
380 |
+
"\n",
|
381 |
+
"new_dataset = datasets.load_from_disk('/dscilab_dungvo/workspace/BA-PRE_THESIS/dataset_pretraining/SYNTH-PEDES/chinese_translated_annotations')"
|
382 |
+
]
|
383 |
+
},
|
384 |
+
{
|
385 |
+
"cell_type": "code",
|
386 |
+
"execution_count": 43,
|
387 |
+
"metadata": {},
|
388 |
+
"outputs": [
|
389 |
+
{
|
390 |
+
"data": {
|
391 |
+
"text/plain": [
|
392 |
+
"{'prompt_caption_chinese': '他有一头黑发,穿着一件蓝色的上衣和黑色的裤子,还穿着白色的鞋子。',\n",
|
393 |
+
" 'caption_0_chinese': '一个中年男子,短发,黑色,穿着一件浅蓝色衬衫和黑色裤子,手里拿着一件紫色大衣,穿着白色鞋子。',\n",
|
394 |
+
" 'caption_1_chinese': '一个中年男子,短发,黑色,穿着一件浅蓝色衬衫和一条黑色裤子。他手里拿着一个粉红色的瓶子,穿着白色的鞋子。',\n",
|
395 |
+
" 'id': 4791107}"
|
396 |
+
]
|
397 |
+
},
|
398 |
+
"execution_count": 43,
|
399 |
+
"metadata": {},
|
400 |
+
"output_type": "execute_result"
|
401 |
+
}
|
402 |
+
],
|
403 |
+
"source": [
|
404 |
+
"new_dataset[-20]"
|
405 |
+
]
|
406 |
+
},
|
407 |
+
{
|
408 |
+
"cell_type": "code",
|
409 |
+
"execution_count": 44,
|
410 |
+
"metadata": {},
|
411 |
+
"outputs": [
|
412 |
+
{
|
413 |
+
"data": {
|
414 |
+
"application/vnd.jupyter.widget-view+json": {
|
415 |
+
"model_id": "dbaf83950bda495f9ac2373748da7646",
|
416 |
+
"version_major": 2,
|
417 |
+
"version_minor": 0
|
418 |
+
},
|
419 |
+
"text/plain": [
|
420 |
+
"Uploading the dataset shards: 0%| | 0/4 [00:00<?, ?it/s]"
|
421 |
+
]
|
422 |
+
},
|
423 |
+
"metadata": {},
|
424 |
+
"output_type": "display_data"
|
425 |
+
},
|
426 |
+
{
|
427 |
+
"data": {
|
428 |
+
"application/vnd.jupyter.widget-view+json": {
|
429 |
+
"model_id": "9a7ae97afb0549e9b9fe9638e8f0be48",
|
430 |
+
"version_major": 2,
|
431 |
+
"version_minor": 0
|
432 |
+
},
|
433 |
+
"text/plain": [
|
434 |
+
"Creating parquet from Arrow format: 0%| | 0/1198 [00:00<?, ?ba/s]"
|
435 |
+
]
|
436 |
+
},
|
437 |
+
"metadata": {},
|
438 |
+
"output_type": "display_data"
|
439 |
+
},
|
440 |
+
{
|
441 |
+
"data": {
|
442 |
+
"application/vnd.jupyter.widget-view+json": {
|
443 |
+
"model_id": "ac4f5b0cf06647dd9132725fd813d2a8",
|
444 |
+
"version_major": 2,
|
445 |
+
"version_minor": 0
|
446 |
+
},
|
447 |
+
"text/plain": [
|
448 |
+
"Creating parquet from Arrow format: 0%| | 0/1198 [00:00<?, ?ba/s]"
|
449 |
+
]
|
450 |
+
},
|
451 |
+
"metadata": {},
|
452 |
+
"output_type": "display_data"
|
453 |
+
},
|
454 |
+
{
|
455 |
+
"data": {
|
456 |
+
"application/vnd.jupyter.widget-view+json": {
|
457 |
+
"model_id": "3353dc46907841ee8bb0d62b50531229",
|
458 |
+
"version_major": 2,
|
459 |
+
"version_minor": 0
|
460 |
+
},
|
461 |
+
"text/plain": [
|
462 |
+
"Creating parquet from Arrow format: 0%| | 0/1198 [00:00<?, ?ba/s]"
|
463 |
+
]
|
464 |
+
},
|
465 |
+
"metadata": {},
|
466 |
+
"output_type": "display_data"
|
467 |
+
},
|
468 |
+
{
|
469 |
+
"data": {
|
470 |
+
"application/vnd.jupyter.widget-view+json": {
|
471 |
+
"model_id": "5a2e8418bd9e4b30a6cc404731354691",
|
472 |
+
"version_major": 2,
|
473 |
+
"version_minor": 0
|
474 |
+
},
|
475 |
+
"text/plain": [
|
476 |
+
"Creating parquet from Arrow format: 0%| | 0/1198 [00:00<?, ?ba/s]"
|
477 |
+
]
|
478 |
+
},
|
479 |
+
"metadata": {},
|
480 |
+
"output_type": "display_data"
|
481 |
+
},
|
482 |
+
{
|
483 |
+
"data": {
|
484 |
+
"text/plain": [
|
485 |
+
"CommitInfo(commit_url='https://huggingface.co/datasets/tuandunghcmut/synthpedes-chinese-translated-annotations/commit/0a6b1aa1c1aa34ff201543ba4ad2a2abddf5204e', commit_message='Upload dataset', commit_description='', oid='0a6b1aa1c1aa34ff201543ba4ad2a2abddf5204e', pr_url=None, pr_revision=None, pr_num=None)"
|
486 |
+
]
|
487 |
+
},
|
488 |
+
"execution_count": 44,
|
489 |
+
"metadata": {},
|
490 |
+
"output_type": "execute_result"
|
491 |
+
}
|
492 |
+
],
|
493 |
+
"source": [
|
494 |
+
"new_dataset.push_to_hub('tuandunghcmut/synthpedes-chinese-translated-annotations')"
|
495 |
+
]
|
496 |
+
},
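Once the upload finishes, the dataset can be pulled back from the Hub on any machine. A minimal sketch using the repo id from the cell above (push_to_hub writes a single `train` split by default):

```python
import datasets

# Downloads the parquet shards uploaded by push_to_hub above.
hub_dataset = datasets.load_dataset(
    "tuandunghcmut/synthpedes-chinese-translated-annotations", split="train"
)
print(hub_dataset)
print(hub_dataset[0])
```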
|
497 |
+
{
|
498 |
+
"cell_type": "code",
|
499 |
+
"execution_count": null,
|
500 |
+
"metadata": {},
|
501 |
+
"outputs": [],
|
502 |
+
"source": []
|
503 |
+
}
|
504 |
+
],
|
505 |
+
"metadata": {
|
506 |
+
"kernelspec": {
|
507 |
+
"display_name": "tbps",
|
508 |
+
"language": "python",
|
509 |
+
"name": "python3"
|
510 |
+
},
|
511 |
+
"language_info": {
|
512 |
+
"codemirror_mode": {
|
513 |
+
"name": "ipython",
|
514 |
+
"version": 3
|
515 |
+
},
|
516 |
+
"file_extension": ".py",
|
517 |
+
"mimetype": "text/x-python",
|
518 |
+
"name": "python",
|
519 |
+
"nbconvert_exporter": "python",
|
520 |
+
"pygments_lexer": "ipython3",
|
521 |
+
"version": "3.9.18"
|
522 |
+
}
|
523 |
+
},
|
524 |
+
"nbformat": 4,
|
525 |
+
"nbformat_minor": 2
|
526 |
+
}
|
a_mllm_notebooks/openai/openai_api.ipynb
ADDED
@@ -0,0 +1,408 @@
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"id": "65815b1f",
|
6 |
+
"metadata": {},
|
7 |
+
"source": [
|
8 |
+
"# Image URL"
|
9 |
+
]
|
10 |
+
},
|
11 |
+
{
|
12 |
+
"cell_type": "code",
|
13 |
+
"execution_count": 1,
|
14 |
+
"id": "d606605d-b949-4b3d-b582-9316734320f1",
|
15 |
+
"metadata": {},
|
16 |
+
"outputs": [
|
17 |
+
{
|
18 |
+
"name": "stdout",
|
19 |
+
"output_type": "stream",
|
20 |
+
"text": [
|
21 |
+
"ChatCompletion(id='1831', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"The image shows a tiger lying on a grassy surface. The tiger is positioned with its front legs extended forward and its head slightly raised, giving it a relaxed appearance. The tiger's distinctive orange fur with black stripes is clearly visible, and it is surrounded by green grass, suggesting a natural or zoo-like environment. The lighting is bright, indicating a sunny day. The tiger's expression is calm and focused.\", refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735906949, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=82, prompt_tokens=1843, total_tokens=1925, completion_tokens_details=None))\n"
|
22 |
+
]
|
23 |
+
}
|
24 |
+
],
|
25 |
+
"source": [
|
26 |
+
"from openai import OpenAI\n",
|
27 |
+
"\n",
|
28 |
+
"client = OpenAI(api_key=\"YOUR_API_KEY\", base_url=\"http://0.0.0.0:8081/v1\")\n",
|
29 |
+
"model_name = client.models.list().data[0].id\n",
|
30 |
+
"response = client.chat.completions.create(\n",
|
31 |
+
" model=model_name,\n",
|
32 |
+
" messages=[\n",
|
33 |
+
" {\n",
|
34 |
+
" \"role\": \"user\",\n",
|
35 |
+
" \"content\": [\n",
|
36 |
+
" {\n",
|
37 |
+
" \"type\": \"text\",\n",
|
38 |
+
" \"text\": \"describe this image\",\n",
|
39 |
+
" },\n",
|
40 |
+
" {\n",
|
41 |
+
" \"type\": \"image_url\",\n",
|
42 |
+
" \"image_url\": {\n",
|
43 |
+
" \"url\": \"https://modelscope.oss-cn-beijing.aliyuncs.com/resource/tiger.jpeg\",\n",
|
44 |
+
" },\n",
|
45 |
+
" },\n",
|
46 |
+
" ],\n",
|
47 |
+
" }\n",
|
48 |
+
" ],\n",
|
49 |
+
" temperature=0.5,\n",
|
50 |
+
" top_p=0.8,\n",
|
51 |
+
")\n",
|
52 |
+
"print(response)"
|
53 |
+
]
|
54 |
+
},
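The same request can also be streamed token by token with the client's standard `stream=True` flag; this sketch is not in the original notebook and reuses `client` and `model_name` from the cell above:

```python
stream = client.chat.completions.create(
    model=model_name,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "describe this image"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/tiger.jpeg"
                    },
                },
            ],
        }
    ],
    temperature=0.5,
    top_p=0.8,
    stream=True,  # the server returns ChatCompletionChunk objects incrementally
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```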
|
55 |
+
{
|
56 |
+
"cell_type": "code",
|
57 |
+
"execution_count": 2,
|
58 |
+
"id": "370fea1d",
|
59 |
+
"metadata": {},
|
60 |
+
"outputs": [],
|
61 |
+
"source": [
|
62 |
+
"# ChatCompletion(id='6', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"The image shows a tiger lying on a grassy area. The tiger has distinct orange fur with black stripes and is resting \n",
|
63 |
+
"text = response.choices[0].message.content"
|
64 |
+
]
|
65 |
+
},
|
66 |
+
{
|
67 |
+
"cell_type": "code",
|
68 |
+
"execution_count": 3,
|
69 |
+
"id": "46de478b",
|
70 |
+
"metadata": {},
|
71 |
+
"outputs": [
|
72 |
+
{
|
73 |
+
"data": {
|
74 |
+
"text/plain": [
|
75 |
+
"\"The image shows a tiger lying on a grassy surface. The tiger is relaxed, with its front legs stretched out and its head slightly raised, giving a clear view of its face and stripes. The background consists of lush green grass, and the tiger's distinctive orange, black, and white fur is prominently displayed. The lighting suggests a bright, sunny day.\""
|
76 |
+
]
|
77 |
+
},
|
78 |
+
"execution_count": 3,
|
79 |
+
"metadata": {},
|
80 |
+
"output_type": "execute_result"
|
81 |
+
}
|
82 |
+
],
|
83 |
+
"source": [
|
84 |
+
"text"
|
85 |
+
]
|
86 |
+
},
|
87 |
+
{
|
88 |
+
"cell_type": "code",
|
89 |
+
"execution_count": 2,
|
90 |
+
"id": "f60099ff-ca4c-46f1-9dcd-3a4fb776ea4d",
|
91 |
+
"metadata": {},
|
92 |
+
"outputs": [
|
93 |
+
{
|
94 |
+
"data": {
|
95 |
+
"text/plain": [
|
96 |
+
"5"
|
97 |
+
]
|
98 |
+
},
|
99 |
+
"execution_count": 2,
|
100 |
+
"metadata": {},
|
101 |
+
"output_type": "execute_result"
|
102 |
+
}
|
103 |
+
],
|
104 |
+
"source": [
|
105 |
+
"len(client.models.list().data)"
|
106 |
+
]
|
107 |
+
},
|
108 |
+
{
|
109 |
+
"cell_type": "code",
|
110 |
+
"execution_count": 23,
|
111 |
+
"id": "e51e6cd6-9ca3-4082-8a8c-f1668f0de5c9",
|
112 |
+
"metadata": {},
|
113 |
+
"outputs": [
|
114 |
+
{
|
115 |
+
"name": "stdout",
|
116 |
+
"output_type": "stream",
|
117 |
+
"text": [
|
118 |
+
"ChatCompletion(id='1', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"The image features a tiger lying down on a grassy surface. The tiger is positioned with its front legs stretched forward and its head slightly raised, giving it a relaxed posture. The background is lush and green, suggesting a natural, outdoor setting. The tiger's distinctive orange, black, and white stripes are clearly visible, making it a striking and recognizable subject. The lighting highlights the tiger's fur, creating a vivid and clear image of the animal.\", refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735640960, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=90, prompt_tokens=1843, total_tokens=1933, completion_tokens_details=None))\n",
|
119 |
+
"ChatCompletion(id='1', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"The image shows a tiger lying on a grassy surface. The tiger is relaxed, with its front paws stretched out and its head slightly tilted. The stripes on the tiger's fur are prominent and characteristic of the species. The background consists of lush green grass, and the lighting suggests a bright, sunny day. The tiger appears calm and comfortable in its environment.\", refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735640964, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=73, prompt_tokens=1843, total_tokens=1916, completion_tokens_details=None))\n",
|
120 |
+
"ChatCompletion(id='2', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The image shows a tiger lying down on green grass. The tiger has a striking orange coat with black stripes and a white underbelly. It is looking directly at the camera, giving a calm and composed expression. The background consists of lush, green foliage, providing a natural and serene setting for the animal.', refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735640967, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=62, prompt_tokens=1843, total_tokens=1905, completion_tokens_details=None))\n",
|
121 |
+
"ChatCompletion(id='1', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"The image features a tiger lying down on a lush, green grassy area. The tiger is relaxed, with its front legs stretched out, and its distinctive orange fur with black stripes is clearly visible. The background consists of well-maintained grass, creating a serene and natural setting. The lighting suggests a bright, sunny day, enhancing the vivid colors of the tiger's coat. The tiger's facial expression is calm, adding to the tranquil atmosphere of the scene.\", refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735640969, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=93, prompt_tokens=1843, total_tokens=1936, completion_tokens_details=None))\n",
|
122 |
+
"ChatCompletion(id='2', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"The image shows a tiger lying on green grass. The tiger is relaxed, with its front paws stretched out and its head turned slightly to the side, giving a direct and calm gaze towards the camera. The tiger's distinctive orange fur with black stripes is clearly visible, and the background is lush and green, suggesting a natural or well-maintained habitat. The lighting is bright, indicating a sunny day.\", refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735640973, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=82, prompt_tokens=1843, total_tokens=1925, completion_tokens_details=None))\n",
|
123 |
+
"ChatCompletion(id='2', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The image shows a tiger lying down on a lush green lawn. The tiger has striking orange fur with black stripes and a white underbelly. It is looking directly at the camera with a relaxed posture. The surrounding grass is vibrant and well-maintained, creating a peaceful and natural setting.', refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735640977, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=59, prompt_tokens=1843, total_tokens=1902, completion_tokens_details=None))\n",
|
124 |
+
"ChatCompletion(id='3', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"The image features a tiger lying on green grass. The tiger is in a relaxed position, with its front paws stretched out in front of it. The background consists of lush, green foliage, and the tiger's distinctive orange and black stripes are clearly visible. The lighting suggests it's a bright, sunny day. The tiger appears calm and at ease in its environment.\", refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735640979, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=74, prompt_tokens=1843, total_tokens=1917, completion_tokens_details=None))\n",
|
125 |
+
"ChatCompletion(id='3', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=\"The image shows a tiger lying on a grassy surface. The tiger has its front paws stretched forward, with the rest of its body relaxed. The background consists of lush green grass, and the tiger's distinctive orange, black, and white stripes are clearly visible. The animal's expression is calm, and it is looking directly at the camera. The lighting in the image is bright, highlighting the tiger's features and the vivid colors of its fur.\", refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1735640981, model='OpenGVLab/InternVL2_5-4B-MPO-AWQ', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=91, prompt_tokens=1843, total_tokens=1934, completion_tokens_details=None))\n",
|
126 |
+
"2.86 s ± 846 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
|
127 |
+
]
|
128 |
+
}
|
129 |
+
],
|
130 |
+
"source": [
|
131 |
+
"%%timeit\n",
|
132 |
+
"response = client.chat.completions.create(\n",
|
133 |
+
" model=model_name,\n",
|
134 |
+
" messages=[{\n",
|
135 |
+
" 'role':\n",
|
136 |
+
" 'user',\n",
|
137 |
+
" 'content': [{\n",
|
138 |
+
" 'type': 'text',\n",
|
139 |
+
" 'text': 'describe this image',\n",
|
140 |
+
" }, {\n",
|
141 |
+
" 'type': 'image_url',\n",
|
142 |
+
" 'image_url': {\n",
|
143 |
+
" 'url':\n",
|
144 |
+
" 'https://modelscope.oss-cn-beijing.aliyuncs.com/resource/tiger.jpeg',\n",
|
145 |
+
" },\n",
|
146 |
+
" }],\n",
|
147 |
+
" }],\n",
|
148 |
+
" temperature=0.8,\n",
|
149 |
+
" top_p=0.8)\n",
|
150 |
+
"print(response)"
|
151 |
+
]
|
152 |
+
},
|
153 |
+
{
|
154 |
+
"cell_type": "code",
|
155 |
+
"execution_count": null,
|
156 |
+
"id": "094bec32-0324-486a-809e-d919891c2167",
|
157 |
+
"metadata": {},
|
158 |
+
"outputs": [],
|
159 |
+
"source": [
|
160 |
+
"# !ps aux|grep lmdeploy |grep -v grep | awk '{print $2}'|xargs kill -9"
|
161 |
+
]
|
162 |
+
},
|
163 |
+
{
|
164 |
+
"cell_type": "markdown",
|
165 |
+
"id": "07a1fb36-e361-4d59-870e-0a8a3f15e5d5",
|
166 |
+
"metadata": {},
|
167 |
+
"source": [
|
168 |
+
"# PIL Image"
|
169 |
+
]
|
170 |
+
},
|
171 |
+
{
|
172 |
+
"cell_type": "code",
|
173 |
+
"execution_count": 2,
|
174 |
+
"id": "e56e3874",
|
175 |
+
"metadata": {},
|
176 |
+
"outputs": [
|
177 |
+
{
|
178 |
+
"name": "stderr",
|
179 |
+
"output_type": "stream",
|
180 |
+
"text": [
|
181 |
+
"/dscilab_dungvo/workspace/bin/envs/lmdeploy/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
|
182 |
+
" from .autonotebook import tqdm as notebook_tqdm\n"
|
183 |
+
]
|
184 |
+
}
|
185 |
+
],
|
186 |
+
"source": [
|
187 |
+
"\n",
|
188 |
+
"import datasets, huggingface_hub\n",
|
189 |
+
"\n",
|
190 |
+
"disk_path = \"/dscilab_dungvo/workspace/BA-PRE_THESIS/dataset_pretraining/SYNTH-PEDES/annotation_english_vietnamese_processed\"\n",
|
191 |
+
"dataset = datasets.load_from_disk(disk_path)\n",
|
192 |
+
"\n",
|
193 |
+
"image = dataset[110]['image']"
|
194 |
+
]
|
195 |
+
},
|
196 |
+
{
|
197 |
+
"cell_type": "code",
|
198 |
+
"execution_count": 35,
|
199 |
+
"id": "c0c2b27d",
|
200 |
+
"metadata": {},
|
201 |
+
"outputs": [],
|
202 |
+
"source": [
|
203 |
+
"from PIL import Image\n",
|
204 |
+
"import io\n",
|
205 |
+
"import base64\n",
|
206 |
+
"import uuid\n",
|
207 |
+
"# {\"url\": 'data:image/jpeg;base64,' + img_str}}\n",
|
208 |
+
"\n",
|
209 |
+
"def pil_to_url(pil_image):\n",
|
210 |
+
" buffered = io.BytesIO()\n",
|
211 |
+
" pil_image.save(buffered, format=\"JPEG\")\n",
|
212 |
+
" img_str = base64.b64encode(buffered.getvalue()).decode()\n",
|
213 |
+
" return f\"data:image/jpeg;base64,{img_str}\"\n",
|
214 |
+
" \n",
|
215 |
+
" \n",
|
216 |
+
"\n",
|
217 |
+
"def generate_content(image, prompt):\n",
|
218 |
+
"\n",
|
219 |
+
" # image is a PIL image\n",
|
220 |
+
" messages = (\n",
|
221 |
+
" [\n",
|
222 |
+
" {\n",
|
223 |
+
" \"role\": \"user\",\n",
|
224 |
+
" \"content\": [\n",
|
225 |
+
" {\n",
|
226 |
+
" \"type\": \"text\",\n",
|
227 |
+
" \"text\": prompt,\n",
|
228 |
+
" },\n",
|
229 |
+
" \n",
|
230 |
+
" {\n",
|
231 |
+
" \"type\": \"image_url\",\n",
|
232 |
+
" \"image_url\": {\n",
|
233 |
+
" \"url\": pil_to_url(image),\n",
|
234 |
+
" },\n",
|
235 |
+
" },\n",
|
236 |
+
" ],\n",
|
237 |
+
" }\n",
|
238 |
+
" ],\n",
|
239 |
+
" )\n",
|
240 |
+
"\n",
|
241 |
+
" # send message to the model\n",
|
242 |
+
" response = client.chat.completions.create(\n",
|
243 |
+
" model=model_name, messages=messages, temperature=0.5, top_p=0.8\n",
|
244 |
+
" )\n",
|
245 |
+
"\n",
|
246 |
+
" return response\n",
|
247 |
+
"\n",
|
248 |
+
"# print(generate_content(image=dataset[110][\"image\"], prompt=\"describe this image\"))"
|
249 |
+
]
|
250 |
+
},
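A small usage sketch for `generate_content`, looping over a few rows of the SYNTH-PEDES split loaded earlier; the indices are arbitrary and the loop assumes `client`, `model_name`, and `dataset` from the cells above:

```python
# Describe a handful of person crops and print a preview of each caption.
for idx in (110, 111, 112):
    sample = dataset[idx]
    response = generate_content(image=sample["image"], prompt="describe this image")
    print(idx, response.choices[0].message.content[:120])
```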
|
251 |
+
{
|
252 |
+
"cell_type": "code",
|
253 |
+
"execution_count": 26,
|
254 |
+
"id": "cbf16d3e",
|
255 |
+
"metadata": {},
|
256 |
+
"outputs": [
|
257 |
+
{
|
258 |
+
"data": {
|
259 |
+
"image/jpeg": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAD0AFcDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwDm/FehWvh2yjjt2kLMQpJHWsLSG1a0k3Wds8hc+nBr0zxTb2V3aRT3Um1c5U4zx61PY6poFno3nC6gKoOSCMk/SuZXZo7Hn2ua79vtGtrzTvIvFA+YelcjKo5rV8R60uravLdRJti+6vGCRWKzk1rFXJsaVrOF3KJHCH+Dcdv5Vo22oFZ45I40DxcodgOTjHNc5EWLda7rwhpEV3cKZeR71UtEOEbsrT+LNaWbzQY42IxlExTLbXNSuSy3L+ZGzbiWGea19Z8K6hJfyNZxq8WeBmp7DwhqCxZljRSe2ayvc2cLHKzRl3dlBA9NxqnDIEm2+prrZ/D95FK6NH+VZR8NXrT5ChRnvWhDZVvLK2kMeWILDkg0VpT6JcqmW29cUUhaCXfiD7d4RSynceeuFBJ5wO1Y/h7w3deI7oLFlIFb55COPoKf4X0SHWb7NzKwiX7wHU17Fp32LTbWO1tLfbGowAOKl6Eo5Hxh4esdI8Hhba2XzIQP3nc+teWumUDDvXsfj66aXw7LGEOGryTTYTczCDIyQcZpxK6lSEESDPrXs/gjSRLp4mzjcMdK86i8OzSMp29SOte/eGtITTvDsKkcqmSampK+htTjbU47VfEEXh+Z/tMTMd+BisxvijaH5VsZGP1pnjFhdTyb/mCuQo9BXCPbhHzjinBKxNWXY6O78czTSM6WYGTwCelV18Z3DZzap+dYpi3DoaRYMZ4qjE0rnxPdznAiVRn1orNMOT0ooHcPB+rJpWvxSz827nZIPb1r6IsLeynhSVEQqwypHcV8z6jp8+malJayrhkP5+9dhpPjzVNK0pLNCrhU2qx6j/GqUeYzPTvHCWI0CeJmjUFDgAjJNeH6cP7OkL7FZ+xPan32rXOoy+ZczM7e54FVfN962jTQrs6a08QvG43ouK9H0XxyJ7b7K5jXK4FeJiXnrT47qSNso7KfUGk6UWUqkj07VtL3iSdP9IVm3bVPIrh55zbXoDWcg9NycCrFn4kngiVGkY4HepZ/EAucb0VvcrR7KwOdyB3W4beYtgqMxAHipZdUjlkjjKbTgKDV6OFSucUnASkYbqQx+XNFbxtkx0FFZ8pVzd+IfhpLmxGqwp++iX95gdRXlQfK17x4vuhB4WuJOoMXT8K8DViRnGM9qumybDyaTNNzRnmtRDt1LuqPNGaaYiUvjvU0E+4gVTY5FOt8q4Ip8wWuaUwztYHkc11Glv5tpGWPPSudiTz4iO9bulkQQrGxHHepdmK1jXEYA6cUVYQBl4orKxVx3ijWrfVvDS2dq3mSbVV1Jx0HUV5hNY3NsAZYiAe46V3SabcWTDzoSorI13iFVHc1kpWZuoXicoaSpZIyrYNR4rpTuYNWENFLSUxAa19EsYb1nEjEbegFY/etbw/NtvtvrUT2NKe51kPhuOW0kNvIRMBkAnAqnFaTW0gSQDf1OK6LTbtYnOaoXZvJtRElvbSGNR1AyDXNGbub1Ka5blm2kCoA5waKzpxNDcLNMrrnIAxRXQcg46rdXTH7TMT6ADP9aytRU3FzGvQZ71GsdzbSMsgPBPJH69afcTAukhwBuycVyXuzuinaxj6vB5VyfQiszNbevDf5Uo6Fe1YmOK66exz1FZiZopKUVoZC4qxpjGHUIjngnmol5qSMBZVb0NKSuhwdmd3a3QwMDnNb1hb6pdqGN5HDbE8KMZ/+tXH2Mm6JSDUXiV7i3hjljuJUBwAqtgdK4npI7WnKI3xdrQ+3m1tZDL5BKmUnqc+lFceGZiSeT6nmiuuOxxSjqe0abp8V7ZSCaMMx7kcjiuI1W0NpPJCQcIcDNd/p10scuxAArVjeJrNJ1aVfvfzrD2dkdkXdnCXNwk2nRox/eIT+VZZxVq7jZHKgVUEbVtT2MKu4hopdho8s46VoY2YBtpp/mZWnR25cgYrSi0h5oDjg/SndWKUWXNEkDxFCelavia1+0WFqmeS3H5VgaWr2l0Y3GCOoNad1evPdQxFsheg9K4ai947YP3dTm9R0iXTJFMhBVhwc4orsvFGnpLo0UuBvBHJFFbxloc7Wps2U4+0IrdCat6hEJUKntWTZzQeeqsw3A8VrPJvJPrWn2QTszkNQ0XfLuU4Unmqo0Be8uK6m7XEbVml651OzOynSjNXZkjQI+8p69hViLRbRSNxZvarRcik8wjkiq5zZUIIt21lZxsNsK10dg9nGjf6OuSPSuWjnAYDNXY7kBcA0uZsPZxsUvEkEceoJcxIFDDaxFZToqalAx+6w7VoavI8kQDN8gOax7qUmWAoeQaTOeorbHYaxG0+joqDJAWirOmTLcWao+CQBndRSRlyM5WyV5JwMV1MKsQKwNLRjehx90dfautiUFBxzXXBXOaTKl1DvhJAycVzUsrxuflGAe9dr5WVII4rlbyDZPIpHRjXPUp2Z3YapdWMr+1ERyHXFOW/S4IWMgZ9arX1spBPTFZ1jlb0L2NZ2Ou50BiwM7sn2p8MTs4LHC+lEcLKPvVKFYnAOKpFDNSRTZSe1cxFP506D+62K6icFo9h5BqjPpEcd0kqDaG5IFU1ocVR2kW7bUBZXphZyAy5HpRXP+JJtl6iR53KuDRWVi+aJv2+uWumllkhdjnkrWlD4y0ogFlmQ+myuNvCHmf8A3jUCxgkV2RdjzG7nqNt4g0u5iDJcAcdG4rFv5Y5buR4mDI3ORXO6dDZglrjp6CtdzCT/AKOu
2MDpU1Hc6sL8RRvvukAdaraJpsup6mYIiocLu+arN1V7wEwj8ZRMf4lwAehrA7puxKyGCRon+8hKn6igeoq94hi8jXbpQMAtkVmgkUJlQleJJ5bysFjXc3pXTR+Ho59JW5lISVEz1rl4rkxzKR1B7V1omaXTi28gFOhrZK6POxE7SPJ9cVptWbYkjDHZSaK9ssI7S3tIyLWLfjJZlBJoqeUi9zxOTmQn3pAQKfdRmOVvTNV8nNXc51sWFcqeDWxYzb49vcVz8jMgyK19CbzN5f8ACom9DooO0i1d4APrWt4JtN+tx3POY6zL1ecjvXVeBYGE7NsOMZJrnudtSd0Q+LiW1xjtwMfnWE8oRctxXTeL5rca4sSgl9vzHHFc1fIrRkCmmTCdkZtvqLyXhVF6HjPeu10WK61O1kZvljThie4rzaW4ayvhIg5Wu48Ia/eapO8WxI4VXJCjk11QehxVdZHS398mmWiF1LDgYFFWjGsnDoCPcUVVi01Y8WuL9bqZht2sCeDTAcVBqkH2HX7qEdFkP86nU8is0cyHuu6M+wrR0YbImPrVL+E/Sr2nMFgOfWpqbG9HVlud9zKvcnivZvCumRWumRuVUEpzjv3rxOSQNLHjruFe56bMYvDQk6FUJ/8AHa5zebPJPEN+L7xnfmMnYjkL9M1DMSyfhWPbXBm1i9l6lpm/nWo7/KabLprQopo41SZ137SBya2/AVqbW6vY2O5kbbkVU0aXF6y56itTwp+71LUR6yf41vS1OaroztFI70VCGIPWiurlMOY8V8XSbvFV6c8mQ5/OmQkGJTUPiNH/ALeu3b+Jyc/jS2jZhHrWHUSLqHjmpIJdoKioQpPNV4Zv9IZfepnsa0XZmzbAvdxem8V7VeXa2vhCRjgYiPP/AAGvErWTZNG3vXpOp3Et54Slij+8YR39ua50tTpnseP2Nywv2OeXYn9a6IMWSuTtz5V2d38LEV1cHzRD3FORVLYn0z5NQX3rc0MbNZvAO+01gQt5Vwr/AN081saLcBtVmI48wDr7CtqTsc9VanYA5Wis+fUILP8A10ir7Zorb2hlyHlXia7ju7lZVUAkc1RsmLKRS6rbSRTtnoCcVFablJqGxJGqvC1lozC8xV0N8vWqYGbpD6mpbKgtTWSQrtr0XQbpbvR3hkbohFefm3Plg9sVraTqj2kTxgHkYyDWKWp0yehyWpRi31m5jByBIcfnXR2J3QL/ALtczqRMmqyOTnLZrrrGDbbJhSAVHX6UpoKDH20QebnpVDXZjbzRi3ZkYkcqaukvBNkdKzNXy80D+4qovQKi1IdRVnt4mcu5IBJJzzRVu6g862Reneii5CidlqfhuyvlYiLBb0FYo8CNuOxjjPetvUtbktmxDgY61iz+Jr5xjzNuPSqVyNBf+EBnJ/4+EUfWsnVfCraQi3LXSSAHgD1qZ9cvPmPnvz1way727muVwzsw9zVJE7EP26TdtzlfStHT0MwfA5rGSB2kHB610ulwiEZHUjmqsgu2crf2k6Xju0bAZ610Vjq/k2cSSIWKrjrV7UIvMgckc44rmlhuixUROfTAqGkyoya2Ne51S3mQjYyn61lTSNNNGQxKg9Kli0TU7ogR278nqRWvYeDNXaQMUC49anRA5yZUv7pTbx+UNhAANFdRF4BvJxiQc+1FToaRloZeqZ89uaymA20UVujGIxUG6nmNPSiigaFjjXeOK6PT4I9i8daKKmRSN6HTraZPnTNamm6XZozbYh+VFFZvYZvQWduBkRr+VXUhjC8ItFFYlvYUnavAH5UUUVSJP//Z",
|
260 |
+
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAFcAAAD0CAIAAABhIi17AAB85UlEQVR4AbXdWXdj2XUneMwgwSnmiJwlWUrL8rJ79epV7vL3f/SL+8FV1pDKlHKOgQxOAAiAQP/++wAMpjJtl6pXn4y8vLj3DHve+4y3+5tfvtjb2xtWevDg2P3V1cV0Or2aTW9vbzudzsnJyd/88tMXL14Mh+PJZDIejkej0enp6RdffHF+cfX69euzi/P5fH677mw2m26no1Ruut31cuW+Pxqq85OPf97r9d774P3FYvHVn7+qmjdq66036/V6dn15dXX1f/4f//jzn//8/fffl/PV9y/Pz8/39ke/+tWvTg4PgXG7Wv35z39eLJfeqvC3v/3tcrm8uHx7fX66XC46nX5n01uuN/3eQOPgXyxWMoAEtKPx/mAw6PcHnV7vtru+WS1vlzdavL6+Ho/HQBqsViuVHh8f7++PNTAD0eXlzc3N6nblNWRkkPvt27fHxw+kRw8fHh4eHRwc9Ab9s9NzReYBojObL+CDCveT4oh4cXHx2WefodTnf/pCQ511B/6gQtnNciXP7PoAWf18/vz5o0ePPOluOvhydX3x5ZdffrVeq3M+m3399de9fv9pJdBLEOysN999/c3p6RtIDsf7nU4PiVXQ7/fB1vihuJvNZu1Jv9ff7/WWvQ5o5ZTcDH7+ycfvvfcesPqDLlSvry6gvd6sNrfJ2e0Nlb68uFjc3Jydnl5dnq8++STY9ntPnz5Bi6vZ1Vff4u2y1+900GBNCjbuJE1q/ujwZIGkt7cyew7W8d4IZ7xdLlfzq6vAenuL4liN/4eTA/XPZjNP5tMrOUHrydXl5dnZGdKcv3376tUrrFIhLg8HkweP3uv1j1ar215vgP+97u1oNLhZLrq9Dey1udmoYdPvB77hIGK/XPaH/UGv08VgMA/+6Z/+6emzx5rBf0Cfn5/1uhuEgHxIuLolOciCtH7iBhXAjOOTh1Kn18cNgHrVUkO+FKK77ob2hNPD/f19Erdapx4/teWqqu7tmjyPh6nnu++++/bbb9ehWIiyul3cLpYKdrsdP4eDwjDSOht++61GHz9+vL9/uJxTwxFakN1bcPbG3c56tVxREIyAv3+lo6jQx5a90fjo5BiC6HWwP5kdzmA6+Kf/9n9BhNDiw/nZqZxaHfb7S3KIqrepfFpyi1J+np6+hs/Jg0dPnjwZ7u2Tcyh5VZIQZndV0CiInJsOoAcDKAyJHuqQi+64v+n0NuuNBke9/s3NfPTg+MWz59eXF5i8vJlFOojvcrVaLG+6dLmEvARMQ8iKHnRT5fMZ7e/OZii5Ds/DigghqNUg3XZid7p0gUyQ4E0f/oNev9vvzKez9WrR72729vcGh4cTbXk3v5lNZ9enb94ADqXDVcDGonRXy+AfSRoOqT5pnM5ucLI/imaig8zd0oJ2bRBstL7pEFSQBaDb28nBUbtxz9p1CEKfxGzOTy8gPOh1S1UwP1p1OxzN+4V/hPmdkiMKQycLZhJxVEDfwbDX68bGqR+6t2vmrNkCD1CfsvQgpdlU3u0Ouj1UxkI0wtTB7ZLg3QwJC3lCnAVjp/W98Xi/PxiwQDCnwCWBTycHB7PFzdn524vzK/Zss1iqYl0+oRf1o0Jd9C9zFhKUTG5I92ZDttbDEaMYgXUP5wVzoobhEFg4sz8eoml3M9iwR2uAk6t9urC+XbI4dFj14OkNBxzBoBc1xKgecaK4q5vbzXIwjtKjxWrTXxC9XjQ6ZCgQ6QXA2DvmJuLWiQf0ltEYUIdOZxTmNGs5GDQd3t8/YN4fPX58eHjItGj+2bMX+5PJ9Gb+/auXf/7TV7PZDbkYDkarmNuqLsx3r2L1R7Ogp21taVWKtq9WdCb3kUYIIMpmOO5N9kYYjhlMmOeqwGqZIb9cbFSfygZ9Vhx4LH1poMqXG9anT+A10R2NGK9wfjjqzVm5+JatRACrQebKImJqaLXBoQVrwAKvZCdiyIuok/GINT86OoIz+/f0yTNGaLQ3hs/e3gSz2PvDo5OL8+ne3rd0gQnrD0naan1LgfvaSMsdxrkHbqKJInCmHeSCiWFk0Yr/5YCQjF2UO9Zx0Nfo3miInXCjletepzfsr28Hm2VvWACwAWynJrgGlbrZG+8vbhFrwxihenmBTm/QI980n/D0Ov3bJQPnR7eJwpJS9Pvz6RSVUQHpgS7GqAiHCSlL/uzZs8lsXpx/xhc8f/aCHRIoeSujBtkCVhzEcL6ZL+EwuCVTSB7k74SieJAinjfSqIF0tIbZJBmIQXfTG48Gk73x4WT/0YOThw+OB5vu2ZvTy9vNanaDUSjIqqMgkDBwhCFD7nGE9KqFTzXKbPmVVK3gaBLpBlN74hoJbHa3xNUTKh+g+QgOyR+vSf6Dx48up9eb3qUMRP3o8PjRk6cQVrsaS1RX0/ni+nK6mC/wkskSH5D5OOIVKaCIYsZgp/a656madLTfaSiWw9V/fTreYw6ODw8enBw9f/Lw6ePH3dV6NZ+96rAGN8wGF6oVFiLa2+35FztZpg7Yqo4DWm1WkM2bYMVykqK98YR4yuMJGNy4Yvtefy+iQUyQBYipZR2KupEEIZiPmUdHV0o+evjIk3iF4TAYNMQWN9PXp7/73e/EgtRpuQxp4pZgvcEWtI9KV9ORDrKQsu23u13iuij/ZG//YDyiBQeTvaePHz19/BA5RJNHB4f0gkWA82J+Q3RJAKMxHI+VGvYS8PiHmURwnShHqyiFDMXbHh33doVYBD4IrktQN4mDmjkIU2O3Eu97MhBUy6qSvYPJA36x0zs4PCZHYmWmiHrHw8cox4KyiK5vXr06ffX6ZjqraIpFG+LJbTyiVO1taRF64Hk9j2QWF13UR2j7+3ujg/29o4M9GvHg8ODR8bHAcbVYHR8dPDo5ZrBmQujrq4Seqy6rJeacTPYmh5PR/gRvBn3tbnqDm+LBUv3CEteGOTXV+rCTWAvMFIQDUop+gQBpZFOnYEeGAZ/J2w0HiN2/upqen19eXU+rwLXc5Eq9aqaib968+dOf/vQv//Iv//7v/34zX6i3QmTs2PD2HSrKNW2pEPx3KbJQlVRFvFsxuTETWAK4vYqECQX818v19PL4448+fPL08duzc0EXHDCJ5z6gtMcnk8PDAeuIMf0h/K9n07cJ+a6ro1UCUtZnsphQdmTCajg3xwdHigwOTwJ/J/2uUOHqcjoe6YR0UOX07dmr16/fvj2Xr9P56mBy9N4HH37wwQdMMaNNEf7t3/7tD5/9XtQUPiSVtJfAl0neeojiPqGIDWwJIdzw0elrVAKEqImnYB2JUxLfsb7lGQ4PJoPnL2A+fy/hMS13r/RI8DIGy34vfoRx4S+j6g+uL96eXYZe6t/0yhrECuAc7SdTwGJc+8NxXEX0dMPqNJHRLMQHr1+fPnj0mF2Z3QgFluzWxdXlV19+c3l5jVHPn7/34YcfEmExkjhfUjVdhkPg6PRKzbSVTlfZ4/jIkghwJ2yBxIopqn/uPY1IxPfjREwSgQJK9XSYxWu2ZUxhx
HPD4aOTBzRHJo4ptoxV63RYB2qWjiyihQrLTY/DGszmkwKGRyxX1Q1fI63pB8QjbjjP0D0G8XblAYt2K4/r4M9ffvPehx/v7x2S08Pj429fvTl9e/HyzWkINp1fTW8++/wLwabmOV4FCMCTR091gWkN2SFRJVTe3LLJcYxpJXCUJih1y5aFJAKT7hAcoUXCxc0+IeT/eh12gUTczOaIOtk7WCxuEhqmks24H6fYHY4Go6HwSLG4oA1KxexJ4cXCwEJaQ02CEFQrdfdGwE6RYSQxssi6FnyMrLIETeXDzWjwh88+//Rv/47ww0HAoQf5+s2ZogP5oj76fEtUCNSQ9HwwMPLx61//2kiBrsRvfx9/MZud40ZPDJ0oroSixk9CFCnBk6LAYSHyP1D8VZVxFFYJFQ72x8E77jBBLoM22jsAM+Jy6+JKvjv+MmIWLaE/8cdrcI5uO1NwshTIJqzSYAWfqT/84L0SUkEmHiHglOMnOFBjFtPfOzs7/5//87ebXl+8JCapKJYgK6wftEqcPujjn6hTTfqRv/jFL/77//3fPvroI1r65Mkj/VADIUiftqUyv+49AYHkWa4kIKodjkHFQ64Q8iziHonYG/GaqBArW1CKtPt5s8ey98fUYYCQqYLkpKH0fDUAq83ihjMVxcC2eAa7SC6cBXshi6pCxHKZqLBObE4RXGk3vBAijuQPf/hDABmNgAh6YYLhFuUloYUa09hmo0/x93//97/5zW9+9rOfyaOKeJ7hUAirUt48jTUuFPKNEOU0CvmQgTj6HxM3JEdZRossSAzfiDxzfx3aQqiSMtozX9xy5gIMBQXUMaEBSVSsp+SNjt033319eX5B/2FocGV5s1imZ61bEe+OoBpqclEAbGSAOcQRCBaug7cXV0cnD6+vZl9/9e1iOf/+++8Xs7nIRCahC6CRDFPee/H817/69B//8R+Jw/5ohA/GBfwTH6tqRZPjq0UpJbQhmjelDlv7EEGBSyixNvYnaNmMhn0RwrOKl06OjnSOqfTF+TURxVsG//zi+uzy6nI21zGmjgovO+uF7mbZIBBKV7PLs/NzzCV1BYrYc4XReACkxj8yTnqAxAZz70tegJrvwkUQJR+pMMijGM7I2vgjH2hgInb66IOP/u7v/u5vf/mr9z94gW9bfreOjcE+Etv4HIVIuMY3qLYIb1SUFgeBKEIMVFnptd6UCKO7L146OWJiSMNIWI/Pm/73r94sL6+vXr25mM5enp6/PjtbbjrXgnYsWeu+RUjVzrKsEkNThJgtzYEZY0PtskftIZDIVIhf3fkMOFZ3u731PPQ6OXmgyNQAy/W1gT3kwF05GLDp1TV1ff/Fe7/59ad/++kv33/x4mByUGKfztpoPNhPMJKhNA2E82zgJv1i8avaW0DNv7ljaIEmrgIhddAEhpJgGdkdYw2hUZneA5HTwcHl9fTN29ffvXzz7avXXxNPQXC3J1ZGAeMOQYDNTMSstXSIBRxhdVRym4DkiR8expSCIRFJ8SQPkuRpGQaaxPMWCGDg2dnpZG9P3xZR6ZVg4R/+4R9+/emnXCOEY8QNBwwFbam9JbKAnMpKbJCHBEDSgCf+Iih5DnF14RMDpW8NfsxvcufKY0EhEWGnO9rbf/Dw8fXNsv/m7eIm8jtkhkLidIpr2CkxophY+IcKMc3F7ab81Whab1RwAxgQlluJv5ChPQzzCk59ys2Njvtq/c13pAB2nfObG01xxLTgn//7P/3yl798/OBhxB6fqw/DI5M9QDNpBEFFTQM9BJmfomP8LmdcvTXeSxdagTgIcrxkSbnD46PDhw9PEJcxJ1FahwvDNt7fO+72zy6vwWik9OlmM2X2Q4UQMuaV+Qw3w1PuORazn65BnGG8aIS/eS3AxC4hUyll5DJYhEvytBvEGnAHaOPRYmEMm13NoBXI3n/x/J//+Z//8R/+nrCI/LSJpApEHZlpg0X6T7HA8W1eJsYpt+QVeqMLBhhEjlUgu0pQixJI1gtKfOTh/t6JHtQR4dvTxybuqr+9uZ3F8Ka/jzqZ1BFQs5TzuYhPI1HxYoYemfiznEaeaTddmrIIDaS7KxEL/CUUwPdcNvlbZs8HM8gXSTOeF2kcv3j27OOPP/77v/vbv/mbvzGrQ1i40dSRJqJp+bPpEFRdTC4nPwJLJ4M2u7AEFbiYGwE7P0RqtVqjAwYUkF6MpBONAmRB2tfPozWVzDAJ4S8urq5msxpSGRlIVMteZ9QxoLrMgI7mQusifmiBuuGx53FO5MG/hFYN7bLZ8pd/qkksAN/9axpRTIuQkPn90dAE0d//5te/+tUvnz99qhvLPiifKvB5fUuT42PDky4KYBotaDDJ02RKVfL4OREKjMc8PDBZQ+aABFNoIFClhr8hHLkM5EFJqfWNgdauPsLF1TUlFVeKq0aCvJs1Cb1lGBJ/hRMZjk0Q2eHUg3WTU8izHYV2iFUYugYBkkKb3nHRbVLLo6+SOint3v7+48cP/+bTX/3qbz998ugRZDKIok9iTPp0qgBCCG2g7caYgu6DnrhuePXfVaehVIUWgMH/g8kkVna+GQsMjAMYBfEiI4c3J8dH7714ZnzNtCA2JEQmL/pms+X1/CaKoaoSY/0C9cojbGFYhgKne35ec9Do8DYQDgli19gPv4lcXjVCEP6AFmJVF28rC1U4BWv0rgw+p22syYirJq+mU/aXDdQwtPECUVQzJ+fDhJhCBpbo2rREzWjxJtPpPPQuqyPsZ03MODKEF3E6ukx9Q6yueEX2kZvQaYsBUnOrvLPsofiMiqldPWKkjAOJAiPnakYp5DTmgxBSngTZQqmwBWHpO7zeGQj4A0zyqmFf2ZRLne5dDXYyqjGuRA7oHulQudcplQPorjSaXTPCJ65hJuTJUF51tFIZb7wzP/erZmj3DvfYAdOJ6lCRcQTOUsUPHzxAAoMmqIDo5ESVIMn0j4pNyd3qCwRDBIIA7EyXJUc8I/eFlk28Na7LHGSkALZ9bran/dq+Illqo9ZFCbfvSKDgQLPYzMgRbzzWaoYRu+Y1+mw3Y2R6CZZAFfWnlkz2dfRzD8xcz2bHhidfvuLpZr0lTcQg+ie8uyAn8wXTcMzXTTJgxUckOipgzGsQn2HGKUaGTJiOgNwNheckb3U7XywpnYEJwKLRYD3gVjg3InhLdQrV0nEcDAVTf0liQ88TpIuhrJRXfjKsCWXz8I5kbiSRx5iD8VjgeHp29t4HL/jusAfra6COQrplp+KVSy1Lag5pynB/QtJfvzn/4k9fU3dFTGugAjP7+MnTvcnRaHJE4nvrW+aQHCD3SBzf6yBgxhr6IyFYp8dNjvjq7jBTcTOuR6S0NCHYS4gWBt6SQk43ckEdbswjqIpLG1G6GqDpG/7SLyKSrBKsOFwFS/6aSJRZ4EL6EWFUcJUh6lkJ5OlK6BaqxbjTbHpjcO/o6EBdQOc4FCCNWCd/4MC+oDTgJgejieHY4wcPMxYwumYWZdifxCI8evx0cvRA1jLlt6hNJQCO6mDY9A2LjparzXy11k/Sf6dxAsNbg87rzdU1AynK
CScOjg7X11MG31AN0hAWuYadcQu00yMd7MFGuyGEUdrSizIBWwwDNgsoRbE2t4ndog7kumRCiW5IIJEWYw6nBlhOTzHLTIxIRtUNeU/w3xUzEcUr8+qElgQRV3OwJvLfnl+a4iS+htf9EypeYevNam8gDF3SD7Zxb4/vM8s+2PTXby/n3cH54rYXZxnpMyJiNAY5dH7W8BVE6XsfEsL9wwQMq5Xurxse1HwU8UQC2NB/14xFiN+IA5NVQWB6LRSnhEKGok6MY5x2gqutxa3YopvxBblDj4Hx6Jvvvn355vWZEVciJ6vkOXLIgArBOuB22Q56qz8xCec51idffvk1nbegQRdhect93NwsbjmFeIUMVTFvBhNHBwLGvb3Dhdnb169Or46Pz48fnBwc7DPMar64vDK48vDx09v0gBOkSJNRXwRhOgj+eodYzH1aAmQUAeQmLxDITbsC1b2yrnjOkojY3Tcq1JM8lzwsV5K/MVj+mN9lC+BsfOHzzz//5GcfJapVl66bOLaMfLIjA7zSDMHu13iftSnmL/f2J0fD0Sp2rtefzvT3tz2WMJeP7ne4z/FKj8As4/JyurqYpbd2cDF9eH1z8sAQR8ZC5tOb/njy/sc/P3rwVB9PN/f16Rurp+bmJFarq5s56lfM3p3e6GelX9fpoHfNzBTyw3jZAFlE1HOFaabJA3z0goeqHkeUI4TIlUWr1yGVgu4vL68MPelK8mFZg1QChjoKFzlV4Rkp7DPo4iXzovP5Yjabi/c5I5DorVg+EEji1EivsrHjOGL0gyFgiK5mq6NFRi5MOa86F0hjPkOHgsEzhgu3k0f7G+NdzMn1/PuXp5aaSQkoF7en529VC5+aVD4QvopmB0R5PBhHDI3lN86DMyyGb5d2tJApUiJ83ZoMEXBkK1SIZ8byeDB4mrD66quv/vVf//X4wZF+BOujSdGLZqgAxeI15QSFWfvhYH9lwdF1bNzB0cPoT/pltI5tCwOy6GzACzBFylS3Thyttludjumq05/f3r69ut4/v2RlHj56YEkJXLhnhBiOD07GBwbh1xWnZrFPb4jthp4kxAobOPLFylqSqKclToMxemorYMIdqXBbiBEpKB3h6ZMSx5fSIA6kN81RB38OyTXKeXFhDuq3v33GRlJXmiaYk7WoYd3VFUKQArrz1TfffvbZ5yzC5UV6wdRKDd6qih6oVOvV+wkV8qJUMq3wq0AFyBWCrk1Yc3JmmbI8YZwuJt8kIQUz8d4HH+uzHD94/PL1K1R48PixMfvpfIZ5EpwODib4bQw82CZ+L5TBYjQj8EQeWo8rvzJEEk7R6PykI4nuM6qbNWtGnEO8CHX3anr9+99/9uTJMxM2bDgvV8FVT2T13cvTt2/NBZ1/8813f/zjF99/9yrDm4QxNUagCAKQwgoK2hV01ugCyVBFBhnSyXZHsXVn48NMmnn3snM02Y8IDMcPTzJRTCXVVayD3dC4y/7hEXe4f3Q8vTRPYkUjbQtKf/vrT81nnl2csUbGrBUheF5VhBK2B7aKD4ajrMIEQF/MTARiSjWS8ZGouqF68IVE8b0DDfzxj3/85JOf/fzN2Wq5WRxEV70SUPzut3/87ruX/mW53cV1TAwPnkFqVKBvIUQEMsqpjTIRASE1Z1xBwBMnJU9dk62ruXOzfrqHAtbRweXjue62+Fol2mVEkQ6grI2FpyfdvtnTSc2kCXNoRyRcXNUfLtvsXPXDtJC4MhhHXngqjbvRFwmvwREBiQeBdYyKR/wlTHgA99o24kL2zNP927/9T4s9wQSU69nNV19984fff04iptcmMyyNMvORMX9xAWeiIB6UULiQhWhJRtfi0RO05Vk6x6ERM1EtBj1CakC8e5N7M2/d/nixWmsO0J5IasuV5GSUg7sQ3BDBEJ97EqpzorFtNX9rZFbDmdFiFRPjZL6IAdgCgC8FHBj8pQacxG45lNHrmxsjDIBTe6awe0MKL4yjES9evE9jr6azb7+1pumsQofwDUFTN01siIWCQdSvRs0gq8ZQAWCR/EBQmaMJlSpnUx/USSTC7PN/JFROsqChQBwXFh4KdGFv4UarxxwfDfWEcuBueg+ZqQSeeQ1DbYQh8tBexRjGSkI9SQ0IbYx68MknH5lcqqWZMarpdQC705uw0HuHwH7z5u3L78+MgPASENaYt5JmK2eqC0Vi7sLzQi2X/EhLSLV9Dn1Ve1h5cvVezamCLREMRWOW/bkVCxmH8JQIlEYUyTLKIkvJQI2mkA7MQ9yqoSrUZ0sR9igaQS1CkjiKYBe0Q02mSGNREDAYMB6YbhKcEMiodLRaX4DmuLFCVYQS+PzM0PFtZBiLwo/qtKCx1+4Lq9S4Sw2g/Cr4chMwYjgjDn5GhOK9Wv2eBEJtAYCZoKGu3qGv1JiWUikaQsAcRNCoqCSr1RSR3NDf/slR4sHUidv+VPv5zQgm9mm/240ig/fe++BnP/uFxWrT2TetmFlc+mhhRQKEmkMALmpnknxp+tDDyD2PnU4QoFo3DliFiiuVd20pchA2QD6ZgVUjQ7lVJUSa9AAFgTzFSWqPsjEE6egrFZwNi2pT8VAhZrgE0nxelkvqfdKTNftgCpKjXa32DYEiUawCKmRdQEhepdHOj/AvzK2U1cSmHkWpzL5i4ZD6BcKWQ8SYZ/ElcN17ATKRWghcMWWrAogRhwx3B0nX4lh7mWvEHP8STATZlgEBCxs08UITHkdLm5SlWDURnpfAiJIUlwFwBb03MQQGZoxIp96qosYuV2b4rE5DwnA/RkEdkWF/qoYmodsiwBo8evSA4jGEX3/5jUgAtEGloungllmS0vB4j6TwLv9RnAhEq7cAShuZf9ulRhEW2IOEC/zxHXmisUEueVnYdkOHK1THUGWbeKdYOIqbDZWySqFXXFLeRmOF89EwRv16TbsXoqzhjVGcLJzwRnFNZ9wegPqUqGchZJxIyQJPR6KMiNuecfrm7e8/++zN6/OqnY8pghWEqBJwi9UKooX7e9dCpjI0nCMXTGJmJAFdqGt3+67kRYkCoNVzVx5YQWyX3IfwmW0RU0Nb49tiHsscdiWu3RVIdTrm6fVbnm3B1H5nP/6A9DQh8rpnoISnaMVTUvH0rCnSRx9/bCTVbPHUKqnsAyjnD1tSG3ErkxYzKYEysLlr13roknzb+1CKCKQ4SN/l3EpAdKPVU9q6LdSKlyfbkRizCWfVwfjI1zfjL8yLLJZpLIV3G4KwOLLWcBPHMev3Jjf760ysFtjRI1xJv+qOA1QsxcSOqqbtDx+dcJmvM8pyfnp2WRZBSykf4PyXVNWV58/DeylV/eBJQ35bw13GbR5/ykvVz8DxLkMEJ7p396SATOu45SGquELVE3KOwQHOk0oJDGMs47aYWKpBiMrK5rVs8jcbrXiVqIIoEBOQJflDw/Cffvrp2zN7ZT5HC+EZG4nwCjewSIT7n0qR3NbMT729V0NBzOrJDIwCpWxCWexWg7pKTFpNMIx3hT0/4RH6BPpUkCQz+Mtcpk9gTwUqqJYrlRnh0EKnTM5WXagWEkchPCkzy8R
Yhq9MNaz38t57L37xNz87PX1rwZIIQscmbk34WUKhrqqiVZhr+/kXD+9e//j5Hd9a2XcZivl+VhPNHW6r8dBLEMbRFNoQcwOfuIBIR1KYHFT9zaCGjqJ7m8TcpwISxk5RHO4ypmprS6pIYAm3i5zJUYR47/0X33/xxz8tZglFJOtO1KgANmxB+xH+slG5u7fbm7KjWzXaFUmb5VNJRGlaMNFEagiIFbhVTWl7S50id4APG2J1BQAmv1qbGROJQCWz4QxzCxnQSYUhQRxTRiKxMjMd1c8L8i1wqlKazPBee6iU/qUFOx99/MGT3z0y11TeqUi4Q1/VIUel+zjXw0hsGriX8vzeT+Xq17tn7cndczCgUti1KZa23BiRVLpQrVe8kJ9MXXJv4fEgKbslhMXlxP0KIQoo8sDe8Y8whrW6K7QP+TJDR5esBtGrpGK6X6ZSzCS/Hp8aaKZaogltccd6jTJHrMK0hm0wbzjEBZPPLZ4F/vayxXmH6jZXUJAqXoRz5U1tEkwyv+NV2UJoaJ7PUZFXQUBbkfIax+A4Ci+vAI/teK8882jBEPWxo1K5xJH+aTQuR3NhpwJqFherM9KusGvBANrck4usEC3GQ769LUwbxJV3R4Ltj//vf4ooIJD+orIowC5xAulYZcun7Y7w0T0CfKaIa8C8tAzEFCeyExwQQmk3eXwvNfQz84VyCRn9J37HfZtJE2KnMAvkSr34G3xWPDnfJVU32nn0rrF373P3Tl5+/DxiJd2TiILYJWlXNR5gbgx11AVKpMOw42oR51FRoCr4uvBaDG5WRI5M9kTmm9iH282CahDZ8LNogTJZW9D4LAf0SURGNrM5MOPg5VuJDJucYajMC0VeGhV+IBEQAfTdNYilrWDRnteDH1x2z1Oq5WyvE2Heo/MPqwWwSpGVtlvSobNDJiyAM7tHU403RU9ME1akBMKyGAnhSce2oVDgh0m7ZmsTlCGhsVsKlgUH19PLt+fXF5eTo+NWjfCE/BVvSpkD71aHG+h3V23s0PPsLvP9h9u8P8B8K/9N9ZPhrhI3xK9kINCD0FuEEORkWYh+un4BdosbIgoRnzEyHNhhNrC2OiTDwExtmbYSU4xiFWJZlJG56IF++N9a9aANFxhOs4tb4GX0UyWNMQWWJiDfpKA9DrdbUryB7mb3LI2o4w6lu+dudg+3LPqpV/ef5V5tLTUIYh1rz01BSTQoLwwhb09edmaWDSAP9DyeEfLRK3Ohle5qV2f6ESFWlq1kFJRdmF1PSYSBcCUSPu4mc8KDwLKVgqCxte2twrzdMrUeFLlzp+YdzvXiv7o0SgfhKosPubVjUC8gkT8+uGSJUNhktIUR02eKjgAWw/OgdR/KV4SRCf2MTd6YgGnBQUFbBYCT4WYg8ijcjIUk9ilJKrOXSb+qNyZUneXGZMZQZc3xKvYjEmx5e58K95H9awnxF2XzM1RJAptAhvU3GOIasxUXqKedRQ6j7F5OdGTmVxGjeJml66yXvfRELL057h1boqES6a4VmyuyEkfCeVLk+ub1a+Ighw56JrPpXC9GNKKU0ZfSoOLHfQ6z4vdSGmjOpHVf/ITA/fy7zJGsH9Bu5988LTC3Lz2WUSVIILyBJumGSPoPDPdSdLg22Niz5MQ6sAyTxrXK3t8MV8a+LC8bZBkZY6DTHLGqzXkBw/iC+b8GnAag6qdxCdOP4337jphfQmApdWYlK1X/Vu20ITL1nydYbNXnP8/3H729TzV8QsZGW/QhoiURfHlCm1o01VsvLLub2VICSvPHOLpe9C319j6yIfSqsFpV7vOzxEG9A/Og7RH+c49sqTV4/tlQnlGGzKGgiu4HlYhJQOGtXlRMC4EmBTHju8Qu5/nu51/7N8hXZa0G7Kka1JkbIYzlktnJwLFhqdlJnLIQdzgiHkavbZcwV1N+w14rW20yYFTg0aCQMfKdMChtpHLjC20k2wpUQ99vpq/NPqKFbtXrs1ML6g1Z6YRG90xuxb5k7sGQqQooxn8kEe8o9dcSYJe/gbglxhZWv9h6+8D2s1Tm5ND6QbqABrC0pPzk8Gi4P5rNb7797uXsq69tKjfVPMkKoiDKV9rMILfpDSKwrX+nihnt14/eHD8kESLHs7M3b05f2UahePlYC7t0QITlN+Si4NluQkJFlTRB2AG/o0umQ8hDU4fGybss/9VNbH/6CHf5NKGK4FKcs6vSsnlL5Bhu+zJqd+7AFixbLfi1q+n8+upy325zg27dPUuqLFK2+WmwN0l/MxsLFyM+Jt7GHBbXEw9kedLMSS+H4+zTtPHGHJTJKJqDOhwH3luwZvckQaKLw+5Q/xRQOm3wjJ38UfqPBORHGX/6QePSj95VF4hmlsVEf80Tb36c+UeeDBZtLM3tPnt8ghamJE7P3lqIR9LPrSqPyvTkYxZ40VZ54oCQNU5n8Pkfv4C/7aqu4gxPnj9/enm5F3pVl/Z6Oj87u7DujL01q13djpT+C3BDm3CseNgsZx781UnNKfNOFLY1aM4L1yzAMnU/n/UthrJi3JIx8g5dm28sRxz1Hx5PBsNnIkjaaxUCey+wpBgRJ+rTT0DRdCSbdmtXVdbEa4eumDE27rY/thJ4HwkYIQ7WmvRHjzoH+4evT0/tKqJcKMlf7RRqC+KP/9xFFj9+9b/3pFEBjfAQyk6uGff2s3JY7MR/EZIIIY+4tJTOhJLwmYNbZIO+BdQ1PcNO7o08xGq1laQnzIKpqCkLivyQ2kAlKEkUp0OKdEz4SDb55sAOaLt15lGI6GjcauxCbtPXalLQJCI9tko42jRcq/eRD0qN5/efbssUz1tx1SXbu7Isf6ar53PqLb5hedK1SBcCQLfpWRR0/MXByJJVu+e6jq1wFgj8+3uHo8mB5TV1FFVHp7Gt8nXGRHarZMOSfgYzaGKOULmxUjPzVBmuCIFjSKydzd5/Lkc4hAPgy7xdielfwLpF5x6GDec7WrSf997/9O29bMjabG06RwkE/Cv/j86qlfKznoAQmHGjXat8LD62Gm7eXwwmIqakqJVuswUhCOFe9DxAXRPRHGKsZcJNQUh6AI47YRsl3jXzx21FSi12L+uQRQIRrYK/VLnxP7DuePcO9IblPax+Gu2UrcLbev8il0lDHCcO4rmsY4G1/DgSChGDUAectuL3Y90BznpaNGuTeO9mwTrY+5c42ODMfGYBiOppgCs7muMVWI10yzs9myIty0RAkWgxe5moPAeOZE9johRWWeym5ZCg6UF69SosduVO0cLFvX87mnhxL91RBDfuPf7p28oTbkfubRXJGo6Msq87WaDPHvjfW1uuY+0MUkPWbvNsrLBdeUaDIMgbYr3Vt5m8YSgpQO3pVJh9iamAjSEE3jQSLsDMAF1QiOMmJTAxWergq+hfhZC6LA1pfkSfFq5ArFQUyeU/Ry6A79L9+x+XS7hA6jPlEepqh6mKE2cUM3rQiT2oWTwkgDZfRpstR3CuyNnbN27Qh0E3U7Oeza22Fm5yJZ3Ofi0E6yDo4GF2EhrxzaCvOMn0JnJrjFZphBPKgvwE0bELvA0Cps
psXkC1mIiSAXKzxSkL9RFIPU0i3iG7Q/p/428xR+gLVNofRZXKChggA4YVUwOrUDt18Bx69QZWHBkmOT97O7u6RKlgYK19tvjphscPmOnOki1damHHzz75OLJkOa2NsvPlenGzcQIazA09ZT2WG6YhA5I0MGvelRYtYYczp6KMOq+oshUECAJhh+Z/LRG7nO/+Novwk6QDAAQRn503Pn5nF7K8h/rTlZrXyPaf2lInSDg8PLDE/5bE7++v+2Pm0jw/mvCHoeOVs3qmODmwadrQ3cJE7dXl7XQ+vbyYX2V4xpJNGDWYUInkRUsy1JkrnqQWsXnmusxgpZfWQCdHHjUj8Q65H939UAvevUa5nzSN8ktaQXEprVeycSrPIw4W3WY1dDSC+dp0BcFHtqYdHppush/vhlUf7TluwYJqRbJ6qncq/AL4wArtxXzNJAoevHQVPt0IS0MBqGZIOoZY45vbdmSTUyAoYL8rAoXvQL0aZkXqyhyUL1LE26jVOwzv392PvsF0ZyzlQYj7he6IAnnNNBIowjJE78qRx8jX0Av4mX2QHEwwv/v0vReTBycGCOaO5bNFuTdcXE91Q6xVur6evXzpFMlLq3YG1vtGKhZLiuD0GZsZrbpDEocuaAlMrhp2owFb/rRn+i89NegSQoYicIvpoFLUSoktFnc39/H/i/uWx3VLCBW7R8pdiqGt1B4UZUsSYx4BtyU6GOI3kwhDNN/0o+Wrl8Jtpw1k6+PqyjZzkDpRcm4tYQKnOA6bYb/485cO00MF/S3kHMVDUPguiWhU55EaFdAYIURZBmvcgEbiNaUdwPn7X3qH+5l/fP8XtclQDI+kJUXkejTRHpWb/SxjsoCVUfKvpmAyGZn1a7c9IwPLzfrt1cWNhd6j8Y2tF6zCwXEn27n2LG6+uLJjN+vyX52eDpwtaXxFOD7JTtYeQoz7g4Ps8DEFbml1pmlQIRvHamkAGluax2WSBrRgDEKG0spyoSTjr06F3V+WunuY6smGHlBGSVA89sDwKMBIQ9bcY7WBBnOTpCahVOaqbam4nE2ZvyvjT5v1wcmj588e7x89sAPIIkprFV69+vLly9disG++/i5j0NiLCgzig8OjD54/U5mTfyI7cU3onlkequB4HIcd1SxY1gFRCoZBKVoYa5MExq0S/SVOP/69E/k7bNvN9nH98USl2+fp+FScyPBVmJAerMM9jTBmybPAbxlKSToUfXs3FroJ16uZw50+/uCDZ+87aejQKMPzjz6xfv3/+dd/+x//4384e4laE4escEI+XKUCin02nRGHmtdNQEnzVCsPEdDdPDywVH9AYWwcKSOU8XtlhZMIUS4zBvP/j6S5RCBkIt4fe5Deju2V3jB5rRg2KxkNvfFjINYzftjr/vrj3zx+/sw2i5Fhp/EBKsxn8y//9BUNOHv9BjUhSODDVQJWKBE5z/3jIR1Fmp6Cn0wNcnilJWsdrK1lM+bjjNz3O6tSAPhvE6IQh4q4fmgtdhnu/m598I5kCt69uruJdCWpD/6EIrzGowQz5J41d35hDgaJ82C+MDTrxyMyMZ/2PX340YeTkyOugS/P0vf+4Jtvv/3sd7837iCDStJzMECbVjIoJ9QwCpUxPM5Au+yCTOBAIIKgjHtPEkOnAxpLGdrFMDTTnJoCd9ns/PhrkoKNEO2aekoGq44QSIQQcWC8Iw9ueISsR9CzCm+qafwLzBCWEgkaHzrTqejYf+eEE0Nlg/GfPv/i97/93emFJfj7BEcMRq51sJZOS9OtRgIrq9uKddWItZmMhj8g7gihR2GdsZ+aBhkLbvVQjhG6z8vS50Jge2lYtR/32Y5Bdz/D71qE4kkhfPcKziTBITFbd4EP+B9PYQTM4N8YvNmOQo5xRqBT0p1H39nH8sXnBMEM041hw97g33/3R9sr5FgJQU3HZQdTZ8CILmZTB3yOR9k6S+BTzXpteBMhHUQKAjDRwIhQTn8Fa3qWIQSeIGZZbZC5+V9JKHKH+Y/z371qN3UNdeqGoioLqUwOkf6KafUdjUEDKeCovEjA2WXr/9n5m6+/+ebNxcXbCycSoVV/ltOnHIF9sNJLX29EQI6bGLw9e2OprH0B1oyyf3sH+6hm6M2OeiO8MQcEEKYhdZP86IV7bQpkWRLwYWOI8aP0HyH8Hz1XQbMX8A6xGYTIWFHBqGAWx/nBRVvzqA6nIDr407wZzA0BRRnBFoPHta/7wqUrexMd8n01NbBs6MTQG7+Zza/WhoayA1kj8j3jEukY5QxpXan0plYi5T0beTVeC8JDXfViPigROF2HCiVdQRlyJFz2S/KngL7n5NqL//L6H5GmCKFSTVTfLVyhI+k0EVDroG1exKMYMVnLwIGJ8ab2jgt8cHwCC93HqWG61dp6pWkNPDgJkyG1Aw9ugw/efyGc9u7Rw5Ob65uAkr1UzHGJQHovwa9AiV7ocxv2RhFHVkgL86DlWnRPK0/T7vj5tiEDbf9z/Bvyd9fWlp9uXIJXlQcQ6qs3vEkmr4yGZXc6QOQPp3hJolsJ2/y15xQJxuNL/QVbf0nvjUnKyHE/s4/OYef+YHxyMDnt3IqyzpD1ZoGETv91FiTjr1qQqEshzSCB3vg+jatT4TyUCsJGo3b77qpgQ+bdox/etbd32fx0L8u9Uq2G3XOYI4kAIe4sWwZKSBVq65e2ChsbKoSuno7IocbQcjaaYVuhQUaUSBXJIT6Ww9nyb7jgwdHhJx99aKNqzlA7mCAP3BWo83oTTJWAl75pc6oOkzPo27ZdxyTV3HFkMmjs8Gz45FqsdHMPt22mlmdX4if+3tHFu8ZbdsOt55I/gteMfsSLx4TEizJgceeZeQCb/7Pwi1CtczwMihiCZSD9LrEKpzMmYy+nIUvGIWZDOB4XXH3v9LKzdpbiSIVGDrEnR6Y+HEhEHZgPOmaABkxkFAUbGcDid7NzRYSfwPB//ZGmq/U77Uh4kDHjGLeNzccmZJxfmfPWEv7QItBaiLCcCBmHOUiJ2EO3SBchwBF8tKdVDXlFwSHhCDEUHQ54kmze0l1brK90sTb9rGgwYYG6BKRIjkDZ1JyoudbN0xG7WLp2CkdYd/8Hlp3B1O8MM6Li9zFP1h+lcDirVZIzl4rH/PXE1AkYsL8eN3I7m9zss2MoZ6hwMM4mnvwXeV8ZEqPxgju9RG1BTdlEoQEzshC7wfWbYbD+17YSkmP45MWLR/uTY4O7hh9ECn/8w+fpqChEqu7ArQ00xjPta4y21Aa3eEsSGRbE8DRj2JB8pyF3NfyVN2FgWBAQ3CKB+llHIpmwyS/KazohTIpqZFwUHTMqz/yNa0VSLGZGAoyELOukTMwWNSXkY2LseBY11XlNw1HnyGkSD57YOr6/j9u945O305mxtwQK0T3awznkFIGV+QxnItiMrjNawpLxmBiIXdqJQXiYV2HF7t1/9TeOoExRu6Zfg2sgaAMajZupsP4XQXqFOlvx8Tepggaavc70dMxjpifI642O+I3Az2reoYk2OGVGPgeTm6CMC+CJsmnM7P7BZHRzuHr27PmbN2IM6zqyy
r4hydGCyRCNSOOgBl0iDjlhtMZCKraPDys07uMLssD916SWP/OPyhItIlcWMHpX3SlCkh3gtTAPYBLHrwVGgaDAXEFgp39sk5kZVx1N8jtK7ETn6XScfhlRTp5gOzTA9IZ+NLJ1Tdcer7sn1zNnMZsWNY0hj5kQYhCT14wz71xmk14UHaMU93G8+/HX4g/tqPYuBf9IUgRN/6msRpR6S9PqZEI/px+MrODOaSmYn34OsHV/sTXzR5EF/c7RpHd0/MQA5KvXb23cNIRirMbalIGgKgdv951RoyUSoZwww+EYR9ArgwSMdGU1LF9CJCsaZjN6YTyimczAxI2roBC4h8UOm7/y7xbJoJ71rJQqdnObKElAwpJ4TQMOUdn4f8a+lkSHXGAIg8S1SFd7dH1jgTl7+PTps+H45Pj8+5y+cm1Xap2OkKM2JrVpnj7sldQJDUd1xsOJ4zAyJRjjh0Xxl+CCeTaimvwVnNfyt9Aoghq25VJ/E7xEeHJtWLVre56nP5UyqKaOKhtrGD4UTtGzSHhZv3ofb0lWzCETmEytZc4KqDHXXH5MlYTjwNNpzuTcvnNKRw+fPjuYnPRMaq++vb3IkjejLDGk2o0h5jRYU6cPjXr7e5PHDx77wghFMNUHf5Vxk+oFpRsm2CJRSwUabjDKTRDYmgA/i2pbXO+y/RTu757dZaubjGyrOOoWE5CmUykscSMkSgRobaoDYK9nHWeEOUdKNiOyOa8n2yC5kvgOUqNcrGB9q0E1TDuj5vhAQzU6VQHalD33wk1m2LVv9MkKOaeQGWE7mE4PSAErEPEqR6B8CO/UwvRegVG0R8doRMAruQgjpe0wVGRoS52GsUbdNDLFp9RPT9S1pV2Kuw91y9JlcBV+0fR2nGIMRUZbyb6rMWFj0iYdHc3hXi9LLGMtRyK9GhzVAh2BeaIJBtCAyq7TYVfpYyJKBUANJePvUaV8eiCHVk/2EWKqB+tsfNywqm5oYgdgeqkBb5simEnp3xPgUoIg61H+7NJf/NxiW2+35FCFMLjcXsuc+6hjDD2jjgQBPeaJR4gw8JRGT2C3Gaeikp3E9+ZsbPfRFfD5F07N+GFUlnXJB5l6vh72ZDBy2o08L1+fOg/8gaNAVEybrBS0fIEpEajJTrL29vaPjx6YFWQLqYbj4XOSG3WEXQL2xF5MQqqPTkWHtUYO8yfKgVj16IeC0PAM1JXiB9pNSkQG7nqifrD0rBXsvcrIITJkkMXe4sxPcwSGfLyNvKbHJBF8vM5XgDx0SLKlWZlJKt3lctWknsPR+NHjB8evDx0hPEBGYaiwKpM1N4tR9t8Mx85Ui8o5oN5B/YK01eHVFVjn8ykX5KTzmqKMvrWh5zKOAElglxSabKU9IVP82fZnvS5cI/pb29kebq9R/LLRjRxBPuMq6oAuIpgCINjOH1pA3ioGG51Qh1jU1pnObc4xCX+6nWmmoBemp6oTALYCoyKohBH5PolTTceiiYGcvI1/WLxkCxFkkOPrMpXvfdrN8VlcBjrTMbKXQZUCMRIgFYZYhDo4B2QPtig1TO5+7tBtgVXy7TLu3uRJ+JvxjXjCmCIpHI9K68Zlbsm4s2+y5VNM3evLhJdyI4yHfIP+sWPT49tjzXgAw0bNm2gvEahJFSEFaJ1fzPCp0KcC2gfkMFZdQq7bseU8uB31ZpAy0qcPbln0nrNfrq+zD2HdzvrVPWe8qSn5zYgw0HVhIg5lzBtiMe/3UmEe1D2NiN97mT5OHhbSJS46fTXWHbtMSa1OZ9adVeaINGrBvS90JH1NZH4Nklg64o4K+VBZ95YVwVeTV2aWWQVKKvSo74hpllhxruR64hsePop3bZlLRqyyuUDETTTidXVO0BD+IR6z7FB40xwWuh2trnyFRVt237UtqbECbFpGYcPAMueeEMuyCO7uiOAhQJsckQ83YGk/S6RcBGYhTuEf5fQzKxolJwAihC+K9A2hW+FHwvtZ4bheXfe7vrmQ9RcYGXQZgNLZGhaiDgbVojOENP8iMWkpxg070aIXjSBwDG6aZ3RWOQQDkWJ6uCGqnhA+SsBMxEpV0n9XW1iU6mCvypAjCkGPkrZ+Uc13VIgMyBVYwADVKpmsIVjzrJQwSCQGdGJ8bKHBYbrgY0x1jToc6v5UFNvVGR70rof9i7e906nNLct0DPSmIMVNiGhsgKrRc2XToMP/owAOlewPEgMWROKu7CNIo5TLQ8Nq+ZgAnUJUEEBBBiaQ9SAluhgExT9Hk6LUHgs0X9DAOr4DpUotQo/yF9ALt6OTgHNfLKBCIRdq1FqhcMVbNbixLc4J2wZv3OsgO5GIDpN/EYqE6JDx1kdY6jMsOjibm/n01FfAlqvZxRlIcp7+yiwMRQEOuzfe9Cyw6PhCQ0Sdz7p1Dq0jx4J3pGDAa+5V1LRVZE2H1W2eAVMxEVqkgMp4E3aBeJfck9+MXibSQUL548C1JG35L3/t5omnEAHiBYlTzgoj832iD/TQiD1wxn/8FrHSvH0H6eH/di0mtPct9B7tW4+JCsjhKCdWDVGIvGUHB7Z+kmBLFa4ugnyCF2v4x6E+XTDJJgAoA8NxeCVBk0eFG8qk3fRDwAqBjfAj62DtyItTQqsMv5LfWszENGWTbTYs17rTLKW2UN40f1ZkE+oc1an5oF9UeGcOPODdPNeQV6QCs9BF66xbNJPkOycSspLFNbQ9lit+AYW162Mb4jfWwFM/eW9UYJymJo75un5v/mR2+uqVT0loIm6y1BCZ5NeiTjcRUFulfA1SxBAVjEEDVY6xpvwkK2rv0CYlVaEwicz4BaZ6RDOdKXlwOIj4618zIkmmsggzwIi4npmsKoEt3U4bqkUgzLkNcSX3mtPTJ+cWRDCBhSk6BNyaJgqx0np8DwlUxZrTM9g22Jietm3SOgUr1Zab2xsz1qZRMpNi/YnvNhEhG0UwlUMoAU4XMX1qQeWlahn4uIOMIS64CJi5D6BmOjE8zaUg1juvaiQa5xI8ibMkQyUOlEofizNhk8l9GSsjMvnYDBHlRMz7wcU/tGfaonUxLIazgZvxS9nghi0l18EcZM1HyC2lXOQ5+Luml4D+RoMwFuC8/8jOBIMklrPlVAp50qHIFEnGok3A2EBjDBKh8Ult+KQB1FdNMkdN0yh4ooFIHCpHLjWRAq5ycKu0RYue5CElieUunwfMAZlfRKWHgz1fh6O68t84yk48h94694xAG+1JcR4LLbCJBCbwy9Cf6DwdlkyI1ocuiAOoWk554BPrHyMSSug78ohZj87Ym5e16sg00GpuAk1lbAxvgyBG1jhqK0wkvYFUXnohB7xCYFOQS12GKYpQuTzZfSCRXkdA1IHqzi2DAFKggvVOIqhYgSz2QwNSVlgpmtMN8oU8OOuR9w87i/6cnhkDx0pPi3Sxf+EmAGGFjjGLFqjGUkS0MhvsoGP8q9XmQZkgYgvfleYS44SRaMgjsznxfaEidtBHSKsjMheDvYgwG/hZ+p6dtfuXl70V4bNRholNloSJJDFLk/KZzFQPDA3CX5XQNATfZCHPiR9NJrG2VLrWmb/B
JcIQWYrYwpVqiEcztmH1/N44GxDnCwJHkOBQ5lHYFgCCTnayZsAb4Wh8+oHFHDbVjTXdGUyspAl1aiF/q//HaMSXmFpldQSjAuE4QtEdwtr9hQzp1qFIDDy2DfIZM+JghoI42+nTo0Faz5KMTK9qsT4WFk7nPgu1xsICHieTtkGRCoUOPQumIKD7KN+WneFhtEgGJam/gD3dG5N8AprNWlhihBtvU0UCrUSyKhFcyoMFnsOQjaStTIW6dP2TgchHOmqEQlnllI2gceZxPBV8ZiCDDsqvHBi13FCSn1gQBLAK/REQCYQW17ORJV/UCK15MYhQBJ9YxRKJn+EZcjqqCm996mqydM6XH8gvoRqiWyDvXtUIpowqYFAQZhpGu2bPIxOMSDQlJGPnLeXgcYJPAI9n2AaDmU9vFEwlKCmbv+Qk7A3TPSxJCznyPxFRZxhQkZtnSVFKFYX+4ZDpFRLJqUVYohFeo4jOHimQnZyrBAp6U8DHPM15ay0TfghAsmRPLOqFv1RGC/CWKVltIHCzPS4zeq0BEAX8yEKuMjgjq+2dCI0wP0MbMS65QARnCqNUnPvAKFySp1KUpXxBZC332+HMor4WFCDDTRILMJVArAm2IqrM/yFCTAQimCLh7ml+vmeRyYLmqtOzak2GUgaHTN+TK2t70NEpypnRU4nRlPLMYAYNSiBOzMko3/pTcZAMG9I0BYxUJx4gsRHa+gfDkMc9sGO1oRHhiS3BURh6rAr9OwY0GVv2FPMwDWi8/UgB6lJ56l1ssldEyzW9HMl8UiiVV9WoCmOadJlqjjXCJaXC0Cvwu4lMlF1Qe6Yzbbmd39QnGtNViq0TzdPLDN4mMFlaD2lcQRloqzfcIn9IoLZyBEihFlWDg44HHKbcrJVNKvKVU0YFuhcKRJ5KnAO02yJSxGaLRmqDaiEYciiOEKml8AdhwIuoCkxKHACDF3kSuBuPQIQwPAulDQ+NF6Z/RBiie02UKoI0aKzfMbu+Us5mmM4mXjN9hsSIvuGxWeo59pe3piU0ZltWDHzYWkEWRQBX7FrNWgYFv1m9pthwCyL0vP6CD6D5PxwpJIN9lkDBIBhTokToEYcUi6y0FFlIgfamhKfkNEKhytQKPP9HiKJNTiXKmvAQBSIlgblvcuJ9TTLLmYUpriiMWxwEirGfVoD4dANjQFaIE8aXqyQ3qTQfN7eDICwNJQIhQgTFWDV+Vv1N+AKKJhMOJE9BjxYNwxSgJZGHAjrK3mojDrK7b3LOk5UhwFzYqAUeasYJV5ITdwLz2BPHEImkZUgMRm2zkC+S4X8xRTrWPsxGP7L8rI60Jdcq5ATpP7uoCtsDeZ18NX2jrylKXiWoNr3FVSO0gUXlI3PdzmT/oGhcZ8dFcD0DOvfnyyZkCV4W2udsRXFJ1jrZmBDmIzIJCRrFLE4Ptn74GTeHplqvn7mJPoTPyV/rZ91GiPNHCkO0mms0wHPY1sQ0vDyLzuoKcDcRzeqHlqXLvAnRi8JWzFLRjha04cMmIUSNoew5udcIjADf7qJIf7hi3Nbm85lGSYSf8TfkrQDimpCgwgpi5RugGZmKUJGCHDLZosAgTkgSOoCgHcQN7EKyMMG0SKYWQ4hKEEZSNcfwxyQLK8hJCFokLQ/EbPlKR6U04ZzgWKVa3mc+TW2RhYgEzAVRkQUy4/NE5UqLtOlxiSk0QTpWvZvpdWIHXQ+1ZjV44GBLIovitHINlLpSmmxsKYHEGO6UM8lDYuAg3qhkvCyalwyBTqEAlKgtLlW1JRo7HOpVqr3Dyk2ji2sKh4CeyfIuT4bMKnlI5qpRjiOfaFImSlegxN8bpSaf9Tmk7CVC4uqLYI+wQXTAXV7MfdTeN3L2Dvcn2kvPWr5oAxSRspIXAUFAXY8jxxGWIhVg0gcQNKfvFEtopCJyQSCyNqTYzDBExrOFB2t2yFHlQiQX7VQTuYjPmjNLMzEyhCFqHwdWeXPJwwQISUaAMwyXWebUA776rw1UtI4DiaYvpPwH1ESrtJCDfUzqWQR7WaN5iefVnm5LYn6Rdo5LFW+DM2RI+xG/pET04ax88ti1puPvaE0aWSaQnJVqZwGhQtFt2ciCJsCaauva8HetuhPbTHNCVBYK4aRQRlOV0UBW9nryVWpo3ArNQBUDzb9lxKXGGQXRzcnmuUQMfYaUaUEcOpEZJo9AnwixaxjCHCa6i6C0C6rWm0KgMF0DrsH3LtGz1EXukNzaiEhN6WzshsgNi267c2YFl/I1LjM8xLkWfhBTqBoLh4CaGyGqt1/0rVYaLVqGhpWBo7RYS1cIFPyVbVdAtUrgqR755ZSKnGSPanhvDWjMNDsvCKohsy3U6knHLOTIYs8s3FquHEe/MZbX4IO6CqQiZcRM7VtSxOps26OPeUxfUsCszdDSDyOczMvagoiMFMS2K67a0tYgyCCjSz3JPgtN3BFZTvgo4qFXUpZh0KeSoDibYkzLH6hIaeEvP1jzFpnibGJS1SO1zAjnK0eU1nRrqU28XOqPOdO7Xflm2HUbRmVIGtqojITJZICqUpqhmFlJWwqSkCkihuJCtZBUrIPu4qjx7R6yz5aOOmJn0FvCvp1CRr7AhBBcj1dIfY8KOQO9pRjTGHCWHBkSG6TBECgsiR4Uk4xWZwaeQqkYTOKbCjrcVgIioFJBuhQMmO/ZZISiqEM8LWkaONuURt9Or2dvh2f5KqSkRm2EvQxSEcIYnnHENFZUcm1wAMbwEuqUgUjr0RMjaQZdruf94QwtvcvAX3weRdD21ihEqrOwjLnamlB14i7i0FCUR3dn3+vwIzy7WC029gIQIUI7DxsFIby73ZKUYUB4lCsBorWRR1AkNMjizAyCtbZQplXlp+Uob8fnoQL44gAynxP4GiEymooJiTxjbMCRTJvEZOkjMUrUcZPvJ/Ghvu7DEoraTaYKo6GDSzqnKix5jxy5LylIsJCfdcWFOJZbH6TKxHuzVekTpA9ND3eaGPyDbcIBlxg5j4K0IENcED8fnUiKTJvF6xzMfb/lNmdypa3IVMmgAVHjpHPRKO1d+5b561NeLuFK0p3nh7aoQ+87YcAw/Q38VldRPyNOZULJY8J/VDCbadJMeIqFgYS9MByU0TCcw58Ap7AmyAbyuy9oS5zLT0OpZdBQMjQSQyvcrVpSUwq1sqjmRmckFNGPxZ34qe1bUJuFPTw6Ics2W9/4XnqogKiGlnP2X/Anxo7o6/hCqVU5t8YRPY3XxxJjUSFHNQh5BFJjSUF6bMgZQsRRxAHlZ1hh1mM05DbznWD8zOZtTsWCYxSQ0FsZ/2LICiVKm+g2Q/41MFDuoyQjKCNXdu2pXTchizJJFvbGSYUMRQhuJ1QjRXAxwyXwM+aVnmXMEZ3esyfg5MjBLTben7/tG0pFBA4m9M33v3zzK7MxVjo5mN85FPERabA6KvF1re+xZX4oLlUMg3yBIy44oUd6t+Ep0aU4UI/sOOuNk6q19zLE25Hq8FNKRSVv1ka4xc9IR13bTd6bm/V9hnSZYhFRqmDb3rRK1KvVetvIG6iURcfWkfYTIQQ+k8lhbI2
10tFr48m+Nzx2zBe4RIP1LxiHcCESPGPW4hQ0rJaWqurCgbhUKvzDFyPqBJLwexFdqmlthMAg5VsPOyWyuyWJOGJaPvobVYmuMlqwlZK9jlRwr3VfffOWXklwSOGy2xHjeoEInuV98xTiPU2WZ00oXUJV5In3Yb0Pbw8x3oFGFkRPDvJpLDKY45fYDLYz4W6dfqtS3ojwW+nnUpBVS5Aushfpo5e4FT30eMuloABiEmENlDFfLA6U/g8iJObOsG+jJpUklNud6QyXoASLSrkHCTHLBE9S+JFALHraiJAn9Yq+UpVoChyrb+ENVFJfdDDbRZ1wpzyX48OX7B3wwWEgYWrZnpkXwhguFhWqcKa0983BjTNuKdXDXO/S9qFWtqyKm2459EJ94VbEs3S+VSkxVcgIVn33pMRbmb9M1U4uTG50DAFBRBDwjuSAoRWqdlpmchfBKp/lokZZdvXkBn1IPzJ5RRbog1ElcxEZQMkHkrP5waeBnVnnXPysc5E1AVHN3iIKX2Xi0zfiBGWhWRLb1GxaZBWoxRxtpLkQCqdjmln1xDIwj62ODMgSISE88sUgb41DCIGxzEHuCoHUDJ8y3Z402wGqO+mQAZmSW85EgPF5kJUzLheEbGPB28DKgJKgSHahbT5CgxAQgilHHvevIPRN+RkLBbhwPN1QdQsqjMMUIfJ1aU+5GXYoLqmQVidGBRSgR38Ybtjmw5qhqG4sKzEQQ8Ijygy4AjyIhiyuxaiqAGUQWiUmPsVHFms4VicSY1ZJc7H2tYc0HoolzsQM2GsRos50OhcVWIay9g4NRO6aK9JECIpVqs/YlPl3NZtUsCTKcwlsZE7/By3SnBkWFPWU5JBn03UJ4AQgKtRREedmbOueNkYyQojmLACaWAEt4etTiVkzlVX16f7W8KS6Zc8k5Q9TYK9qw0byVIvWVKp2LxhtYLgVULYAHkNCFGFJOhhlh0hElcVMXi3yFuxjtjJyVAftYJozaMxjG3U3qGiaSqyCfV5HibAwpfINK0qzpQISSDEnGVPM3BW6wJfeEAcFNUegkBIbLK4hosoLM9LPxruwLIIKvUhyzBmYNVCRebBLgnODvhFCnlC0EphiFsoKtrLhWrP50bCqwZ/c5NrqAQ1U8rqeaNhNxTkRCeI8Ppw8m4x8JuR2NZ9eXES46Uh8dQ18lP6pIvHCroqsDqPWXKyodDYzw7MXocA34hOLo34xLRZrPMeOxup6ZWZZmXzwOClZg7Y4JtN8ZUXiDiU/pdai2igsb8tckXcr14hANNEKlSzFiRVCGqzFY6W1GFeUawupCnM4RzGj+SF9HFAGLLLl08Vk7NiCzQedxw+t/bJ/9uri3NqXDIhk9TjuVUAWP1/W0eyVwt3+dQyI0VrLBHp2HR/4DqnlG4gakiNkhj+j91oHOrKALGY6uGWVWGBq2MZINiaFPTGPeS7jlo3Igb5aBzM1PHl4bK7VT18W5Ei1nsgwh3AlZdax4a6M+lxLIlCkxkU1VPmY8o21BPnsLbYD38pdV1AOu2vbe779umOlqznisDGxQcQ+/Q6u2+rQ1dJHX7MHAIX0hoANhXwU1dGilljVF1W1DMOjyeGJr9MfHzvPxilpkQiDhLerfTpB/kuGK3YJaTRVilx887qSH/IJ6ZkAiyUsFDveO/7k5z/3WXljHl9/+WdbxCOq7KBoilxUNIVzqQU3FPdH9UWIPGzJqyxeHjL1/Gy+n2aNlLWJJydj9opN6fVOT09p90yrOrLOBnSTz2HW0RpEUjhtVteney1ksxQKLaLcQQq/0v3GpfrRNXr59Mkj32eyssgAbg7HxC8Gxs77zALW7twCKJIcMY2/Bafi7eoGQFr0k6/RWzBy4ovBvneH6s8/fP/b778zo8h3xdplU/jSSKnQg6iVIIbcamz/0jiD7KfctT7OoqfuZGw5iI+oZtDG/g+rcBgG3fpO1+jepXXRGfnxfSr+QRAwiizQTP9QPz41wWY6/Y303BOg8TNrSTIcOpg6WHV5Y5OdL0n6Bm2+3i5m2KyZdf1KV8cQ67xBOoSslZJ3yKvhLqGeSNYSa/1fPMFYAdt4Mh454CI+OWtWeR+0gC4go/2FuxpUGJwR2a2H8cKehmkMGfbYC0MWWgQoOqLsom03UJeDEzh5+NgyFNXm/uRhxQUlcgxVVaWJ3MRVkzot1IMmCwk4aADVyQlf0+klK8Ih+iZtz7d8bsc+q5xPHwPR0mx/1CMBgnaEKEEjUw9xOlmzYaVKiMspXl1dHB6D/0Cn/uHTxy+/+Q4hK8CJ3WFyhWaxXHhEqeMXARbYVELP46JiWGOV8DnrF2mFD21ezzZXVyJoS4fNaF/Pb5yoYDFdloJlMV3unNeTEZgYrJiv9LrBFqzTRiNwo0KeAddjtlP2ph8EQVVjR7MPuyd7E19/NtljuC+4xiyoa1uPnzH78dNinqhoIzqgoT1bzGAMomfPnzs+bHJ4gKWZTym49MECG/qXOLgnZmABrWrVADDAuUcmnWz7e3qWMIkNZrN1Lz03rdn84puKxP7o8OT502e+sogcSEAkrSrNBGBRMuGlajJeDVW031av8m1LnjMyiZIqUMNSIcbDkyOnaqLCQ1Zzf29hTwLTkAB5C2LqKRFIRZUAHbgzEhdW1n1eWOCji0uMrfxMIMbpiJes4qFwUbqQZJsynEBl6QLBLEkwZMI96DkudZxuNvMZ/2AWQKipIUW9on64y+I/ffb8g/c/AjwkYxBufEe3ak8DwZINwK4MV0Ddr9ZqQwMDw0O6l0WEPLFo5PjEJ3fZNJ+4sqfZSFFlIb2NOa3gDvZQM08CVcQF/rIb/0JGBsBBhGjoSZZLmdTKopQUIaoJzKSSMaDmIZqwCnoEujFpzNVGIFEAcb41YqXLTDCpK92QPZ6Ww+rq91oEb9whxwOvLODKTpusWRP4IGh5hbg1TQX5IMwol/9rELe+bgyDCNtGBd0O6yxrXBFYjG62VxjVKdpRB+5Jvymo7FLCPFqNDPShVMY0TghR51uQZ1GU2GGk1vGg1rHCNSjAu84mVyg0aPUVgWiXECi+CBUSMpkmYtoXNz0fHeUFoysJaxOEFUbZA7dYnV9cwqx1XVeLyEJ6eKjAMhcJ0kSVLGLUfXuiFqkExvuojOluW2hsqDImEaM4vdZ3MeqWbOFe1VNGUeYGuqv7ljSQnDGfKyfS4p97j+haIy43GQWVcphO5lQaJKyCWnbxV+wa0hJ4nWcT5zcG/abd1ehydDixIjYDqNUrzSnJem7UPZ0Uhpa0x0Rb++Qz2KYJM+gIMijn/7I6jeQN+OLuFnr3+IzewccSspzIfmv2FhUEfYeJgWl31CpUrbTDOoQonAoZ4hlbm4QJDlAiTW4CMWD3snjXXKtfIOMv0lygKdUImDGXGggXQ8cs2beIUgHGkPiLIHNMh61/6fPToDAbawwLEg3agXSixJhUo69qQcWoRNrwLhYBrNEGTRUad5fgU9SI1BH3ZU/0he2jnrMH1nafc4mCoazYxytSQFXVdC+xKZ6XsECsoZQaOUutyEgKSAI35sZCLI+isw
bFQoeYOlF13RKHcCf0YOS0BM8of+ass9J6knhcW2q2xhDd7ARpTVBHFNlCQoh8O4OJ4MTYA2inheKW3JCIvBYKfkrCelC2aEqbwimzEuZB+aTYa4YqZp0TS3czApxp3Wg1mkRz0yeOUfDQcTo2pYKDsPLaKEodYq5ER0Muc8RZivmsrQDIuAbRdEEaPUlqKQNCp1FuM8G1RQd2vOBfRi1GzsRmsplwY3Irw19FP/bP15ePxiacmFQisAK5K9Ac0McmhSeRhJj2uotehABF7qJBkdxd+FD3cAIIUqCN+ywHC+nj14oo6XiR+tCuRCruHSohb1Kqou+SlrtZmAkBsb3hG/16mBQphbGmYIJsCtg+mG1tbqvLr1FETsThxKauYFENwDDipjbSzlpnZY5QpZwR/kM4Ts7P4IEvqTWjGHlXANXVRQrPmyYq4nd+JuyJjmSGIFyI/wafl576l7FuOuJpZEQTeRfKFlXTOIaVtVCbDMENXeJwjGXYx5tPyQMGRbkMqamqEtUKTPCoSqm2YrwoHIUXa9tik32dhnoYEavFZ1aor6Y3Vj/qRzj2VMMEzRJhZ54TEGFIQOZQcDDB0o4K0LufAKpJKSiXsQwh6hE0JXT01v+YXdDQNFY6XsoSYcN+Cb1JUlFBvuCNHv7fMiNSphaY63QcnOj5HTMHUGrICyjV1KjfiqatcEap6GrardrTNVw4xFPoYznFeJMVXZ0r66AnCcb1FZgYfCb5ewfDw6MjQZlyEUY8S7iwzLqmO+F0UwZHnnApf/zDWgiVLBQc4UORJ5cShuhlxndt1Vh0HalpxmdUY5oKRWyaLKRqnOQ28i+aofMboWMF9o4enOAYeUYCzZmwzclhtDdjqGmExDCQaRJEqmJHyhgwhznN8vrq3GIE9nRiuRlakZrsoaDU2Se9Nzo5PPbhjaPJQXbcV1gcA8ReLhYXjkq+nm7npkiy+pEh7e5SI8Tu1/Yv8xkmVFPwRKQCE4iGsaihlVWDxHChY6TGBHTmqnlytI/K5F/hEIaUMRsaCBnZMca+rmw7GJFehr0IWCKZGCshIfzScoSB/EepjTteT6fn11cX09moN+lUd7SGDIXiAwtkxAi6CS+eOefzialxRjirY8h3RGHtm0q+IHL6+g1rXSmin4Fp9ItfCk22b9rrJg4lkoEkOioVovkjEXYeK+G61hjdGAE9GvZNTfjZpB0+/rlnWBQTKaa7jkWi5vSwSgEZSCtTh8N5t1blEGB2MU48MCEGm+ZABItVMPxyPm3DAiyhkNHIqVFgEwp8rOxIYCjk/fffe/r4MWwpj1AvDMp4U6Ieg2xClfQp79CFquRnNRfU2s39qwg/JjCSXpwJVNDJMoSgrZtcgzRQzGhFbAqcRTW1qyey00xF0RsFYFuJX8wEfEKmWuCvt5cNShVC8QusJspXs6CPWtlSa+mS7ZKLGwOCw+5YH255u9jrH40i96OMjPR6es360TwikcTdchOmquylLJryQNWCt3HuRQoyh0KaiiyEHK3ZIOtf7BIcRLuxl3mmplBKSsZcy9/0Ro3b4gE3TTOCAnGrfxGkmE4BA+Va6zWM921nSW8C4eiIf4rxdhEQUwp2PRuF0K0JbCAQoYlGIkvhvzmoPQfuEASdUmNdN8fWb9l8eHgk15PHT1XL0DjiFZ88SV9jZpVrTJOe1YcffvzsyfN0kRomYa1cQWdLAlRvGAaE3BXWIUjLSFfdsBIMPtGAs8NI6TrPY5IQMjJGYvLIlTA38nkc1lawUAuxU2H9hDl4dPt1eyMapCkVGzCJwkXsIoJWG9Pb9Bcp1Lg/zrSvDL3OomP2bWq198HRof7DyaPHdmGqnBe4Wl5ZB08o0haISwYQSHDFQtC+9DHxB3ZESkO4V4wNunmaP8X00gBkghCTGHrlkr/MP2LUmHmHqQ7utEGTiFISlDKJIyNl6qIf+aM3aVdqRjYytAUNLXthsNcYpG67oy8MZQKB2OpKQLZayyi6FhDOP8X0tLgFN+pn9n1T0bE5e/uX6Phy+pIWPDp6aFuQ/VFGnAAeD6h1NwnYQVmGidOO8fbGWAQmFeI/eUnbJQdFmWBF1VKWoqRycXh/5dst4llxP4MGbG+pQO5Bae6IRlkDGQgkTYNeQg0/1WcvT35mNYQT2ym/tULZahk1YnEYiFIulSErwhmRqgqE7hn41USdnLM21tb96kujbGKjhyePfvGLXzjlUHABAdSOvYpCEzQkwQzEEMRayhuSxwJ5mdpgkLvQpCSB3IeGYPVEXQpX6BaXQyoqDtdIvZUpppKAqSmniqFJ1ugTrWhHygMkWeRBJ5TxXgOBLpsc0hcoVU35jKNZQBfv5q1U1ceOMILm35izqoTFlHnD5q9MQ718rWxmoPf3DydHMiuoKjABLhh5XVXFWzdCuobMhQOsit8N97R6P6niTlHkRxf5VBguoJIaGq+iI15B3J7INKuJvNUwOyeTuyhNxu807ZpXlRgzn0D7wx//+O3XX1sO4LDO5GnMwZdQUt64YtS1fX5h56Maio76uhheE3BRbpxlKXkKo4zqViRd8VLUMIhrK9vHjmUWg05rCOMj4PVCmTRV1zIasesQIU6eby8lGoG8frdmXNUQuHiMIqhJG+2GU+oLAYoc2fXq1Pk4iIw9Hxxgl4Ik/Pzy4vWpE2tzqszwIHGdqkq9yhL7vUsAiT+MxKK1rlTgVTs1A2x0plJcxv5+4ASbN8USFErZUIZGOEDA0elldwIpFlbSgBqKZcFaKjnKu/YzT9JcOuDqxvb7BUMDMVTmeFvVKs6NPFp1Z+CH8huzEzs/fPzo5NFDVDBTZDyItYa/bNy+/CmZ9ZNRPRTULnumj0F75NEXcI1WdxeRdp2YsCTnx5tLEgjoRKlZBrW5Rn2rhtBJPmO2cspk5BxbfFokHW8uXPToOKtYeNmKtoVfQ/4OEzWgE+CK3EFPtJH1oC1fKaFAkTXkneTJxcMSNNJgUmRvcnBw8uDJ8+eHvqR3cggZR4GIERhP7EAIkm16XORrXFc5yMNcgyAsfGrNPUY44nI8dCq+AWDxoJPqGBGbKQl85KISCpYA+xlVCJCBSVnWyFhWWBr+aMMoRehuOiTWTA31El5b/jfVLXkOqkWbiEPQi1KE+XnqinskCRXkVmkkI7k8b9a0o+/w6MkTh5gfPDjm3g2Qg9aZQPtHR6JaE4rGo40RL6y4NH/bpKARmJeGErlVYRkdrAZ/rSvOhy1ufC8mC4vyoUb2MWJQYUKDDYoBgzg0PxGw62iVEt6MoKWqKuAeuUIzKOW/d6mhvv2dgCYin0Gm6EWo2TIoG1gzyRMqpEKpXuaHsZDBQFDw5NlTQ4zOVsjaPGNlqa8Pfw5fLnJKh6WUbUQsG5EaoiV5WHM3lg449Hg/9pk5Fkv1TQLHmoCH3tEFzeFXLOOOe0hY3IVhtgglmcxs7FJ7cIo1qjCrCNEgaAKxu29yggzmcsqWEL8igWtaCoShu8pottsQtMwSEpBELkwk42gwvtqVN2XR5aO7Qt3pFa9/fTg+cC9ZsZcD6ln8qniHSX6ycAjFt
ImRLIQwB8s+HW863758c3MlqmQ1ou9NXuAFjJLcCggJZhFl8N133yG89fKBOR8SoiZxxUU1zUWAgwCVbqFOrEUGsyUylVeFdn7v1CFVVSxFFjIapKxHYWOIrBqDo1HNGp5qNhKYpEATWCI6cLCYf0e+HMekZWQzDhX4oWCz7QLHvu/RptYyk06wtKLTeOO++QyB1mRy/Nmf/owzcJH02V1TSa3iATRgw6xcewNfJ5PqZJC2wiihL35qLe0WtgonhRn1MEFbMbyokHOHi89g1VJ7laxhbVLuKglv/RARHoxHz9978ey9FycPHhyeHOs2wF/d9sB+8fmfBQuUAi0UDCShKdqj5zaiaVzx1qsSumZ1Vwfjo2fvPe+PJ3T74PDYR1VMh8d71ISwT0TowoYn4CltiNmOgGZ9ODfZZZC0Zz0CzkS5rGXPwRKkINiKSMAYeWjlY1cqlWNWUAIiKtA/IlomJdhjlD8NaBDTOvcIIr8BVkeCIac1WOrWIpUyLsIu4goHIWdCIYwqa4UUcVdF08hEE8yaW0dWI0wEzTiqhRv0gguhLEAsCoYZ7sPkBIfKgjY2AQlKyrOrrPAERUUm2uZfmwRuMS0kcrmLkdqPqtptQmW1F99abWmVec1SiGRKnpqGcIPnEYuaQcDgjCmV8/NEC3rK1PPi7VtGoZlZXAEMQ5j2pUb8uiK3d1nGWzbYjgXhATYZfBmOJo4h1GiLOGRv9eQgCgMNkYdtWNSqzNRoaLmLZNGCRalWthcoJQQuZGSTlFQEa6S83aX2k1LoCRWmsYhAoV2ySWoR3aZQKUtqqAw10EfyfVw9hweSBZmVUtxNgyPglo9Us5tqC+Vz1lxQqggio3uO+re0sVoMRSqmQguZoKYSQzSlRkTSM6ROvNF7+vSpeklgKivH7qYkedu8n1KA2F3v3+RdJRnuJ4win1sDs8vjb4hRyX3oExEIqhGKmM/6DmktFPdcHiCBu+XJ/S4Jx6qad0DWAFocihuvFMUdTcBWtelmJDJOs962ajxuN4PDg2Oj9FclnBgjB4C0mu9bAlqpXRJWbW8pfjQ2KEVK/Al5xXZthiWvGg7lImLHW7fPTSvlxgSCx36SHT4e9IaEKcKlVUfXiRotZCBTjf9lmavaqkGNqZ9ElHSoLXgZTKwTBgTJJBf+as60WREXtm40WTYLyFpGYn9iIRJU6GkIWnZUzt+ClbwlpYkqpOG0XenuJq/igKVQUGr1pDlxTQlXKizWVbZcZHOFeauuyjkWbTG78sWP7K1URBOemzGQLaykIlWqZZbh/s96mJVBjb1+NjVNJWkj5jCLKuLjG7RbvNSiqt4HH30obqHnxjBjEmoyugGaAkVv1y2mXtxPGUbbJu1JEYL4tK3Uobayjee7jO/+xnEGzwRskQiz5yZWK4FMgpXU+qftiasifGYLQwJh2mraHoXypKVM0mfMb8u29lDb29cVcIm5EChyo+NtTSD5kSOKWixq1UVadqkVLqwDh9TQvrtPDX7s1JVXJH9bzpREpM5SQ7nUGvzmGR1CJva2GbbSCEuCss9Wha0J17RS+Af9Sg2ulsE9dyNpQorklSFsSLUieUgYWrC0qzcFiayngDECSS+aFCiJ3u63VL9XQHYPq3lXElvKWb/luvPPBU74l0rC5i17i8cmEKrbWqLVoFS28AoOCZwvLslIKtwGZgQsHGrwtJx+umFrPKzBAH/TuyJJfpLrslqho5ScO/MhG2EtWjT8a2jCb937NghhYkQeVhVFkEbhlhtAhelWlhrQnrjJvZeowTImoqF94XNsQlLGF4FVElByDfISIkrrGy4Si5tSOQQhs5KhmfCpkjlYs2+mK7d1FuY6kcm/S5mjlard4rZW/AhgNGVrGlrBusrbKicC6mlJ/QOeMrO6tGt1a+BHNTVikxX0vG/GC0KFIkR6ym4IaqJO9I5x1QcHJ5tyY4VqhtB0Z/LFBaMN1mNnFEPD/o9vb63iTwYDa5UdTw5tsAGdUdCHMNESfe6JI53mPCE5aMEpsfzCUu2qBjhKaSgk3yX9MQM+wAZt09fjybFTN0LXbBFLwCovGKgae9AYEpBgKQua8e0I4V4+ph0SRWIZwMN/hgplifIcHWRupEmOiupUAlV1EkgtG8MLl0thQ4XSapWoOWJ6LyEECJ3VK0/grf0OippxZiBmw7lzTc1SgKB4HDsdHjReRg7LfNAaQIbDBSFeoUoFeAWFS8aqEKDKQTMUIVnF17WetQnS6nAU5sG2BKDBDejAXXEEDNDZ80IkalBUD4XkIQ4O9YlOsu2hFJuH3uEyatMLgEgAbRQAoptUnpSDUaihJbr2PoNAThTxxHOSsLRgTisRvvjlet+qS4WtkibrSqWVUCGq7VWDuXKHerFmqSMSirvKe5rZKyWLTHGqhWairvvcK2UvVBulqspgfi9FZu1FKjsEENCKTeHv/8a2LRwFlrItFaqEeHPtSxWZsM+EiqZRWt1k/l2CGAQMTYSQ7qNjLakKzo0tDWw/yaURB6+25AfKNgYBXUlAGXQ10u2sGEZyjbUacZFGoEL7mSyhrJ/+RpAacXcobTUNx2OLtWQVO/Jm4UwMEOOVbWwhQyQmxN+lBq524xftac4hOaGCq9xMTfB0HFWGf6K22mYtsAnKEf0SQM17nipd6iY5d1GDGMSbQFX62IBQK4RSQ4lR6qn41SckVB6eyd2Qd1XmXWrCUA/bq7RbPzPbHwn1Mxgq0pqERgKbqiJtVn7G0w0SS97Uw1Cv3Jyv+vhOw7XinqcSg8qS7V8xHPnZKgvU9zBHErNDqq81StVlCBVkzrb5vAgXk8LRIpbS2BFmM4vZ7JbRV1sV4p+RUMZk37ZXjbYneSqFq3JK7XdsC/DKXLpXOf03dRx2QYSS6MZmWUTKNqFVlhWoqC4kqkpbmLw22uhfiZtS4QdLGTkt42om05NqGgxl2xqwjSJaLLKi150NQmuNUqKWWnMBOGSLw2C4SoSdtl7nNaWwPg191pevqAkht2QAa8MjN1v8GyFkoHUe0VMfXMF8C5vgn+UA2X2RYd54mV0NSiWppiDDm5JfVjCzrCxCvqZc0tsQNowPlrSC02x2ItstAFvMwpl3D9V9H2GOz5Psb9u9EGlHPdUaTsS/+QHEfF4ByZs4hJ0RoTD8jgS5uZOqaqS9LXDDsWBcGYSoHJ2FpZoN4xqI5dtjiSQYlulWpcU/GhWfwdy+92JG7AKLKAQ00qX+cMKRGjra+ZbUTgQynRTljQmMMr6DjoghqxgsEpQeTDjaJIJUDayBS7U5qUYwVmODlNGqsFrhoIzcgAZnAV9iEux3wlRP7y7teXvfKIJpaJdKyJT/szg+6hpFKKOwY1LctIJgJeuozzQakwIuGKyy8sSrqrlc61bao60lFD68mk1dDZJY/xIq+SPhmF5DKSrg6co4pK0w1QgtIWKsHebNDASv2HqfngAzrdjKQr1I5SGEHwVotb21l/VsK3vu5XTd6mdEPpJGIeOfWNmslronnGI1wrB7puBW4SsiuN3MbFPzhEagRQEQPIHX5i/ApEh+ajH8
j5LJRrRKwrwP2DFFZu4AXXTM6zIuFeugomDGxh7mE7BlToWK8eYGHTvZN7VFfkf4QjCttoa1vcMoillWUPGkbYnAZuoxzlURtNCjI69GyD0N90BetG2KBUrhEXdgGc5tb7xYzZ1fIX4n+dCkC/M6mkD9EaZwLXXSlDInuJKGvAn3+diEdAVIZVWqUdkxd2hldsOEpVUsOTbauNHaorfaFJZ1JCoqyHlKZWlOcyoNrdBWg5XeEUK+NO1fOBTipz137HDgKBGMOAB46GwMIFgMnINgohNRX9TknchQVsGt3r59i/njW/Ou6YV7UoNDAIn3ogOlz2QrHVyLE7k1wtAAox1aL28SMLiCGtIsoGqdWcRqsTEVjFK+J3E9vdRX6h9EUQyqqtKCkEzx1yyRGkIFDQs2FYjVhFsQTgpjK7EffoIhmBQFqkEh8lZFGYKW05XAQcxJ6jEJxbeqbHtRgzzw0gwitqeEHQCW5ElXl1N50LIYE/9PTSxYclB6MA7yAeP+vSf1vJxxJyZmeWHzuwOksq/eVAsyG8dar0/Ge8yxECQDxOJ1C2KZUf8HPf0Zndz0avLtnhDhfhualDwsQWjt1a9dtrwtVbeoBOG2dqWKhLIFdytQNW2pgFxMowXsej1ZvGRarZK+JvzVo4j8CBSqxdiXXQhksUepNn9QWuvJGfjqactak48zFdjmF7KLUHkHtESVoS6PlY7XWSBgLZivOgPUQKsZU5vhFuV4HKWvCQKZOiul9tw3/SdR0UkAxOZ5HlgzjiOF/9W1jKNyxrapGot38yZK43/gwiqIVeycBQyjHEpsqQGhNXNp6hY/2AjLn/jJWMMihyKNEEG64CpzmG9PRgkTs5ckhzbep1GdSupCMpxy2uk8ZEFIhGerm9vzi7OX333v877SlVWfosgXL16Yp/TR1vOzHJIb/4+lsaD64XoUd32K8DJmwJNEYHQmax8FJyQ43TVWIHOtTpjMaSCGBhx2l1kXmSLC1fWOGYonhpXo4OjkIcNCKkeGNkxSW4GQmVXBF3oFf0k/1Zk6mXKo9fHRtGJP4MuYWGY7Q4AEfKyJNnHI8HMMP/4AhjH+9ps/z67Pc+rxaHBhz/l1BMHEh+2xRDKdcOOO86troujr3g64Drw7QYh0lSgWPKWTRL5wYqxCd7BGSghwUIsBi8fKRJhtpZRTxKcEonkshsh0X5FAtCZhuYlAzZGu1m4GoHZ9uWo0HsLitgYGg5Bh9kppumSMidn6YTZYPaEAdEPWRe2S89ny65eX33ytBeDRithBNKIPAkaWJ6dcapWENxuZLaHFBRe1RHRzeENS6giSZA17twISQEKCJG8V8SciKv4Jz2sqPKziKlK65VEtDmAIYz6cmEzyrdHMrx8e5iMa1SWXMfoQhat/FU5tjYVXdxbHvabB1hI4M8NI+8dxEFP7S2cJKDVUiPji4thxDFrBg7hq+4FJjkVUsKMg6qo1mJnHgAwJCGbVSfNDjdD0e9ta/c37kGAba8qmPcwucmW0ItXEyMRFujZ/2bKhvsWI5uP2jg4sgrBUnTqYv5YwSsXQQ9FoeSWlPJHiXKrB9hwEBViygQ2civsiE85m3rJFWWFAnBcZ4QaQQBb6KD+qyTN4+fI7QpgAvsadG4hpbNu8B+FAQ95NSa6fGLUlSkHTSBbQ05VgOur0ODYyYMdsoEikMUCLt7MoK50XEuGmmtsiYFUeKM0v56O0CQy0FUMryebJjsapLIO3IGO2QQsbw54oWsOucloSKliySDyDHYrWmDWo/aPOLWRSBhV7zEPW1N1My3Fu53Zak4Xe9tKoEPksCSyQ3Eqai9JuybTTDsCma3EnrlsxKI8XkUGtbSzASscjFC3AxGnQTYkiMH1FxJA4LcWk5L765oDf8iZ02N3jPQ6JZJBO1CwipdMaVBDa1LpV5WcrpV1SmRE3bTc4Qj/V+F9EvJN/xVoq0Yyh25qGQhybU12ipMCkYA0EbKU3P5PAFDEoJOARWtIdSevXV9ORaGZ/T75QVEqlhN7PGr1NKA2eTJSjtwP4qtpWMx6nVvKW3nz8RJr0hL1MVWLSxbK+BZExlHQZcvp5nFY0xk8bERhT+ANJn5bzRBVZVQEa1kxq+HuyIwWaxxD4GY1AgRrvS9ZkLiKiNxgqATd/2zBnHpZk1ytDzLa5GXOlFI0HWgzcFRfIgg3IkS4qOx1j5QfqoXVRASkrpV0PE0OHEIGq5D/fKa/+baQ/a/BZCQzartdQlEWIARErOvECCa7NEadznUF7nXP1bqFXvKEDBk4wXiJtR1tDLCs6E3qGP7XLWSnyIa9MGs6uwiJolpbqzyaGZyOz1j8udIczElh3E6NkqabvnsxnOQamhuRUaNxRdGbTo6Af0I0f4BNVIA+GYwpgNBoKBV+zX3pzuswQCaghSxLTkPjQAai22dRuw2xoz0eZlzfRQILgBUFILZU0VngG+dolGfqEBAkrSwVKvFFA9sqUDIxQ9A7/Y5BTkSIwadeIAbnGH0WiE3ibVQJsAkK0bx4IY9hpD9WmighV6lFHVRJ73lxujE7VEWkMYIDgnjKXYQVQeIPmab7A4K4aDHfAEITkrwQSZ/0N367S62AnwsLUHmkrw1wU5uVyDhqZS5VxHfVavcx92lNN9DB00oxdmrZ9t/rzU+AqcABq3H/0R+a8tXLFsth8O9FMnD2QAhxfXZ1fO0CkCAaddDYjz1rMpksL3B1xQFAIi0pFiUAgQhnbRISmAZEJcJlJyJcDkC2WIuY0+tjkXX0RcaqVNUFZ/VsHyHvk4Ra63NdtkSOVxkxI6qpWtzfJVzmTeecaMocWQ4hO4UQTnLoR8yFsGJcK/c8FZLhRADfnu40rsALNRlTFVVyLFS9rmgJneVbpc9WbyLp6xuTFY6y1zjODSrHrhtEyC1c2oRBQhElwhUgArkQiYhf030Tu/hkR4Va9D5RZehQgwkbsdVv6xvAQScJaDwvRWLsop2pLvLbGL2VixpJonMa4KJhoXNbAEJrW5mafe7nus8m2hcoRk1kWOpjEkdfwWJHC0hYdUCTMnrv0ClWSiJwJVO+OQ2FYiXpcQJxNjF3sHUiyV62sY8SimAvFfEXC/0ImAZulBOqlo1YOxBIX/+P7in6qaOU8Tw0/Sg2zHYqFfzCIOY9ZJnaAgn0tC8AMgMams6q3OfFW16mb3cBDxtPIR2hWLVY7Qdw/gaakaidNtW9GBLzCX9VKaMjPEN//jdm8anaSOEYoqgHIZgtUHoC54xpbsWdP031xtSMgmbQ53S0bEIqWkmMqLjbXmo5ALf9HiEI7ShTcAmYeARrw6FYhoyx5gkuwIhBpu6IBew29ykAlUctHIaMh5QoTVUoRZiqP0eBRcfsnHLU7uM7a0oROWs5xLF1HO0HWXpco+TRbwabKtpMohxAkgYGhlwCf8CkBSLyh545hp2nrvdFAd9rhUwaOlxFbmY1UZrB3S4IY5xKCaGIIKUdwruHNoIsQgT00wOoidov0Qh8/46xDnrY6M1olVFUXthE9Gpw8Ue51Th8rUd6RNaoEVssdjpxltGfgaIl
yabqGNaO2MRZZHZLGuFzAqLRMADFB2sYM9fC1rTYVNs0BAW0ZPDw5tgRB1U67QRG1M6/8qAK8EUpE2BKmp29TtA/9NFhwZyQiCgoSfsEgn9UHkXWeLIudwwQ9Ah5STXWMs+ciGL0DXMZDMxhErUVojCVKo5YJCaCHQvGAaQGV/SE7RmVYBUfT5TseHlgcxyUa3Yv01PyNVVwInEGz7E7lOllf7i/FyWYu+Z630pQF1t4CcvD+s6fnJG04mB4e6Nzo7dqArR/KxTJUyiBeY1SYb+tdguuQ++6hPGmjmnEP5GSrsMxGC+KfHgp04/yJMolJlIftgKIbQgxkBROeEQIpCl+q6z7sNdFgT9j19PTtuZGlAzt7cryKOsL9RE2qcWCYIVXCiAIb5zHoQcXYo6DZ8BabqpOYasiIljhoNjXAWVOhTiN/8vhRDqzb30MV/W34n+3ti16MDzQqApGPDkdQoQ5ratRRaXQ43q8Mj61XbnY0IkIstFhouE7Ap9UsjnGm/H4GUfBKhWQBnPFp+eaLo9aiufLeCTASMByl/l4GHjFFzkxTMP4rksj8OuyN5lI6H+VzNLG+aMygw+mc9WtBxNzJZQ78zal11EfX2vG24nbjCyBBiNn13GrwYceO/NoooYyN+v7p3kLVigJXuAELdV0ZOUBezcwsh+GhQhmYoJnjpXEhM04BoewzFY/ChMl4FHclPmWORLmBQAyh8piqJMVlaJ28oEcGQ5DohQQRB7Hpf8Ztb/IdRpQDXluoycYkKLMU2tDC/q1tiBxKhnoSt+cfXiFB2jISlzVH4SoTglUGR3Pou0+DDoWQg94ip3g6cHK4dDhmhoOCEigU0R5QBmNGpSPAc4UGoAmYxRIyFIeNOcq4tHh1amCv6K0e8hxRr2E420S3qwqArYJmEkh2OC4SCFkjDvnNaEZDWqSndbJwOeUx5+QKnJrWYSx+oh8SpNtC4qyDQnF0iYLobmAzXlAbTmRzezmbDt6eW65VZ/oZmNWZzHdlbjSsDa2jbscpDaXVnti1UNVHFrATczTMlqCftsO9aINdalkyIsG/ZlyWr16++fLLL1+9epUdn3UINsK7wUWcjBakLxTsCIA6qIZ6WrIzmvXKqSqZKpApSTXCDsH16PKaV4u41eFOqA8koEtYHYtbh4dqIy6rtRFC6ODkRFHwCNJvN2/ytYTajsC6ZCygRrGDucRZYgqfwELFjZY5VFhqaJPf3GejQfUmy1Py3MpqmEIpAbLL63yoqA7fBn9CfVDFVlVsCjN0jMa7aa4HKD7hqsXu+uD4iPLbYYfzbVkol2aiBPnZD2yJKBt9r809YJtnxDeaEyvDOJZDqSfpFjG7UQcoCFjYrDr9ioxzQgwtKkFHfRBopAye6THWkdrBu4biwwipxA+mKpS4Q/N+Hifksv2xFmYBWkoY1O2afTN+FaLoFFfXrQQBFSBupUcEt2dTfsw4AgUH9eRIaVtNHzxyz4xnhHqWb0rzXO4FSSyiXh1/E+joKc7HwsZ6BCy6jaw7VyWPmtPRkofmtoMeco6PQTlep3S8PtCViewgQyKIq1PIhjXgY894i5FKFtRlCiVWL1PNLF88sMkveXI+QoL/rMFwMprnaE6SX758HcGrL5/G60eESiO4aRpA57PfTu2JspkA/xCdcyULvij7cPRoUj7bxJGahzNs23aEM79rWqWZza3EO/qJZME/oxaagzak/A6lmMGyPWIShHBCK8IzVC28dfSYFqEfYVYEMqiA5Lmvg4EhgyeFlSMNpkwdWyqPmyRThxiItF3TgemV+4iLx2F1jiO7UhV6xbMKGMKfLc/VgKw0sDXauloqkTyR8rakA2wZhk1UGB8JQjVn0AURK6KQzRMF1Bmhgx6rUHJxV10Jd/Sd+KuQURNUNefOhFhJGefS4yMShyJldzZfXJ5n0J5dCc8X8ZGK2+7rzEEYNuC8QiBjMvlZ0bSQLdTVRyxJcQ2VYtvATzlbZ7ZQbJcEqVv/oqcDOM1EVOofwQ5WJaFZGdTtIgGVNHqoFa+CUnVT5UFOEVlRAc2jFB42WiAJ+JGLqCEg5cNTFMtxdpqXjD6Ia52TYMNVyKlL48yei6vTs9ObWRrL/qJdDOMnnGhMtRoQA4tYMU9iFMOOkCROzvNUWMLFAgHC06BXyoWqbgJBjflZhL5n3FU3OdFLalaZSnaZy4FEn2KzEFeeVrncnGXL5tqkgTuWN78K8ixHbhqXAQEnIXtDEVY2uCUaqPB+L/tPE+8P5qKdXnd2NXt7cU4c5jWBGcIm9okhA5Zr6q4uoVYrKav1GuooJAuIxFE0mRSQTQVhyJpFYohsOtfBsEmmaM2cVHaD6shFxuNFoFBESOCcHmW6McQmX9wzJ6pwRizRMSeFxX/DgMzlVaKiEFxF7GgYXfjBkhZwvc73MB8DHqf2ZO7Kzj3fSKhdt8ksxsCEC5HzdM78JuavU3w0V0TAxvAeGcoHgTOMBW9oS7eK8HlS3qH9jKRUEZkjk+Fn/K1MEipwiyUBBJ5hjNjLSTJTcULKoghEIwRJMriqU4plrDyBRByB4cKPfCeF5Gf8BboOaebv1UrfDNM51oCJkWCuHpAgMcpmsr4ODxR+xeCRCPgLkxl8MBQ8uTb0IxQVNcHWw0CMJl6yzGSgSWQbbbon+fIhXsxSPER8G1z8ryNAEPCl3F3GV8FUVj9Kh+EBsz0QbkeIHNMwdqqKVoHKTyOjimR27F3iht5GT8g5DnoK+odOtjl54AgW08D2Kw+cEBhUVVHbQBpNCZIWSATSEIXs3KqPUtLufAYhhono7oxCwMr4fyKfNmBARgu7MBhKvLKOtfCfKTAqIX+xsFjMKERkCjecktJBzXD4eJT+Czji2VIekfIvFVYcmbzZ9YHziWiFk/qy5Dn8MA1MiOuTmweH+4l6uhunTGTtg1Ejp/aMx0cHkwASPHPAubMetKtUhlrLLWkIYKCQ2NoBy4f9Iu+pnlfWmQZhSp0TeYszriyIYKrGx9hGiIVXnosBDXtjlh4+wRSJ5KFScKsEEz8T2Ia68cce6xImzBCGZ/UVdvX2DXll9f+2b5IiJV/a9fXuYXdi54YaFIKkzt6Rg2bqOw9KGY5MOOIVhMlGTK2pSkC5izfxKoysznsYDJ1SCplB517Nvn6RZSXl1ULogOtvdRYVKFxdVRWMinUhw12K35Ec71mS3DhPXe+oQIHcR+6LHO4RLFfi6js2Zbn03wqwUBdwrawruoEnYGWYI0BAjAyfHCGCr12Igpj/hFxecbgeBJgdNWEoYEk9pYlI4j6eK4bGJY5J7vTO3f1F0rgnjds7KuSvAlBxR9sLxGRLZbXoQn4/XYJt7EpQbTXnpipqVyKgKhpdlpvthk5IGdMv5WVeS/jZGlKbZt0zaZTcSfBl1Uz4O7JBATgFLVRoAFQVuTQqxPIUD1wpkJzpwzF14k3IxMT+VEp5fuudLASJ0LSsPeTcwDCVlr65QZJ6W7eEZpdne1NUoKpeowI48Lwl960SCHnCfXqin8qes/xuG4BF/Kxfp8kTZ9TTZ9KR5UaRhUaFir
/StAobeO3aRNq9FGEHrKkmZiwhIu++/n8BM287HyfWqUIAAAAASUVORK5CYII=",
"text/plain": [
"<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=87x244>"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataset[110][\"image\"]"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "bb86837f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAD0AFcDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwDm/FehWvh2yjjt2kLMQpJHWsLSG1a0k3Wds8hc+nBr0zxTb2V3aRT3Um1c5U4zx61PY6poFno3nC6gKoOSCMk/SuZXZo7Hn2ua79vtGtrzTvIvFA+YelcjKo5rV8R60uravLdRJti+6vGCRWKzk1rFXJsaVrOF3KJHCH+Dcdv5Vo22oFZ45I40DxcodgOTjHNc5EWLda7rwhpEV3cKZeR71UtEOEbsrT+LNaWbzQY42IxlExTLbXNSuSy3L+ZGzbiWGea19Z8K6hJfyNZxq8WeBmp7DwhqCxZljRSe2ayvc2cLHKzRl3dlBA9NxqnDIEm2+prrZ/D95FK6NH+VZR8NXrT5ChRnvWhDZVvLK2kMeWILDkg0VpT6JcqmW29cUUhaCXfiD7d4RSynceeuFBJ5wO1Y/h7w3deI7oLFlIFb55COPoKf4X0SHWb7NzKwiX7wHU17Fp32LTbWO1tLfbGowAOKl6Eo5Hxh4esdI8Hhba2XzIQP3nc+teWumUDDvXsfj66aXw7LGEOGryTTYTczCDIyQcZpxK6lSEESDPrXs/gjSRLp4mzjcMdK86i8OzSMp29SOte/eGtITTvDsKkcqmSampK+htTjbU47VfEEXh+Z/tMTMd+BisxvijaH5VsZGP1pnjFhdTyb/mCuQo9BXCPbhHzjinBKxNWXY6O78czTSM6WYGTwCelV18Z3DZzap+dYpi3DoaRYMZ4qjE0rnxPdznAiVRn1orNMOT0ooHcPB+rJpWvxSz827nZIPb1r6IsLeynhSVEQqwypHcV8z6jp8+malJayrhkP5+9dhpPjzVNK0pLNCrhU2qx6j/GqUeYzPTvHCWI0CeJmjUFDgAjJNeH6cP7OkL7FZ+xPan32rXOoy+ZczM7e54FVfN962jTQrs6a08QvG43ouK9H0XxyJ7b7K5jXK4FeJiXnrT47qSNso7KfUGk6UWUqkj07VtL3iSdP9IVm3bVPIrh55zbXoDWcg9NycCrFn4kngiVGkY4HepZ/EAucb0VvcrR7KwOdyB3W4beYtgqMxAHipZdUjlkjjKbTgKDV6OFSucUnASkYbqQx+XNFbxtkx0FFZ8pVzd+IfhpLmxGqwp++iX95gdRXlQfK17x4vuhB4WuJOoMXT8K8DViRnGM9qumybDyaTNNzRnmtRDt1LuqPNGaaYiUvjvU0E+4gVTY5FOt8q4Ip8wWuaUwztYHkc11Glv5tpGWPPSudiTz4iO9bulkQQrGxHHepdmK1jXEYA6cUVYQBl4orKxVx3ijWrfVvDS2dq3mSbVV1Jx0HUV5hNY3NsAZYiAe46V3SabcWTDzoSorI13iFVHc1kpWZuoXicoaSpZIyrYNR4rpTuYNWENFLSUxAa19EsYb1nEjEbegFY/etbw/NtvtvrUT2NKe51kPhuOW0kNvIRMBkAnAqnFaTW0gSQDf1OK6LTbtYnOaoXZvJtRElvbSGNR1AyDXNGbub1Ka5blm2kCoA5waKzpxNDcLNMrrnIAxRXQcg46rdXTH7TMT6ADP9aytRU3FzGvQZ71GsdzbSMsgPBPJH69afcTAukhwBuycVyXuzuinaxj6vB5VyfQiszNbevDf5Uo6Fe1YmOK66exz1FZiZopKUVoZC4qxpjGHUIjngnmol5qSMBZVb0NKSuhwdmd3a3QwMDnNb1hb6pdqGN5HDbE8KMZ/+tXH2Mm6JSDUXiV7i3hjljuJUBwAqtgdK4npI7WnKI3xdrQ+3m1tZDL5BKmUnqc+lFceGZiSeT6nmiuuOxxSjqe0abp8V7ZSCaMMx7kcjiuI1W0NpPJCQcIcDNd/p10scuxAArVjeJrNJ1aVfvfzrD2dkdkXdnCXNwk2nRox/eIT+VZZxVq7jZHKgVUEbVtT2MKu4hopdho8s46VoY2YBtpp/mZWnR25cgYrSi0h5oDjg/SndWKUWXNEkDxFCelavia1+0WFqmeS3H5VgaWr2l0Y3GCOoNad1evPdQxFsheg9K4ai947YP3dTm9R0iXTJFMhBVhwc4orsvFGnpLo0UuBvBHJFFbxloc7Wps2U4+0IrdCat6hEJUKntWTZzQeeqsw3A8VrPJvJPrWn2QTszkNQ0XfLuU4Unmqo0Be8uK6m7XEbVml651OzOynSjNXZkjQI+8p69hViLRbRSNxZvarRcik8wjkiq5zZUIIt21lZxsNsK10dg9nGjf6OuSPSuWjnAYDNXY7kBcA0uZsPZxsUvEkEceoJcxIFDDaxFZToqalAx+6w7VoavI8kQDN8gOax7qUmWAoeQaTOeorbHYaxG0+joqDJAWirOmTLcWao+CQBndRSRlyM5WyV5JwMV1MKsQKwNLRjehx90dfautiUFBxzXXBXOaTKl1DvhJAycVzUsrxuflGAe9dr5WVII4rlbyDZPIpHRjXPUp2Z3YapdWMr+1ERyHXFOW/S4IWMgZ9arX1spBPTFZ1jlb0L2NZ2Ou50BiwM7sn2p8MTs4LHC+lEcLKPvVKFYnAOKpFDNSRTZSe1cxFP506D+62K6icFo9h5BqjPpEcd0kqDaG5IFU1ocVR2kW7bUBZXphZyAy5HpRXP+JJtl6iR53KuDRWVi+aJv2+uWumllkhdjnkrWlD4y0ogFlmQ+myuNvCHmf8A3jUCxgkV2RdjzG7nqNt4g0u5iDJcAcdG4rFv5Y5buR4mDI3ORXO6dDZglrjp6C
tdzCT/AKOu2MDpU1Hc6sL8RRvvukAdaraJpsup6mYIiocLu+arN1V7wEwj8ZRMf4lwAehrA7puxKyGCRon+8hKn6igeoq94hi8jXbpQMAtkVmgkUJlQleJJ5bysFjXc3pXTR+Ho59JW5lISVEz1rl4rkxzKR1B7V1omaXTi28gFOhrZK6POxE7SPJ9cVptWbYkjDHZSaK9ssI7S3tIyLWLfjJZlBJoqeUi9zxOTmQn3pAQKfdRmOVvTNV8nNXc51sWFcqeDWxYzb49vcVz8jMgyK19CbzN5f8ACom9DooO0i1d4APrWt4JtN+tx3POY6zL1ecjvXVeBYGE7NsOMZJrnudtSd0Q+LiW1xjtwMfnWE8oRctxXTeL5rca4sSgl9vzHHFc1fIrRkCmmTCdkZtvqLyXhVF6HjPeu10WK61O1kZvljThie4rzaW4ayvhIg5Wu48Ia/eapO8WxI4VXJCjk11QehxVdZHS398mmWiF1LDgYFFWjGsnDoCPcUVVi01Y8WuL9bqZht2sCeDTAcVBqkH2HX7qEdFkP86nU8is0cyHuu6M+wrR0YbImPrVL+E/Sr2nMFgOfWpqbG9HVlud9zKvcnivZvCumRWumRuVUEpzjv3rxOSQNLHjruFe56bMYvDQk6FUJ/8AHa5zebPJPEN+L7xnfmMnYjkL9M1DMSyfhWPbXBm1i9l6lpm/nWo7/KabLprQopo41SZ137SBya2/AVqbW6vY2O5kbbkVU0aXF6y56itTwp+71LUR6yf41vS1OaroztFI70VCGIPWiurlMOY8V8XSbvFV6c8mQ5/OmQkGJTUPiNH/ALeu3b+Jyc/jS2jZhHrWHUSLqHjmpIJdoKioQpPNV4Zv9IZfepnsa0XZmzbAvdxem8V7VeXa2vhCRjgYiPP/AAGvErWTZNG3vXpOp3Et54Slij+8YR39ua50tTpnseP2Nywv2OeXYn9a6IMWSuTtz5V2d38LEV1cHzRD3FORVLYn0z5NQX3rc0MbNZvAO+01gQt5Vwr/AN081saLcBtVmI48wDr7CtqTsc9VanYA5Wis+fUILP8A10ir7Zorb2hlyHlXia7ju7lZVUAkc1RsmLKRS6rbSRTtnoCcVFablJqGxJGqvC1lozC8xV0N8vWqYGbpD6mpbKgtTWSQrtr0XQbpbvR3hkbohFefm3Plg9sVraTqj2kTxgHkYyDWKWp0yehyWpRi31m5jByBIcfnXR2J3QL/ALtczqRMmqyOTnLZrrrGDbbJhSAVHX6UpoKDH20QebnpVDXZjbzRi3ZkYkcqaukvBNkdKzNXy80D+4qovQKi1IdRVnt4mcu5IBJJzzRVu6g862Reneii5CidlqfhuyvlYiLBb0FYo8CNuOxjjPetvUtbktmxDgY61iz+Jr5xjzNuPSqVyNBf+EBnJ/4+EUfWsnVfCraQi3LXSSAHgD1qZ9cvPmPnvz1way727muVwzsw9zVJE7EP26TdtzlfStHT0MwfA5rGSB2kHB610ulwiEZHUjmqsgu2crf2k6Xju0bAZ610Vjq/k2cSSIWKrjrV7UIvMgckc44rmlhuixUROfTAqGkyoya2Ne51S3mQjYyn61lTSNNNGQxKg9Kli0TU7ogR278nqRWvYeDNXaQMUC49anRA5yZUv7pTbx+UNhAANFdRF4BvJxiQc+1FToaRloZeqZ89uaymA20UVujGIxUG6nmNPSiigaFjjXeOK6PT4I9i8daKKmRSN6HTraZPnTNamm6XZozbYh+VFFZvYZvQWduBkRr+VXUhjC8ItFFYlvYUnavAH5UUUVSJP//Z'"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pil_to_url(dataset[110]['image'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ce9be966",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The image shows a person from behind, wearing a dark blue t-shirt and pink shorts. They are standing among a group of people, and the setting appears to be outdoors.'"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from openai import OpenAI\n",
"\n",
"client = OpenAI(api_key=\"YOUR_API_KEY\", base_url=\"http://0.0.0.0:8082/v1\")\n",
"model_name = client.models.list().data[0].id\n",
"\n",
"def generate_content(image, prompt):\n",
"\n",
"    url_of_pil_image = pil_to_url(image)\n",
"\n",
"    response = client.chat.completions.create(\n",
"        model=model_name,\n",
"        messages=[\n",
"            {\n",
"                \"role\": \"user\",\n",
"                \"content\": [\n",
"                    {\n",
"                        \"type\": \"text\",\n",
"                        \"text\": prompt,\n",
"                    },\n",
"                    {\n",
"                        \"type\": \"image_url\",\n",
"                        \"image_url\": {\n",
"                            \"url\": url_of_pil_image,\n",
"                        },\n",
"                    },\n",
"                ],\n",
"            }\n",
"        ],\n",
"        temperature=0.5,\n",
"        top_p=0.8,\n",
"    )\n",
"    return response.choices[0].message.content\n",
"\n",
"generate_content(image=dataset[110][\"image\"], prompt=\"describe this image\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ebeb3b6",
"metadata": {},
"outputs": [],
"source": [
"PROMPT = '''\n",
"You are an AI assistant that helps users describe a given person from an image in detail. The image is taken from a surveillance camera and focuses on one person. Your caption must focus on the person and cover the following aspects:\n",
"\n",
"- Gender, age, and pose of the person\n",
"- Upper body clothing such as shirt, jacket, etc.\n",
"- Lower body clothing such as pants, skirt, etc.\n",
"- Accessories on head/face such as hat, glasses, etc.\n",
"- Accessories on body such as bag, watch, book, etc.\n",
"- Accessories on feet such as shoes, sandals, etc.\n",
"- Activities and interactions with other objects such as holding a phone, sitting on a bench, etc.\n",
"- Transportation such as car, bicycle, etc.\n",
"\n",
"Here are two example captions.\n",
"{EXAMPLE}\n",
"Please mimic the style, expression, and sentence structure of the examples without copying the specific details. If the example is unusual, please ignore it.\n",
"You must describe the person in your input image truthfully and in detail.\n",
"'''\n",
"\n",
"def make_prompt(prompt, example):\n",
"    return prompt.format(EXAMPLE=example)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "76cd677f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "lmdeploy",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.19"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
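The cells above assume a `pil_to_url` helper defined earlier in the notebook; its output (a `data:image/jpeg;base64,...` URL, as seen in the cell output above) is what gets passed as the `image_url` content part. The snippet below is a minimal, hypothetical sketch of such a helper and of how `make_prompt` and `generate_content` are meant to be combined; the helper body and the example caption are illustrative assumptions, not the notebook's exact code.

```python
# Illustrative sketch only: shows how pil_to_url, make_prompt, and generate_content
# from the notebook above fit together. The body of pil_to_url here is an assumption;
# the notebook defines its own version in an earlier cell.
import base64
from io import BytesIO

from PIL import Image


def pil_to_url(image: Image.Image) -> str:
    """Encode a PIL image as a base64 JPEG data URL, suitable for an image_url content part."""
    buffer = BytesIO()
    image.convert("RGB").save(buffer, format="JPEG")
    encoded = base64.b64encode(buffer.getvalue()).decode("utf-8")
    return f"data:image/jpeg;base64,{encoded}"


# Hypothetical example caption used to fill the {EXAMPLE} slot of PROMPT:
# example_caption = "A young man walking, wearing a dark blue t-shirt, pink shorts and white sneakers."
# prompt = make_prompt(PROMPT, example_caption)
# caption = generate_content(image=dataset[110]["image"], prompt=prompt)
```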
a_mllm_notebooks/tensorrt-llm/bert/.gitignore
ADDED
@@ -0,0 +1,2 @@
bert*
*.log
a_mllm_notebooks/tensorrt-llm/bert/README.md
ADDED
@@ -0,0 +1,79 @@
# BERT and BERT Variants

This document explains how to build the BERT family, specifically the [BERT](https://huggingface.co/docs/transformers/model_doc/bert) and [RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta) models, using TensorRT-LLM. It also describes how to run them on a single GPU and on two GPUs.

## Overview

The TensorRT-LLM BERT family implementation can be found in [`tensorrt_llm/models/bert/model.py`](../../tensorrt_llm/models/bert/model.py). The TensorRT-LLM BERT family example code is located in [`examples/bert`](./). There are two main files in that folder:

* [`build.py`](./build.py) to build the [TensorRT](https://developer.nvidia.com/tensorrt) engine(s) needed to run the model,
* [`run.py`](./run.py) to run the inference on an input text.

## Build and run on a single GPU

TensorRT-LLM converts HuggingFace BERT family models into TensorRT engine(s).
To build the TensorRT engine, use:

```bash
python3 build.py [--model <model_name> --dtype <data_type> ...]
```

Supported `model_name` options include: BertModel, BertForQuestionAnswering, BertForSequenceClassification, RobertaModel, RobertaForQuestionAnswering, and RobertaForSequenceClassification, with `BertModel` as the default.

Some examples are as follows:

```bash
# Build BertModel
python3 build.py --model BertModel --dtype=float16 --log_level=verbose

# Build RobertaModel
python3 build.py --model RobertaModel --dtype=float16 --log_level=verbose

# Build BertModel with the TensorRT-LLM BERT attention plugin for enhanced runtime performance
python3 build.py --dtype=float16 --log_level=verbose --use_bert_attention_plugin float16

# Build BertForSequenceClassification with the TensorRT-LLM remove-input-padding knob for enhanced runtime performance
python3 build.py --model BertForSequenceClassification --remove_input_padding --use_bert_attention_plugin float16
```

The following command can be used to run the model on a single GPU:

```bash
python3 run.py
```

If the model was built with the **--remove_input_padding** knob, run it with the command below instead:

```bash
python3 run_remove_input_padding.py
```

#### Fused MultiHead Attention (FMHA)

You can enable the FMHA kernels for BERT by adding `--enable_context_fmha` to the invocation of `build.py`. Note that it is disabled by default because of possible accuracy issues due to the use of Flash Attention.

If you find that the default fp16 accumulation (`--enable_context_fmha`) cannot meet your accuracy requirements, you can try enabling fp32 accumulation by adding `--enable_context_fmha_fp32_acc` instead. However, a performance drop is expected.

Note that `--enable_context_fmha` / `--enable_context_fmha_fp32_acc` has to be used together with `--use_bert_attention_plugin float16`.
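For example, a build invocation that pairs the attention plugin with FMHA could look like the sketch below; it only combines the flags documented above, so adjust the model and dtype to your case:

```bash
# Sketch: FMHA with the default fp16 accumulation
python3 build.py --model BertModel --dtype=float16 \
    --use_bert_attention_plugin float16 --enable_context_fmha

# Sketch: FMHA with fp32 accumulation, if fp16 accumulation is not accurate enough
python3 build.py --model BertModel --dtype=float16 \
    --use_bert_attention_plugin float16 --enable_context_fmha_fp32_acc
```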
#### Remove input padding

The remove-input-padding feature is enabled by adding `--remove_input_padding` to the build command.
When input padding is removed, the tokens of the different sequences are packed together, which reduces both the amount of computation and the memory consumption. For more details, see this [Document](https://nvidia.github.io/TensorRT-LLM/advanced/gpt-attention.md#padded-and-packed-tensors).

Currently, this feature is only enabled for the BertForSequenceClassification model.
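As a minimal end-to-end sketch, reusing only the commands already shown earlier in this document:

```bash
# Sketch: build with packed (padding-free) inputs, then run with the matching script
python3 build.py --model BertForSequenceClassification --remove_input_padding --use_bert_attention_plugin float16
python3 run_remove_input_padding.py
```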
## Build and run on two GPUs

The following two commands can be used to build TensorRT engines to run BERT on two GPUs. The first command builds one engine for the first GPU. The second command builds another engine for the second GPU. For example, to build `BertForQuestionAnswering` with two GPUs, run:

```bash
python3 build.py --model BertForQuestionAnswering --world_size=2 --rank=0
python3 build.py --model BertForQuestionAnswering --world_size=2 --rank=1
```

The following command can be used to run the inference on 2 GPUs. It uses MPI with `mpirun`.

```bash
mpirun -n 2 python3 run.py
```
a_mllm_notebooks/tensorrt-llm/bert/base_benchmark/config.json
ADDED
@@ -0,0 +1,22 @@
1 |
+
{
|
2 |
+
"builder_config": {
|
3 |
+
"max_batch_size": 256,
|
4 |
+
"max_input_len": 512,
|
5 |
+
"name": "bert",
|
6 |
+
"precision": "float16",
|
7 |
+
"tensor_parallel": 1,
|
8 |
+
"use_refit": false
|
9 |
+
},
|
10 |
+
"plugin_config": {
|
11 |
+
"bert_attention_plugin": "float16",
|
12 |
+
"context_fmha_enabled": true,
|
13 |
+
"gemm_plugin": "float16",
|
14 |
+
"gpt_attention_plugin": false,
|
15 |
+
"identity_plugin": false,
|
16 |
+
"layernorm_plugin": false,
|
17 |
+
"layernorm_quantization_plugin": false,
|
18 |
+
"nccl_plugin": false,
|
19 |
+
"smooth_quant_gemm_plugin": false,
|
20 |
+
"weight_only_quant_matmul_plugin": false
|
21 |
+
}
|
22 |
+
}
|
a_mllm_notebooks/tensorrt-llm/bert/base_with_attention_plugin_benchmark/config.json
ADDED
@@ -0,0 +1,22 @@
1 |
+
{
|
2 |
+
"builder_config": {
|
3 |
+
"max_batch_size": 256,
|
4 |
+
"max_input_len": 512,
|
5 |
+
"name": "bert",
|
6 |
+
"precision": "float16",
|
7 |
+
"tensor_parallel": 1,
|
8 |
+
"use_refit": false
|
9 |
+
},
|
10 |
+
"plugin_config": {
|
11 |
+
"bert_attention_plugin": "float16",
|
12 |
+
"context_fmha_enabled": true,
|
13 |
+
"gemm_plugin": "float16",
|
14 |
+
"gpt_attention_plugin": false,
|
15 |
+
"identity_plugin": false,
|
16 |
+
"layernorm_plugin": false,
|
17 |
+
"layernorm_quantization_plugin": false,
|
18 |
+
"nccl_plugin": false,
|
19 |
+
"smooth_quant_gemm_plugin": false,
|
20 |
+
"weight_only_quant_matmul_plugin": false
|
21 |
+
}
|
22 |
+
}
|
a_mllm_notebooks/tensorrt-llm/bert/build.py
ADDED
@@ -0,0 +1,354 @@
1 |
+
# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
2 |
+
# SPDX-License-Identifier: Apache-2.0
|
3 |
+
#
|
4 |
+
# Licensed under the Apache License, Version 2.0 (the "License");
|
5 |
+
# you may not use this file except in compliance with the License.
|
6 |
+
# You may obtain a copy of the License at
|
7 |
+
#
|
8 |
+
# http://www.apache.org/licenses/LICENSE-2.0
|
9 |
+
#
|
10 |
+
# Unless required by applicable law or agreed to in writing, software
|
11 |
+
# distributed under the License is distributed on an "AS IS" BASIS,
|
12 |
+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
13 |
+
# See the License for the specific language governing permissions and
|
14 |
+
# limitations under the License.
|
15 |
+
import argparse
|
16 |
+
import os
|
17 |
+
from collections import OrderedDict
|
18 |
+
|
19 |
+
# isort: off
|
20 |
+
import torch
|
21 |
+
import tensorrt as trt
|
22 |
+
# isort: on
|
23 |
+
|
24 |
+
from transformers import BertConfig, BertForQuestionAnswering, BertForSequenceClassification, BertModel # isort:skip
|
25 |
+
from transformers import RobertaConfig, RobertaForQuestionAnswering, RobertaForSequenceClassification, RobertaModel # isort:skip
|
26 |
+
|
27 |
+
from weight import (load_from_hf_cls_model, load_from_hf_model,
|
28 |
+
load_from_hf_qa_model)
|
29 |
+
|
30 |
+
import tensorrt_llm
|
31 |
+
from tensorrt_llm.builder import Builder
|
32 |
+
from tensorrt_llm.mapping import Mapping
|
33 |
+
from tensorrt_llm.network import net_guard
|
34 |
+
from tensorrt_llm.plugin.plugin import ContextFMHAType
|
35 |
+
|
36 |
+
|
37 |
+
def get_engine_name(model, dtype, tp_size, rank):
|
38 |
+
return '{}_{}_tp{}_rank{}.engine'.format(model, dtype, tp_size, rank)
|
39 |
+
|
40 |
+
|
41 |
+
def parse_arguments():
|
42 |
+
parser = argparse.ArgumentParser()
|
43 |
+
parser.add_argument('--world_size',
|
44 |
+
type=int,
|
45 |
+
default=1,
|
46 |
+
help='Tensor parallelism size')
|
47 |
+
parser.add_argument('--rank', type=int, default=0)
|
48 |
+
parser.add_argument('--dtype',
|
49 |
+
type=str,
|
50 |
+
default='float16',
|
51 |
+
choices=['float16', 'float32'])
|
52 |
+
parser.add_argument('--timing_cache', type=str, default='model.cache')
|
53 |
+
parser.add_argument(
|
54 |
+
'--profiling_verbosity',
|
55 |
+
type=str,
|
56 |
+
default='layer_names_only',
|
57 |
+
choices=['layer_names_only', 'detailed', 'none'],
|
58 |
+
help=
|
59 |
+
'The profiling verbosity for the generated TRT engine. Set to detailed can inspect tactic choices and kernel parameters.'
|
60 |
+
)
|
61 |
+
parser.add_argument('--log_level', type=str, default='info')
|
62 |
+
parser.add_argument('--vocab_size', type=int, default=51200)
|
63 |
+
parser.add_argument('--n_labels', type=int, default=2)
|
64 |
+
parser.add_argument('--n_layer', type=int, default=24)
|
65 |
+
parser.add_argument('--n_positions', type=int, default=1024)
|
66 |
+
parser.add_argument('--n_embd', type=int, default=1024)
|
67 |
+
parser.add_argument('--n_head', type=int, default=16)
|
68 |
+
parser.add_argument('--hidden_act', type=str, default='gelu')
|
69 |
+
parser.add_argument('--max_batch_size', type=int, default=256)
|
70 |
+
parser.add_argument('--max_input_len', type=int, default=512)
|
71 |
+
parser.add_argument('--gpus_per_node', type=int, default=8)
|
72 |
+
parser.add_argument('--output_dir', type=str, default='bert_outputs')
|
73 |
+
|
74 |
+
parser.add_argument('--remove_input_padding',
|
75 |
+
default=False,
|
76 |
+
action='store_true')
|
77 |
+
parser.add_argument('--use_bert_attention_plugin',
|
78 |
+
nargs='?',
|
79 |
+
const='float16',
|
80 |
+
type=str,
|
81 |
+
default=False,
|
82 |
+
choices=['float16', 'float32'])
|
83 |
+
parser.add_argument('--use_gemm_plugin',
|
84 |
+
nargs='?',
|
85 |
+
const='float16',
|
86 |
+
type=str,
|
87 |
+
default=False,
|
88 |
+
choices=['float16', 'float32'])
|
89 |
+
parser.add_argument('--enable_context_fmha',
|
90 |
+
default=False,
|
91 |
+
action='store_true')
|
92 |
+
parser.add_argument('--enable_context_fmha_fp32_acc',
|
93 |
+
default=False,
|
94 |
+
action='store_true')
|
95 |
+
parser.add_argument('--model',
|
96 |
+
default='BertModel',
|
97 |
+
choices=[
|
98 |
+
'BertModel',
|
99 |
+
'BertForQuestionAnswering',
|
100 |
+
'BertForSequenceClassification',
|
101 |
+
'RobertaModel',
|
102 |
+
'RobertaForQuestionAnswering',
|
103 |
+
'RobertaForSequenceClassification',
|
104 |
+
])
|
105 |
+
parser.add_argument('--model_dir', type=str, required=False)
|
106 |
+
return parser.parse_args()
|
107 |
+
|
108 |
+
|
109 |
+
def prepare_inputs():
|
110 |
+
# opt_shape is set to half of max batch_size and seq_len by default
|
111 |
+
# tune this according to real data distribution
|
112 |
+
bs_range = [1, (args.max_batch_size + 1) // 2, args.max_batch_size]
|
113 |
+
inlen_range = [1, (args.max_input_len + 1) // 2, args.max_input_len]
|
114 |
+
num_tokens_range = [
|
115 |
+
1,
|
116 |
+
(args.max_input_len * args.max_batch_size + 1) // 2,
|
117 |
+
args.max_input_len * args.max_batch_size,
|
118 |
+
]
|
119 |
+
if not args.remove_input_padding:
|
120 |
+
input_ids = tensorrt_llm.Tensor(
|
121 |
+
name='input_ids',
|
122 |
+
dtype=trt.int32,
|
123 |
+
shape=[-1, -1],
|
124 |
+
dim_range=OrderedDict([('batch_size', [bs_range]),
|
125 |
+
('input_len', [inlen_range])]),
|
126 |
+
)
|
127 |
+
# also called segment_ids
|
128 |
+
token_type_ids = tensorrt_llm.Tensor(
|
129 |
+
name='token_type_ids',
|
130 |
+
dtype=trt.int32,
|
131 |
+
shape=[-1, -1],
|
132 |
+
dim_range=OrderedDict([('batch_size', [bs_range]),
|
133 |
+
('input_len', [inlen_range])]),
|
134 |
+
)
|
135 |
+
else:
|
136 |
+
input_ids = tensorrt_llm.Tensor(
|
137 |
+
name="input_ids",
|
138 |
+
dtype=trt.int32,
|
139 |
+
shape=[-1],
|
140 |
+
dim_range=OrderedDict([("num_tokens", [num_tokens_range])]),
|
141 |
+
)
|
142 |
+
token_type_ids = tensorrt_llm.Tensor(
|
143 |
+
name='token_type_ids',
|
144 |
+
dtype=trt.int32,
|
145 |
+
shape=[-1],
|
146 |
+
dim_range=OrderedDict([('num_tokens', [num_tokens_range])]),
|
147 |
+
)
|
148 |
+
position_ids = tensorrt_llm.Tensor(
|
149 |
+
name='position_ids',
|
150 |
+
dtype=trt.int32,
|
151 |
+
shape=[-1],
|
152 |
+
dim_range=OrderedDict([('num_tokens', [num_tokens_range])]),
|
153 |
+
)
|
154 |
+
max_input_length = tensorrt_llm.Tensor(
|
155 |
+
name="max_input_length",
|
156 |
+
dtype=trt.int32,
|
157 |
+
shape=[-1],
|
158 |
+
dim_range=OrderedDict([("max_input_length", [inlen_range])]),
|
159 |
+
)
|
160 |
+
input_lengths = tensorrt_llm.Tensor(name='input_lengths',
|
161 |
+
dtype=trt.int32,
|
162 |
+
shape=[-1],
|
163 |
+
dim_range=OrderedDict([('batch_size',
|
164 |
+
[bs_range])]))
|
165 |
+
|
166 |
+
inputs = {
|
167 |
+
'input_ids': input_ids,
|
168 |
+
'input_lengths': input_lengths,
|
169 |
+
'token_type_ids': token_type_ids,
|
170 |
+
}
|
171 |
+
|
172 |
+
if args.remove_input_padding:
|
173 |
+
inputs['position_ids'] = position_ids
|
174 |
+
inputs['max_input_length'] = max_input_length
|
175 |
+
|
176 |
+
return inputs
|
177 |
+
|
178 |
+
|
179 |
+
if __name__ == '__main__':
|
180 |
+
args = parse_arguments()
|
181 |
+
tensorrt_llm.logger.set_level(args.log_level)
|
182 |
+
if not os.path.exists(args.output_dir):
|
183 |
+
os.makedirs(args.output_dir)
|
184 |
+
|
185 |
+
torch_dtype = torch.float16 if args.dtype == 'float16' else torch.float32
|
186 |
+
trt_dtype = trt.float16 if args.dtype == 'float16' else trt.float32
|
187 |
+
|
188 |
+
builder = Builder()
|
189 |
+
builder_config = builder.create_builder_config(
|
190 |
+
name=args.model,
|
191 |
+
precision=args.dtype,
|
192 |
+
timing_cache=args.timing_cache,
|
193 |
+
profiling_verbosity=args.profiling_verbosity,
|
194 |
+
tensor_parallel=args.world_size, # TP only
|
195 |
+
max_batch_size=args.max_batch_size,
|
196 |
+
max_input_len=args.max_input_len,
|
197 |
+
)
|
198 |
+
# Initialize model
|
199 |
+
if 'Roberta' in args.model:
|
200 |
+
model_type = 'Roberta'
|
201 |
+
else:
|
202 |
+
model_type = 'Bert'
|
203 |
+
|
204 |
+
# initialize config with input arguments and update from json
|
205 |
+
config_cls = globals()[f'{model_type}Config']
|
206 |
+
config = dict(
|
207 |
+
vocab_size=args.vocab_size,
|
208 |
+
num_labels=args.n_labels,
|
209 |
+
num_hidden_layers=args.n_layer,
|
210 |
+
max_position_embeddings=args.n_positions,
|
211 |
+
hidden_size=args.n_embd,
|
212 |
+
num_attention_heads=args.n_head,
|
213 |
+
intermediate_size=4 * args.n_embd if args.n_embd else None,
|
214 |
+
hidden_act=args.hidden_act,
|
215 |
+
torch_dtype=torch_dtype,
|
216 |
+
)
|
217 |
+
if args.model_dir is not None:
|
218 |
+
json_config = config_cls.get_config_dict(args.model_dir)[0]
|
219 |
+
config.update((k, v) for k, v in json_config.items() if v is not None)
|
220 |
+
bert_config = config_cls.from_dict(config)
|
221 |
+
|
222 |
+
output_name = 'hidden_states'
|
223 |
+
if args.model == 'BertModel' or args.model == 'RobertaModel':
|
224 |
+
hf_bert = globals()[f'{model_type}Model'](bert_config,
|
225 |
+
add_pooling_layer=False)
|
226 |
+
tensorrt_llm_bert = tensorrt_llm.models.BertModel(
|
227 |
+
num_layers=bert_config.num_hidden_layers,
|
228 |
+
num_heads=bert_config.num_attention_heads,
|
229 |
+
hidden_size=bert_config.hidden_size,
|
230 |
+
vocab_size=bert_config.vocab_size,
|
231 |
+
hidden_act=bert_config.hidden_act,
|
232 |
+
max_position_embeddings=bert_config.max_position_embeddings,
|
233 |
+
type_vocab_size=bert_config.type_vocab_size,
|
234 |
+
pad_token_id=bert_config.pad_token_id,
|
235 |
+
is_roberta=(model_type == 'Roberta'),
|
236 |
+
mapping=Mapping(world_size=args.world_size,
|
237 |
+
rank=args.rank,
|
238 |
+
tp_size=args.world_size), # TP only
|
239 |
+
dtype=trt_dtype)
|
240 |
+
load_from_hf_model(
|
241 |
+
tensorrt_llm_bert,
|
242 |
+
hf_bert,
|
243 |
+
bert_config,
|
244 |
+
rank=args.rank,
|
245 |
+
tensor_parallel=args.world_size,
|
246 |
+
fp16=(args.dtype == 'float16'),
|
247 |
+
)
|
248 |
+
|
249 |
+
elif args.model == 'BertForQuestionAnswering' or args.model == 'RobertaForQuestionAnswering':
|
250 |
+
hf_bert = globals()[f'{model_type}ForQuestionAnswering'](bert_config)
|
251 |
+
tensorrt_llm_bert = tensorrt_llm.models.BertForQuestionAnswering(
|
252 |
+
num_layers=bert_config.num_hidden_layers,
|
253 |
+
num_heads=bert_config.num_attention_heads,
|
254 |
+
hidden_size=bert_config.hidden_size,
|
255 |
+
vocab_size=bert_config.vocab_size,
|
256 |
+
hidden_act=bert_config.hidden_act,
|
257 |
+
max_position_embeddings=bert_config.max_position_embeddings,
|
258 |
+
type_vocab_size=bert_config.type_vocab_size,
|
259 |
+
pad_token_id=bert_config.pad_token_id,
|
260 |
+
is_roberta=(model_type == 'Roberta'),
|
261 |
+
num_labels=args.
|
262 |
+
n_labels, # TODO: this might just need to be a constant
|
263 |
+
mapping=Mapping(world_size=args.world_size,
|
264 |
+
rank=args.rank,
|
265 |
+
tp_size=args.world_size), # TP only
|
266 |
+
dtype=trt_dtype)
|
267 |
+
load_from_hf_qa_model(
|
268 |
+
tensorrt_llm_bert,
|
269 |
+
hf_bert,
|
270 |
+
bert_config,
|
271 |
+
rank=args.rank,
|
272 |
+
tensor_parallel=args.world_size,
|
273 |
+
fp16=(args.dtype == 'float16'),
|
274 |
+
)
|
275 |
+
output_name = 'logits'
|
276 |
+
elif args.model == 'BertForSequenceClassification' or args.model == 'RobertaForSequenceClassification':
|
277 |
+
hf_bert = globals()[f'{model_type}ForSequenceClassification'](
|
278 |
+
config=bert_config)
|
279 |
+
if args.model_dir is not None and os.path.exists(
|
280 |
+
os.path.join(args.model_dir, "pytorch_model.bin")):
|
281 |
+
state_dict = torch.load(
|
282 |
+
os.path.join(args.model_dir, "pytorch_model.bin"))
|
283 |
+
hf_bert.load_state_dict(state_dict, strict=False)
|
284 |
+
|
285 |
+
tensorrt_llm_bert = tensorrt_llm.models.BertForSequenceClassification(
|
286 |
+
num_layers=bert_config.num_hidden_layers,
|
287 |
+
num_heads=bert_config.num_attention_heads,
|
288 |
+
hidden_size=bert_config.hidden_size,
|
289 |
+
vocab_size=bert_config.vocab_size,
|
290 |
+
hidden_act=bert_config.hidden_act,
|
291 |
+
max_position_embeddings=bert_config.max_position_embeddings,
|
292 |
+
type_vocab_size=bert_config.type_vocab_size,
|
293 |
+
pad_token_id=bert_config.pad_token_id,
|
294 |
+
is_roberta=(model_type == 'Roberta'),
|
295 |
+
num_labels=bert_config.num_labels,
|
296 |
+
mapping=Mapping(world_size=args.world_size,
|
297 |
+
rank=args.rank,
|
298 |
+
tp_size=args.world_size), # TP only
|
299 |
+
dtype=trt_dtype)
|
300 |
+
load_from_hf_cls_model(
|
301 |
+
tensorrt_llm_bert,
|
302 |
+
hf_bert,
|
303 |
+
bert_config,
|
304 |
+
rank=args.rank,
|
305 |
+
tensor_parallel=args.world_size,
|
306 |
+
fp16=(args.dtype == 'float16'),
|
307 |
+
)
|
308 |
+
output_name = 'logits'
|
309 |
+
else:
|
310 |
+
assert False, f"Unknown BERT model {args.model}"
|
311 |
+
|
312 |
+
# Module -> Network
|
313 |
+
network = builder.create_network()
|
314 |
+
network.plugin_config.to_legacy_setting()
|
315 |
+
if args.remove_input_padding:
|
316 |
+
assert args.model == "BertForSequenceClassification", \
|
317 |
+
"remove_input_padding is only supported for BertForSequenceClassification models"
|
318 |
+
network.plugin_config.remove_input_padding = True
|
319 |
+
if args.use_bert_attention_plugin:
|
320 |
+
network.plugin_config.bert_attention_plugin = args.use_bert_attention_plugin
|
321 |
+
if args.use_gemm_plugin:
|
322 |
+
network.plugin_config.gemm_plugin = args.use_gemm_plugin
|
323 |
+
assert not (args.enable_context_fmha and args.enable_context_fmha_fp32_acc)
|
324 |
+
if args.enable_context_fmha:
|
325 |
+
network.plugin_config.set_context_fmha(ContextFMHAType.enabled)
|
326 |
+
if args.enable_context_fmha_fp32_acc:
|
327 |
+
network.plugin_config.set_context_fmha(
|
328 |
+
ContextFMHAType.enabled_with_fp32_acc)
|
329 |
+
if args.world_size > 1:
|
330 |
+
network.plugin_config.set_nccl_plugin(args.dtype)
|
331 |
+
with net_guard(network):
|
332 |
+
# Prepare
|
333 |
+
network.set_named_parameters(tensorrt_llm_bert.named_parameters())
|
334 |
+
|
335 |
+
# Forward
|
336 |
+
inputs = prepare_inputs()
|
337 |
+
|
338 |
+
# logits for QA BERT, or hidden_state for vanilla BERT
|
339 |
+
output = tensorrt_llm_bert(**inputs)
|
340 |
+
|
341 |
+
# Mark outputs
|
342 |
+
output_dtype = trt.float16 if args.dtype == 'float16' else trt.float32
|
343 |
+
output.mark_output(output_name, output_dtype)
|
344 |
+
|
345 |
+
# Network -> Engine
|
346 |
+
engine = builder.build_engine(network, builder_config)
|
347 |
+
assert engine is not None, 'Failed to build engine.'
|
348 |
+
engine_file = os.path.join(
|
349 |
+
args.output_dir,
|
350 |
+
get_engine_name(args.model, args.dtype, args.world_size, args.rank))
|
351 |
+
with open(engine_file, 'wb') as f:
|
352 |
+
f.write(engine)
|
353 |
+
builder.save_config(builder_config,
|
354 |
+
os.path.join(args.output_dir, 'config.json'))
|
a_mllm_notebooks/tensorrt-llm/bert/large_benchmark/config.json
ADDED
@@ -0,0 +1,22 @@
1 |
+
{
|
2 |
+
"builder_config": {
|
3 |
+
"max_batch_size": 256,
|
4 |
+
"max_input_len": 512,
|
5 |
+
"name": "bert",
|
6 |
+
"precision": "float16",
|
7 |
+
"tensor_parallel": 1,
|
8 |
+
"use_refit": false
|
9 |
+
},
|
10 |
+
"plugin_config": {
|
11 |
+
"bert_attention_plugin": false,
|
12 |
+
"context_fmha_enabled": false,
|
13 |
+
"gemm_plugin": false,
|
14 |
+
"gpt_attention_plugin": false,
|
15 |
+
"identity_plugin": false,
|
16 |
+
"layernorm_plugin": false,
|
17 |
+
"layernorm_quantization_plugin": false,
|
18 |
+
"nccl_plugin": false,
|
19 |
+
"smooth_quant_gemm_plugin": false,
|
20 |
+
"weight_only_quant_matmul_plugin": false
|
21 |
+
}
|
22 |
+
}
|
a_mllm_notebooks/tensorrt-llm/bert/large_with_attention_plugin_benchmark/config.json
ADDED
@@ -0,0 +1,22 @@
1 |
+
{
|
2 |
+
"builder_config": {
|
3 |
+
"max_batch_size": 256,
|
4 |
+
"max_input_len": 512,
|
5 |
+
"name": "bert",
|
6 |
+
"precision": "float16",
|
7 |
+
"tensor_parallel": 1,
|
8 |
+
"use_refit": false
|
9 |
+
},
|
10 |
+
"plugin_config": {
|
11 |
+
"bert_attention_plugin": "float16",
|
12 |
+
"context_fmha_enabled": true,
|
13 |
+
"gemm_plugin": "float16",
|
14 |
+
"gpt_attention_plugin": false,
|
15 |
+
"identity_plugin": false,
|
16 |
+
"layernorm_plugin": false,
|
17 |
+
"layernorm_quantization_plugin": false,
|
18 |
+
"nccl_plugin": false,
|
19 |
+
"smooth_quant_gemm_plugin": false,
|
20 |
+
"weight_only_quant_matmul_plugin": false
|
21 |
+
}
|
22 |
+
}
|
a_mllm_notebooks/tensorrt-llm/bert/run.py
ADDED
@@ -0,0 +1,128 @@
1 |
+
# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
2 |
+
# SPDX-License-Identifier: Apache-2.0
|
3 |
+
#
|
4 |
+
# Licensed under the Apache License, Version 2.0 (the "License");
|
5 |
+
# you may not use this file except in compliance with the License.
|
6 |
+
# You may obtain a copy of the License at
|
7 |
+
#
|
8 |
+
# http://www.apache.org/licenses/LICENSE-2.0
|
9 |
+
#
|
10 |
+
# Unless required by applicable law or agreed to in writing, software
|
11 |
+
# distributed under the License is distributed on an "AS IS" BASIS,
|
12 |
+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
13 |
+
# See the License for the specific language governing permissions and
|
14 |
+
# limitations under the License.
|
15 |
+
import argparse
|
16 |
+
import json
|
17 |
+
import os
|
18 |
+
|
19 |
+
# isort: off
|
20 |
+
import torch
|
21 |
+
import tensorrt as trt
|
22 |
+
# isort: on
|
23 |
+
|
24 |
+
import tensorrt_llm
|
25 |
+
from tensorrt_llm import logger
|
26 |
+
from tensorrt_llm.runtime import Session, TensorInfo
|
27 |
+
|
28 |
+
from build import get_engine_name # isort:skip
|
29 |
+
|
30 |
+
|
31 |
+
def trt_dtype_to_torch(dtype):
|
32 |
+
if dtype == trt.float16:
|
33 |
+
return torch.float16
|
34 |
+
elif dtype == trt.float32:
|
35 |
+
return torch.float32
|
36 |
+
elif dtype == trt.int32:
|
37 |
+
return torch.int32
|
38 |
+
else:
|
39 |
+
raise TypeError("%s is not supported" % dtype)
|
40 |
+
|
41 |
+
|
42 |
+
def parse_arguments():
|
43 |
+
parser = argparse.ArgumentParser()
|
44 |
+
parser.add_argument('--log_level', type=str, default='info')
|
45 |
+
parser.add_argument('--engine_dir', type=str, default='bert_outputs')
|
46 |
+
|
47 |
+
return parser.parse_args()
|
48 |
+
|
49 |
+
|
50 |
+
if __name__ == '__main__':
|
51 |
+
args = parse_arguments()
|
52 |
+
|
53 |
+
tensorrt_llm.logger.set_level(args.log_level)
|
54 |
+
|
55 |
+
config_path = os.path.join(args.engine_dir, 'config.json')
|
56 |
+
with open(config_path, 'r') as f:
|
57 |
+
config = json.load(f)
|
58 |
+
|
59 |
+
assert config["plugin_config"]["remove_input_padding"] == False, \
|
60 |
+
"Please refer to run_remove_input_padding.py for running BERT models with remove_input_padding enabled"
|
61 |
+
|
62 |
+
dtype = config['builder_config']['precision']
|
63 |
+
world_size = config['builder_config']['tensor_parallel']
|
64 |
+
assert world_size == tensorrt_llm.mpi_world_size(), \
|
65 |
+
f'Engine world size ({world_size}) != Runtime world size ({tensorrt_llm.mpi_world_size()})'
|
66 |
+
|
67 |
+
model_name = config['builder_config']['name']
|
68 |
+
runtime_rank = tensorrt_llm.mpi_rank() if world_size > 1 else 0
|
69 |
+
|
70 |
+
runtime_mapping = tensorrt_llm.Mapping(world_size,
|
71 |
+
runtime_rank,
|
72 |
+
tp_size=world_size)
|
73 |
+
torch.cuda.set_device(runtime_rank % runtime_mapping.gpus_per_node)
|
74 |
+
|
75 |
+
serialize_path = get_engine_name(model_name, dtype, world_size,
|
76 |
+
runtime_rank)
|
77 |
+
serialize_path = os.path.join(args.engine_dir, serialize_path)
|
78 |
+
|
79 |
+
stream = torch.cuda.current_stream().cuda_stream
|
80 |
+
logger.info(f'Loading engine from {serialize_path}')
|
81 |
+
with open(serialize_path, 'rb') as f:
|
82 |
+
engine_buffer = f.read()
|
83 |
+
logger.info(f'Creating session from engine')
|
84 |
+
session = Session.from_serialized_engine(engine_buffer)
|
85 |
+
|
86 |
+
for i in range(3):
|
87 |
+
batch_size = (i + 1) * 4
|
88 |
+
seq_len = (i + 1) * 32
|
89 |
+
input_ids = torch.randint(100, (batch_size, seq_len)).int().cuda()
|
90 |
+
input_lengths = seq_len * torch.ones(
|
91 |
+
(batch_size, ), dtype=torch.int32, device='cuda')
|
92 |
+
token_type_ids = torch.randint(100, (batch_size, seq_len)).int().cuda()
|
93 |
+
|
94 |
+
inputs = {
|
95 |
+
'input_ids': input_ids,
|
96 |
+
'input_lengths': input_lengths,
|
97 |
+
'token_type_ids': token_type_ids
|
98 |
+
}
|
99 |
+
output_info = session.infer_shapes([
|
100 |
+
TensorInfo('input_ids', trt.DataType.INT32, input_ids.shape),
|
101 |
+
TensorInfo('input_lengths', trt.DataType.INT32,
|
102 |
+
input_lengths.shape),
|
103 |
+
TensorInfo('token_type_ids', trt.DataType.INT32,
|
104 |
+
token_type_ids.shape),
|
105 |
+
])
|
106 |
+
outputs = {
|
107 |
+
t.name: torch.empty(tuple(t.shape),
|
108 |
+
dtype=trt_dtype_to_torch(t.dtype),
|
109 |
+
device='cuda')
|
110 |
+
for t in output_info
|
111 |
+
}
|
112 |
+
if (model_name == 'BertModel' or model_name == 'RobertaModel'):
|
113 |
+
output_name = 'hidden_states'
|
114 |
+
elif (model_name == 'BertForQuestionAnswering'
|
115 |
+
or model_name == 'RobertaForQuestionAnswering'):
|
116 |
+
output_name = 'logits'
|
117 |
+
elif (model_name == 'BertForSequenceClassification'
|
118 |
+
or model_name == 'RobertaForSequenceClassification'):
|
119 |
+
output_name = 'logits'
|
120 |
+
else:
|
121 |
+
assert False, f"Unknown BERT model {model_name}"
|
122 |
+
|
123 |
+
assert output_name in outputs, f'{output_name} not found in outputs, check if build.py set the name correctly'
|
124 |
+
|
125 |
+
ok = session.run(inputs, outputs, stream)
|
126 |
+
assert ok, "Runtime execution failed"
|
127 |
+
torch.cuda.synchronize()
|
128 |
+
res = outputs[output_name]
|
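run.py drives the engine with random token ids; to sanity-check it on real text, the same three inputs can be built from a Hugging Face tokenizer instead. A minimal sketch, assuming the engine was built from a bert-base-uncased checkpoint and that transformers is installed (the tokenizer name is an assumption, not part of this repo):

# Sketch: build run.py-style inputs (input_ids, input_lengths, token_type_ids)
# from real text instead of torch.randint. Adjust the tokenizer to whatever
# checkpoint the engine was actually built from.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
batch = tokenizer(['TensorRT-LLM runs BERT.', 'Padding keeps shapes fixed.'],
                  padding=True, return_tensors='pt')

input_ids = batch['input_ids'].int().cuda()
token_type_ids = batch['token_type_ids'].int().cuda()
# run.py expects per-sample valid lengths, not an attention mask
input_lengths = batch['attention_mask'].sum(dim=1).to(torch.int32).cuda()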
a_mllm_notebooks/tensorrt-llm/bert/run_remove_input_padding.py
ADDED
@@ -0,0 +1,153 @@
# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import json
import os
import random
from typing import List

# isort: off
import torch
import tensorrt as trt
# isort: on

import tensorrt_llm
from tensorrt_llm import logger
from tensorrt_llm.runtime import Session, TensorInfo

from build import get_engine_name  # isort:skip


def trt_dtype_to_torch(dtype):
    if dtype == trt.float16:
        return torch.float16
    elif dtype == trt.float32:
        return torch.float32
    elif dtype == trt.int32:
        return torch.int32
    else:
        raise TypeError("%s is not supported" % dtype)


def parse_arguments():
    parser = argparse.ArgumentParser()
    parser.add_argument("--log_level", type=str, default="info")
    parser.add_argument("--engine_dir", type=str, default='bert_outputs')

    return parser.parse_args()


def process_input(input_ids_list: List[torch.Tensor],
                  token_type_ids_list: List[torch.Tensor]):
    input_lengths = []
    position_ids_list = []
    max_input_length = 0
    for i, input_ids in enumerate(input_ids_list):
        input_len = len(input_ids)
        assert input_len == len(token_type_ids_list[i]), f"sample {i}: len(input_ids)={len(input_ids)}, " \
            f"len(token_type_ids)={len(token_type_ids_list[i])}, not equal"
        input_lengths.append(input_len)
        position_ids_list.append(torch.arange(0, input_len, dtype=torch.int32))
        max_input_length = max(max_input_length, input_len)

    # [num_tokens]
    input_ids = torch.concat(input_ids_list).int().cuda()
    token_type_ids = torch.concat(token_type_ids_list).int().cuda()
    position_ids = torch.concat(position_ids_list).int().cuda()

    input_lengths = torch.tensor(input_lengths).int().cuda()  # [batch_size]
    max_input_length = torch.empty((max_input_length, )).int().cuda()
    return input_ids, input_lengths, token_type_ids, position_ids, max_input_length


if __name__ == '__main__':
    args = parse_arguments()

    tensorrt_llm.logger.set_level(args.log_level)

    config_path = os.path.join(args.engine_dir, 'config.json')
    with open(config_path, 'r') as f:
        config = json.load(f)
    dtype = config['builder_config']['precision']
    world_size = config['builder_config']['tensor_parallel']
    assert world_size == tensorrt_llm.mpi_world_size(), \
        f'Engine world size ({world_size}) != Runtime world size ({tensorrt_llm.mpi_world_size()})'

    model_name = config['builder_config']['name']
    runtime_rank = tensorrt_llm.mpi_rank() if world_size > 1 else 0

    runtime_mapping = tensorrt_llm.Mapping(world_size,
                                           runtime_rank,
                                           tp_size=world_size)
    torch.cuda.set_device(runtime_rank % runtime_mapping.gpus_per_node)

    serialize_path = get_engine_name(model_name, dtype, world_size,
                                     runtime_rank)
    serialize_path = os.path.join(args.engine_dir, serialize_path)

    stream = torch.cuda.current_stream().cuda_stream
    logger.info(f'Loading engine from {serialize_path}')
    with open(serialize_path, 'rb') as f:
        engine_buffer = f.read()
    logger.info(f'Creating session from engine')
    session = Session.from_serialized_engine(engine_buffer)

    remove_input_padding = config["plugin_config"]["remove_input_padding"]
    assert remove_input_padding, "This is a demo for BERT models with remove_input_padding enabled"

    for i in range(3):
        batch_size = (i + 1) * 4
        # use list of tensor to represent unpadded samples
        input_ids = []
        token_type_ids = []
        for _ in range(batch_size):
            seq_len = random.randint(64, 128)
            input_ids.append(torch.randint(100, size=(seq_len, )).int().cuda())
            token_type_ids.append(
                torch.randint(0, 1, size=(seq_len, )).int().cuda())

        input_ids, input_lengths, token_type_ids, position_ids, max_input_length = \
            process_input(input_ids, token_type_ids)
        inputs = {
            "input_ids": input_ids,
            "input_lengths": input_lengths,
            "token_type_ids": token_type_ids,
            "position_ids": position_ids,
            "max_input_length": max_input_length
        }
        output_info = session.infer_shapes([
            TensorInfo("input_ids", trt.DataType.INT32, input_ids.shape),
            TensorInfo("input_lengths", trt.DataType.INT32,
                       input_lengths.shape),
            TensorInfo("token_type_ids", trt.DataType.INT32,
                       token_type_ids.shape),
            TensorInfo("position_ids", trt.DataType.INT32, position_ids.shape),
            TensorInfo("max_input_length", trt.DataType.INT32,
                       max_input_length.shape)
        ])
        outputs = {
            t.name: torch.empty(tuple(t.shape),
                                dtype=trt_dtype_to_torch(t.dtype),
                                device='cuda')
            for t in output_info
        }
        output_name = "logits"
        assert output_name in outputs, f'{output_name} not found in outputs, check if build.py set output name correctly'

        ok = session.run(inputs, outputs, stream)
        assert ok, "Runtime execution failed"
        torch.cuda.synchronize()
        res = outputs[output_name]
        print(res)
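The packing performed by process_input above is easiest to see on a toy batch: variable-length samples are concatenated into one flat token stream with per-sample lengths tracked separately, so no padding tokens are ever materialized. A CPU-only illustration of the same idea (the token ids are arbitrary examples):

# Toy illustration of remove_input_padding packing: two unpadded samples of
# lengths 3 and 2 become one [num_tokens] tensor plus a [batch_size] lengths
# tensor and per-sample position ids, mirroring process_input's outputs.
import torch

samples = [torch.tensor([101, 7592, 102]), torch.tensor([101, 102])]
packed = torch.concat(samples)                      # shape [5]
lengths = torch.tensor([len(s) for s in samples])   # shape [2]
positions = torch.concat([torch.arange(len(s)) for s in samples])

print(packed.tolist())     # [101, 7592, 102, 101, 102]
print(lengths.tolist())    # [3, 2]
print(positions.tolist())  # [0, 1, 2, 0, 1]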