Update README.md
README.md CHANGED
@@ -39,7 +39,7 @@ We conducted the training on LLaVA-1.6's codebase with adding support of Llama-3
 ### Training Hyperparameters
 
 ```shell
-LLM_VERSION="Qwen/Qwen1.5-
+LLM_VERSION="Qwen/Qwen1.5-110B-Chat"
 LLM_VERSION_CLEAN="${LLM_VERSION//\//_}"
 VISION_MODEL_VERSION="openai/clip-vit-large-patch14-336"
 VISION_MODEL_VERSION_CLEAN="${VISION_MODEL_VERSION//\//_}"
@@ -80,7 +80,7 @@ torchrun # with necessary torchrun information for distributed training\
     --num_train_epochs 1 \
     --per_device_train_batch_size 1 \
     --per_device_eval_batch_size 4 \
-    --gradient_accumulation_steps
+    --gradient_accumulation_steps 1 \
     --evaluation_strategy "no" \
     --save_strategy "steps" \
     --save_steps 3000 \
@@ -108,6 +108,7 @@ torchrun # with necessary torchrun information for distributed training\
 - 500K academic-task-oriented VQA data mixture.
 - 50K GPT-4V data mixture.
 - 40K ShareGPT data.
+- 20K COCO Caption data.
 
 #### Speeds, Sizes, Times [optional]
 
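Aside on the context lines above: `${LLM_VERSION//\//_}` is plain Bash pattern substitution, replacing every `/` in the model identifier with `_`, presumably so the value can be embedded in checkpoint and run names. A minimal standalone sketch (the `echo` output is illustrative only, not part of the training script):

```shell
#!/usr/bin/env bash
# ${VAR//pattern/replacement} substitutes every occurrence of "pattern".
# Here the escaped "/" is replaced with "_" to yield a filesystem-safe name.
LLM_VERSION="Qwen/Qwen1.5-110B-Chat"
LLM_VERSION_CLEAN="${LLM_VERSION//\//_}"

echo "${LLM_VERSION_CLEAN}"   # -> Qwen_Qwen1.5-110B-Chat
```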