Llama-3.2-11B-Vision-Instruct and Llama-3.2-11B-Vision give exactly the same results: coincidence, or are they the same model?
I benchmarked both models on several datasets and noticed that the results were identical, down to the floating-point digits.
Both share the same base; the Instruct model was fine-tuned for instruction following after the base model was trained.
@aaditya Can you tell me more about which benchmarks you ran? How did you run them, and what results did you get? Thanks!
@Sanyam Yes, you're absolutely right. However, after fine-tuning the base model, isn't it common for performance to change slightly, whether it's an improvement or a slight decline?
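Is there a quick way to confirm whether the two checkpoints actually contain different weights? One rough idea (just a sketch, not something from this thread; the shard filenames below are a guess, so check `model.safetensors.index.json` in each repo for the real names, and note that each shard is several GB):

```bash
# Download the first safetensors shard of each repo and compare checksums.
# Identical hashes mean the shards are byte-identical; different hashes strongly
# suggest the checkpoints really do carry different weights.
huggingface-cli download meta-llama/Llama-3.2-11B-Vision \
    --include "model-00001-of-*.safetensors" --local-dir vision-base
huggingface-cli download meta-llama/Llama-3.2-11B-Vision-Instruct \
    --include "model-00001-of-*.safetensors" --local-dir vision-instruct
sha256sum vision-base/model-00001-of-*.safetensors \
          vision-instruct/model-00001-of-*.safetensors
```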
@wukaixingxp I evaluated the medical benchmark multimedqa, which includes 9 different datasets, using lm-harness. I tried twice and got the same results both times, though there could be an error on my part.
@aaditya Can you tell me the command you used to run the eval? What numbers did you get? Thanks!
Hi @wukaixingxp, here are the details:
Command for Llama-3.2-11B-Vision:

```bash
lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-3.2-11B-Vision \
    --tasks multimedqa \
    --device cuda:0 \
    --batch_size auto \
    --output_path results --log_samples
```

Result on a single A100:
Command for Llama-3.2-11B-Vision-Instruct:

```bash
lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-3.2-11B-Vision-Instruct \
    --tasks multimedqa \
    --device cuda:0 \
    --batch_size auto \
    --output_path results --log_samples
```

Result on a single A100:
@aaditya For the instruct model, please use the `--apply_chat_template` option so that the special tokens like `<|start_header_id|>user<|end_header_id|>` get added. Let me know if that works.
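Something like this (untested sketch; it is just your command from above with the one flag added):

```bash
lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-3.2-11B-Vision-Instruct \
    --tasks multimedqa \
    --apply_chat_template \
    --device cuda:0 \
    --batch_size auto \
    --output_path results --log_samples
```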