evaluation different from your self-reported results

#1
by catherinexyz - opened

Hi, sorry to bother you again. I used your checkpoint nkkbr/vica2 to evaluate on VSI-Bench, and the result I got is a bit different from your reported result of 60%. I got 57%, and I am wondering what settings might cause this difference. For additional context, I used 64 frames and temperature 0 during evaluation, and followed the QA template you used on GitHub. Thanks for your patience!

Hi @catherinexyz, thanks for reaching out and for evaluating ViCA2 on VSI-Bench.

To be precise, our tested result on VSI-Bench is 56.8%.

You can find a detailed breakdown of our scores across the eight tasks in the main table of our repository:

Overall Performance on VSI-Bench

To help us compare results more granularly, could you please share the scores you obtained for each of the eight tasks in your test?
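For anyone doing a similar per-task comparison, here is a minimal sketch in Python. It assumes the overall score is the unweighted mean of the per-task scores, which should be checked against the benchmark's own evaluation script. The function name `compare_task_scores`, the task names, and all numbers are hypothetical placeholders, not actual ViCA2 or VSI-Bench results.

```python
from statistics import mean


def compare_task_scores(reported, reproduced):
    """Return per-task deltas (reproduced - reported) and each run's mean score.

    Both arguments map task name -> score in percent. The unweighted mean is
    an assumption about how the overall score is aggregated.
    """
    mismatched = set(reported) ^ set(reproduced)
    if mismatched:
        raise ValueError(f"task sets differ: {sorted(mismatched)}")
    deltas = {task: round(reproduced[task] - reported[task], 2) for task in reported}
    avg_reported = round(mean(reported.values()), 2)
    avg_reproduced = round(mean(reproduced.values()), 2)
    return deltas, avg_reported, avg_reproduced


# Placeholder task names and scores, NOT real benchmark numbers.
reported = {"task_a": 60.0, "task_b": 50.0}
reproduced = {"task_a": 61.0, "task_b": 49.5}

deltas, avg_reported, avg_reproduced = compare_task_scores(reported, reproduced)

# List tasks with the largest absolute gap first.
for task, delta in sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{task}: {delta:+.2f}")
print(f"mean reported={avg_reported}, mean reproduced={avg_reproduced}")
```

Sorting by absolute gap makes it easy to see whether a difference is concentrated in one or two tasks or spread evenly across all of them.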

Thanks again for your valuable feedback!

That makes sense! Thanks for your reply!

catherinexyz changed discussion status to closed
