Task Reasoning test

#1
by Tibbnak - opened

https://arxiv.org/pdf/1502.05698.pdf
Tested the model on the tasks outlined in this paper. I had to multi-shot the prompt with <|startoftext|>/n### Question: \nOutput:\n
and regenerate a few times,
but ultimately the model answered 13.5/20, which is at the upper end for 13B models. The best 13B model I've tested gets 15/20.
GPT-3.5 Turbo tends to get 17.5/20.
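
For anyone wanting to reproduce the multi-shot setup, here is a rough sketch of how such a prompt could be assembled. The helper name `build_prompt`, the placement of the question text after "### Question:", and reading the "/n" after the start token as a newline are all my assumptions; the example stories are only illustrative, written in the style of the paper's tasks.

```python
# Hypothetical helper, not the exact script used for the test above.
BOS = "<|startoftext|>"  # start token from the prompt template in the post


def build_prompt(examples, question):
    """Assemble a multi-shot prompt from (question, answer) pairs.

    Assumption: each shot is laid out as
    '### Question: <text>\\nOutput:\\n<answer>' and the final question
    is left open after 'Output:' for the model to complete.
    """
    parts = [BOS]
    for shot_question, shot_answer in examples:
        parts.append(f"### Question: {shot_question}\nOutput:\n{shot_answer}\n")
    # The question under test ends with an empty Output: slot.
    parts.append(f"### Question: {question}\nOutput:\n")
    return "\n".join(parts)


# Illustrative shots only; these are not the prompts used in the test.
shots = [
    ("Mary went to the bathroom. John moved to the hallway. Where is Mary?",
     "bathroom"),
    ("Daniel picked up the apple. Daniel went to the garden. Where is the apple?",
     "garden"),
]
prompt = build_prompt(
    shots,
    "Sandra travelled to the office. Sandra went to the kitchen. Where is Sandra?",
)
print(prompt)
```

The resulting string can be passed to whatever generation call you use; regenerating a few times, as described above, just means sampling again with the same prompt.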

May benefit from further finetuning with a LoRA built around structured chain-of-thought or reasoning data (for example, CoT, superCoT, Dolphin, etc.).

Details:
On task 4 it couldn't figure out that the bedroom was north of the bathroom (kept saying the bathroom was north of the bedroom)

On task 10 it would only answer 'no' when asked if John was in the classroom, when the expected answer is that it's unclear, i.e. 'maybe'.

On task 11 it just kept saying kitchen.

On task 13 it kept saying Daniel was in the office.

On task 14 it only said she went home, but it got the school part right.

On task 17 it answered 'no' when asked if the red square was to the left of the triangle.

On task 18 it did not think the box could fit in the suitcase, but it could figure out that the cupboard couldn't fit into the box.

On task 19 it sort of got it, by giving directions such as "Go to the hallway then go to the kitchen" and "Go to the hallway then go to the bathroom", which isn't what the paper looks for but is technically correct.
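
As a sanity check on the arithmetic, the details above can be tallied into a score. This is only a sketch under an assumed scheme: the post does not say how partial credit was awarded, so the sets `WRONG` and `PARTIAL` and the 0.5 value are my guesses (tasks 4, 10, 11, 13, 17 counted as misses; tasks 14, 18, 19 as half credit; everything else as full credit).

```python
# Assumed scoring scheme, not stated in the post:
# 1.0 for a correct task, 0.5 for a partly correct one, 0.0 for a miss.
WRONG = {4, 10, 11, 13, 17}   # tasks described above as answered incorrectly
PARTIAL = {14, 18, 19}        # tasks described above as partly right

scores = {}
for task in range(1, 21):     # the paper defines 20 tasks
    if task in WRONG:
        scores[task] = 0.0
    elif task in PARTIAL:
        scores[task] = 0.5
    else:
        scores[task] = 1.0

total = sum(scores.values())
print(f"{total}/20")
```

Under these assumptions the tally comes out to the 13.5/20 reported at the top.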
