Any eval results?

#1
by geepytee - opened

Would be great to see humaneval results for this fine tune.

Just tried running this fine tune in a Space, was hoping to put it through humaneval but the results don't really make sense even as I was testing basic tasks. Even a simple "write a python function that prints hello world" gets the LLM rambling. Perhaps the person who fine tune it can provide some guidance?

it uses the same prompt template the OpenHermes 2.5, which is ChatML
tried with that: https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B#prompt-format

Sign up or log in to comment