Issue/Bug replicating HumanEval result
Hi all,
I'm looking to replicate the HumanEval result for this model so that I can then move on to testing on interesting orthogonal benchmarks.
Unfortunately, the model frequently goes off the rails when I attempt to replicate, and my results are likely far from Phind's quoted performance. Does anyone see an obvious bug here: https://github.com/emrgnt-cmplxty/zero-shot-replication/blob/main/zero_shot_replication/model/hugging_face_model/phind_model.py?
For reference, I am seeing output like the following:
```python
def is_multiply_prime(a):
    """Write a function that returns true if the given number is the multiplication of 3 prime numbers
    and false otherwise.
    Knowing that (a) is less then 100.
    Example:
    is_multiply_prime(30) == True
    30 = 2 * 3 * 5
    """
    def is_prime(n))):
        if n n n n n n n n n n n n n n n n n n n n ...
```
(the `n` token repeats like this for hundreds of tokens until the generation budget runs out)
This model has a RoPE theta of 1000000. Is there any way to set that in the script?
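For anyone else wondering: on a recent enough transformers (the versions that expose `rope_theta` on `LlamaConfig` also read it from the repo's config.json automatically), it can be overridden explicitly at load time. A minimal sketch:

```python
from transformers import AutoConfig, LlamaForCausalLM

model_path = "Phind/Phind-CodeLlama-34B-v2"
config = AutoConfig.from_pretrained(model_path)
config.rope_theta = 1000000.0  # CodeLlama's enlarged RoPE base
model = LlamaForCausalLM.from_pretrained(model_path, config=config, device_map="auto")
```

On older versions the field is simply ignored, which would line up with the degenerate output above.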
Thanks for reporting, we'll investigate.
The eval code in the model card just worked for me. Could you please let me know if that works for you?
I will test explicitly tomorrow. I don't think there are any significant diffs w.r.t. what I am doing, but this can help pinpoint the issue.
Same here, every output ends with the same words; it seems no end token is being applied here.
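A quick way to check the end-token side, as a sketch assuming the stock tokenizer rather than the exact eval harness:

```python
from transformers import AutoTokenizer, LlamaForCausalLM

model_path = "Phind/Phind-CodeLlama-34B-v2"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path, device_map="auto")

inputs = tokenizer("def fib(n):", return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=256,
    eos_token_id=tokenizer.eos_token_id,  # stop on </s> instead of running to the budget
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

If it still never emits EOS even when this is passed explicitly, the problem is in how the model resolves its config rather than in the generation call.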
There is some commentary in the Reddit thread here: https://www.reddit.com/r/LocalLLaMA/comments/164754t/wizardcoder_eval_results_vs_chatgpt_and_claude_on/
It does seem that the issue is related to the transformers version.
Can confirm, running off a transformers main branch commit worked.
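For reference, installing from main is `pip install git+https://github.com/huggingface/transformers.git`, and a quick way to see what is already installed:

```python
import transformers

# If I recall correctly, CodeLlama support (including reading rope_theta
# from config.json) landed around transformers 4.33; older versions fall
# back to the default RoPE base and produce exactly this kind of output.
print(transformers.__version__)
```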
I tried this code on a single GPU, but I am getting bad results.
```python
from transformers import AutoTokenizer, LlamaForCausalLM, BitsAndBytesConfig
import torch

model_path = "Phind/Phind-CodeLlama-34B-v2"

# Load the weights in 8-bit, sharded across available GPUs.
model = LlamaForCausalLM.from_pretrained(model_path, load_in_8bit=True, device_map="auto")
# Alternative 4-bit NF4 path:
# nf4_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16)
# model = LlamaForCausalLM.from_pretrained(model_path, quantization_config=nf4_config)

tokenizer = AutoTokenizer.from_pretrained(model_path, legacy=True)
tokenizer.pad_token_id = tokenizer.eos_token_id  # Llama has no pad token by default

text = "Write a code in python for inferencing large language models using Transformers library. Give step by step approach."
inputs = tokenizer(text, return_tensors="pt").to("cuda:0")

out = model.generate(**inputs, max_length=200, temperature=0.9, repetition_penalty=1.5, do_sample=True)
print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))
```
This is the output I am getting:
```
In order to inferencing with transformer model, we need use the Hugging Face's pytorch-transformers Library.
Step 1: Installation of Libraries
You can install this required useful very necessary important big huge immense massive monstrous enormous vast colossal portentious prodigious sizeable sizable mammoth mind mouth multitudinously numberless numb numerous novel nones none non non nonsensical senseless insignificant inconsequentialist unimportant small sm
python
# Importing Necessary nec es ess ent en env e environments needed environment environments environments
import torch
from transformers import AutoModelForMaskedLM,AutoTokenizerFastBert BertConfigP
class Class Config Model Token BERT For
config = class Auto
```
Can someone suggest a fix?
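A few things stand out in the snippet above: the prompt skips the prompt template from the model card, `repetition_penalty=1.5` is aggressive enough to derail sampling on its own, and `max_length=200` caps prompt plus completion together. A minimal sketch of a tamer setup, assuming the `### System Prompt / ### User Message / ### Assistant` template from the model card (the sampling values below are illustrative, not tuned):

```python
from transformers import AutoTokenizer, LlamaForCausalLM

model_path = "Phind/Phind-CodeLlama-34B-v2"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path, load_in_8bit=True, device_map="auto")

# Wrap the request in the template the model was fine-tuned on.
prompt = (
    "### System Prompt\n"
    "You are an intelligent programming assistant.\n\n"
    "### User Message\n"
    "Write Python code for inferencing large language models with the "
    "Transformers library. Give a step-by-step approach.\n\n"
    "### Assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=512,           # budget for the completion, not prompt + completion
    temperature=0.2,              # low temperature keeps code generation on track
    repetition_penalty=1.1,       # 1.5 is strong enough to push sampling off the rails
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

If the output is still degenerate with settings like these, it points back at the transformers version rather than at the sampling parameters.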