Update README.md
README.md (CHANGED)
@@ -25,7 +25,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 # load the model and tokenizer
 model = AutoModelForCausalLM.from_pretrained(
     'jnanliu/LiveMath-Judge',
-    device_map=
+    device_map='auto',
     torch_dtype=torch.bfloat16,
 )
 tokenizer = AutoTokenizer.from_pretrained(
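Note: the snippet above assumes `import torch` appears earlier in the README (it is needed for `torch.bfloat16`), and `device_map='auto'` only works when the `accelerate` package is installed. A minimal preamble, stated as an assumption rather than as part of this commit:

```python
# Assumed preamble (not part of this commit): torch provides torch.bfloat16,
# and device_map='auto' requires accelerate (pip install transformers accelerate).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
```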
@@ -66,12 +66,12 @@ Analysis:
 conversations = [
     {'role': 'user', 'content': prompt.format(question=question, gold_answer=golden_answer, answer=generated_answer)}
 ]
-inputs = tokenizer.apply_chat_template(conversations, return_tensors=
+inputs = tokenizer.apply_chat_template(conversations, return_dict=True, return_tensors='pt')

 # do inference
 pred = model.generate(
-    input_ids=inputs[
-    attention_mask=inputs[
+    input_ids=inputs['input_ids'].to(model.device),
+    attention_mask=inputs['attention_mask'].to(model.device),
     num_return_sequences=1,
 )[0].cpu().tolist()
 response = tokenizer.decode(pred, skip_special_tokens=True)
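Two caveats on the updated snippet: `generate` falls back to a very short default length when no limit is passed (`return_dict=True` is also needed so `inputs` can be indexed by key, as reflected above), and decoding `pred` from position 0 returns the prompt together with the verdict. A hedged variant that bounds the output and strips the prompt tokens; the `max_new_tokens` value and `add_generation_prompt=True` are assumptions, not part of this commit:

```python
# Sketch only: bound generation and decode just the newly generated tokens.
inputs = tokenizer.apply_chat_template(
    conversations,
    add_generation_prompt=True,  # assumption: most chat templates need this at inference
    return_dict=True,            # return input_ids and attention_mask together
    return_tensors='pt',
)
input_ids = inputs['input_ids'].to(model.device)
pred = model.generate(
    input_ids=input_ids,
    attention_mask=inputs['attention_mask'].to(model.device),
    max_new_tokens=512,          # assumption: a generous cap for a judge verdict
    num_return_sequences=1,
)[0]
# Slice off the prompt so only the judge's output is decoded.
response = tokenizer.decode(pred[input_ids.shape[-1]:], skip_special_tokens=True)
```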
@@ -84,8 +84,8 @@ response = tokenizer.decode(pred, skip_special_tokens=True)
 The following are the mG-Pass@16 results of Qwen2.5-72B-Instruct-as-Judge and LiveMath-Judge-as-Judge across 10 models.
 |model| Qwen2.5-72B-Instruct | LiveMath-Judge |
 | -- | -- | -- |
-| Qwen2.5-7B-Instruct | 26.45 |
-| Qwen2.5-Math-7B-Instruct | 37.91 |
+| Qwen2.5-7B-Instruct | 26.45 | 28.17 |
+| Qwen2.5-Math-7B-Instruct | 37.91 | 39.54 |
 | Llama-3.1-8B-Instruct | 10.43 | 10.41 |
 | Llama-3.1-70B-Instruct | 21.37 | 22.12 |
 | Llama-3.3-70B-Instruct | 27.36 | 27.23 |
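For the rows visible in this hunk, the 7B LiveMath-Judge stays within roughly 1.7 mG-Pass@16 points of the much larger Qwen2.5-72B-Instruct judge.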