Correctness of the example question in HF space

#1
by benhaotang - opened

I set up this question for testing. The choice of $d=4-\epsilon$ is there to check for the factor-of-2 difference that the result should display, and to prevent cheating in the code:

For a scalar field theory with interaction Lagrangian $\mathcal{L}_{int} = g\phi^3 + \lambda\phi^4$:
1. Enumerate all possible 1-loop Feynman diagrams contributing to the scalar propagator
2. For each diagram, write down its loop contribution
3. Provide Mathematica code to calculate these loop amplitudes with dimensional regularization at $d=4-\epsilon$
Please explain your reasoning step by step.
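
For reference, here is a sketch of where a factor of 2 can arise. It assumes the comparison is with the more common $d=4-2\epsilon$ convention, which the post does not state explicitly; the symbols $\Delta$ and $n$ are notation introduced here for illustration. The standard Euclidean one-loop integral is

$$\int \frac{d^dk}{(2\pi)^d}\,\frac{1}{(k^2+\Delta)^n} = \frac{1}{(4\pi)^{d/2}}\,\frac{\Gamma(n-d/2)}{\Gamma(n)}\,\Delta^{d/2-n}$$

For $n=2$ (for example the bubble built from two $g\phi^3$ vertices) the divergence sits in $\Gamma(2-d/2)$, and the pole depends on the convention:

$$d=4-\epsilon:\quad \Gamma(\epsilon/2) = \frac{2}{\epsilon}-\gamma_E+O(\epsilon), \qquad\qquad d=4-2\epsilon:\quad \Gamma(\epsilon) = \frac{1}{\epsilon}-\gamma_E+O(\epsilon)$$

So an answer memorized in the $4-2\epsilon$ convention would show a $1/\epsilon$ pole where $2/\epsilon$ is expected.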

After running it 10 times locally @ Q6_K:

For task one:

  • 100% correct: 2 times
  • at least one of the diagrams is correct: 5 times
  • completely wrong: 3 times

This seems around 30-40% better than my local Phi4 @ Q6_K. Note that the result for Claude Sonnet 3.5 is {3,7,0}.
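
To make the tallies easier to compare, the counts can be turned into rates. This is only a minimal sketch: it assumes that Claude's {3,7,0} uses the same category order as the list above (fully correct, at least one diagram correct, completely wrong), and the names `CATEGORIES` and `rates` are made up for illustration. The Phi4 counts are not given in the post, so they are left out.

```python
# Tallies from the post: counts out of 10 runs per model.
# ASSUMPTION: Claude's {3,7,0} follows the same category order as the list above.
CATEGORIES = ("fully correct", "at least one diagram correct", "completely wrong")


def rates(counts):
    """Convert raw counts into fractions of the total number of runs."""
    total = sum(counts)
    return {name: n / total for name, n in zip(CATEGORIES, counts)}


local_q6k = rates((2, 5, 3))
claude_sonnet_35 = rates((3, 7, 0))

for label, result in (("local @ Q6_K", local_q6k),
                      ("Claude Sonnet 3.5", claude_sonnet_35)):
    summary = ", ".join(f"{name}: {frac:.0%}" for name, frac in result.items())
    print(f"{label}: {summary}")
```

With only 10 runs per model, these rates are rough indications rather than precise estimates.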

For tasks 2 & 3: just have fun... Most of the time even the momentum is wrongly defined; maybe this is too hard for a local model :( For Claude, the loop contribution is usually correct if the first task is correct, but none of the code is runnable... it always tries to go too fancy.

OK, I see. If I run the PyTorch version with the system instruction from Sky-T1:

SYSTEM """Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process.

Please structure your response into two main sections: Thought and Solution.

In the Thought section, detail your reasoning process using the specified format:

“‘
<|begin_of_thought|>
{thought with steps separated with "\n\n"}
<|end_of_thought|>
”’

Each step should include detailed considerations such as analisying questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps.

In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail necessary step needed to reach the conclusion, formatted as follows:
“‘
<|begin_of_solution|>
{final formatted, precise, and clear solution}
<|end_of_solution|>
”’
Now, try to solve the following question through the above guidelines:
"""

it now looks like the completely-wrong percentage is lower (I tried 5 times, and every run had at least one correct diagram). But it cannot follow the instructed template very well, so I have no idea what will happen (e.g. hallucinations) when reaching the end of the context length.
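
One way to quantify how well the template is followed is to check each response for the thought and solution markers from the system prompt. The following is a rough sketch, not part of the original test setup: `TEMPLATE` and `parse_response` are hypothetical names, and the two sample strings are invented for illustration.

```python
import re

# Two-section layout requested by the system prompt: a thought block
# followed by a solution block, each wrapped in its marker pair.
TEMPLATE = re.compile(
    r"<\|begin_of_thought\|>(?P<thought>.*?)<\|end_of_thought\|>\s*"
    r"<\|begin_of_solution\|>(?P<solution>.*?)<\|end_of_solution\|>",
    re.DOTALL,
)


def parse_response(text):
    """Return (thought, solution) if the text follows the template, else None."""
    match = TEMPLATE.search(text)
    if match is None:
        return None
    return match.group("thought").strip(), match.group("solution").strip()


# Invented sample outputs: one well-formed, one cut off mid-thought
# (as might happen near the end of the context length).
good = ("<|begin_of_thought|>step 1\n\nstep 2<|end_of_thought|>\n"
        "<|begin_of_solution|>answer<|end_of_solution|>")
truncated = "<|begin_of_thought|>step 1\n\nstep 2"

print(parse_response(good))
print(parse_response(truncated))
```

Counting how many runs return `None` would give a simple template-adherence rate alongside the correctness tallies.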
