Abnormal TIR experiment results #26

wangzhihao-coder · 2024-10-09T02:55:56Z

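Following the TIR prompt, I ran experiments on the 1.5B and 7B Qwen2.5-Math models, and the resulting metrics are worse than CoT. I suspect my implementation is missing some steps. Could you describe the implementation in more detail?
I implemented TIR based on the prompt below: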

# TIR
messages = [
    {"role": "system", "content": "Please integrate natural language reasoning with programs to solve the problem above, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt}
]


rangmiao · 2024-10-09T08:46:29Z

I ran into the same problem.

hav4ik · 2024-10-20T18:51:19Z

Hi, same for me, for both 1.5B and 7B. My numbers so far:

| Model | MATH CoT | MATH CoT (maj@8) | MATH TIR |
|---|---|---|---|
| Qwen2.5-1.5B-Instruct | 0.7646 | N/A | 0.7534 |
| Qwen2.5-7B-Instruct | 0.8408 | 0.8804 | 0.8088 |

The CoT numbers are a bit higher than reported because I used MARIO_EVAL to compare results with ground truth. I'm using vLLM for inference, with top_p=1.0 and temperature=0.
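For readers unfamiliar with the maj@8 column: it scores the most frequent final answer among 8 sampled solutions per problem. Below is a minimal sketch of that aggregation, not the evaluation code used for the numbers above. The function names are hypothetical, and answers are compared as exact strings here, whereas a real evaluator such as MARIO_EVAL checks mathematical equivalence.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; on a tie, the one seen first wins."""
    counts = Counter(answers)
    # Counter keeps insertion order, and max() returns the first maximal key.
    return max(counts, key=counts.get)


def maj_at_k_accuracy(samples_per_problem, references):
    """Fraction of problems whose majority answer equals the reference.

    Assumption: exact string match stands in for a proper math-equivalence check.
    """
    correct = sum(
        majority_vote(samples) == ref
        for samples, ref in zip(samples_per_problem, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    samples = [["5", "5", "25"], ["1", "2", "2"]]
    print(maj_at_k_accuracy(samples, ["5", "1"]))
```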

Failure cases

I checked a few failure cases and they seemed strange to me. For example, in problem 50 of the MATH test benchmark, the model called a function, sympy.complete_square, which clearly does not exist:

<|im_start|>system
Please integrate natural language reasoning with programs to solve the problem above, and put your final answer within \boxed{}.<|im_end|>
<|im_start|>user
Find the radius of the circle with equation $x^2 + 8x + y^2 - 6y = 0$.<|im_end|>
<|im_start|>assistant
To find the radius of the circle given by the equation \(x^2 + 8x + y^2 - 6y = 0\), we need to rewrite the equation in the standard form of a circle's equation \((x - h)^2 + (y - k)^2 = r^2\). This process is called completing the square.

Let's break it down step-by-step:

1. Start with the given equation:
   \[
   x^2 + 8x + y^2 - 6y = 0
   \]

2. Complete the square for the \(x\) terms. The coefficient of \(x\) is 8, so we take half of 8 (which is 4) and square it (which is 16). Add and subtract 16 inside the equation:
   \[
   x^2 + 8x + 16 - 16 + y^2 - 6y = 0
   \]
   This can be rewritten as:
   \[
   (x + 4)^2 - 16 + y^2 - 6y = 0
   \]

3. Next, complete the square for the \(y\) terms. The coefficient of \(y\) is -6, so we take half of -6 (which is -3) and square it (which is 9). Add and subtract 9 inside the equation:
   \[
   (x + 4)^2 - 16 + y^2 - 6y + 9 - 9 = 0
   \]
   This can be rewritten as:
   \[
   (x + 4)^2 + (y - 3)^2 - 25 = 0
   \]

4. Finally, move the constant term to the right side of the equation:
   \[
   (x + 4)^2 + (y - 3)^2 = 25
   \]

Now the equation is in the standard form \((x - h)^2 + (y - k)^2 = r^2\), where \((h, k)\) is the center of the circle and \(r\) is the radius. From the equation \((x + 4)^2 + (y - 3)^2 = 25\), we can see that the center of the circle is \((-4, 3)\) and the radius \(r\) is \(\sqrt{25} = 5\).

So, the radius of the circle is \(5\).

Let's confirm this with Python code.
```python
import sympy as sp

# Define the variables
x, y = sp.symbols('x y')

# Define the equation of the circle
equation = x**2 + 8*x + y**2 - 6*y

# Complete the square for x and y terms
x_term = sp.complete_square(x**2 + 8*x, full=True)[0]
y_term = sp.complete_square(y**2 - 6*y, full=True)[0]

# Rewrite the equation
completed_square_eq = x_term + y_term - sp.complete_square(x**2 + 8*x, full=True)[1] - sp.complete_square(y**2 - 6*y, full=True)[1]

# Simplify the equation
completed_square_eq = sp.simplify(completed_square_eq)

# Extract the radius
radius_squared = completed_square_eq.rhs
radius = sp.sqrt(radius_squared)

print(radius)
```

Does the Qwen team have an in-house version of SymPy that contains functions not present in the main repo?
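For reference, the natural-language part of the model's answer can be checked without any completing-the-square helper. For a circle written as x^2 + Dx + y^2 + Ey + F = 0, completing the square gives r^2 = (D/2)^2 + (E/2)^2 - F. The sketch below uses only the standard library; the function name `circle_radius` is my own and is not taken from the model output or from SymPy.

```python
import math


def circle_radius(d, e, f):
    """Radius of the circle x^2 + d*x + y^2 + e*y + f = 0."""
    # Completing the square: (x + d/2)^2 + (y + e/2)^2 = (d/2)^2 + (e/2)^2 - f
    r_squared = (d / 2) ** 2 + (e / 2) ** 2 - f
    if r_squared < 0:
        raise ValueError("the equation does not describe a real circle")
    return math.sqrt(r_squared)


if __name__ == "__main__":
    # Problem 50: x^2 + 8x + y^2 - 6y = 0
    print(circle_radius(8, -6, 0))  # 5.0
```

So the reasoning in the transcript reaches the right radius; only the verification code is broken.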

hengck23 · 2024-10-23T04:53:28Z

The Python code could be hallucinated. Would that explain the function that cannot be found?

Maybe check this:
https://github.com/QwenLM/Qwen-Agent/blob/a9ef165971d37ac37ec1df9e565ca46a38b0afab/examples/tir_math.py

# We use the following two systems to distinguish between COT mode and TIR mode
TIR_SYSTEM = """Please integrate natural language reasoning with programs to solve the problem above, and put your final answer within \\boxed{}."""
COT_SYSTEM = """Please reason step by step, and put your final answer within \\boxed{}."""

            'The dependencies for Python Executor support are not installed. '
            'Please install the required dependencies by running: pip install "qwen-agent[python_executor]"') from e
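The Python Executor dependency suggests that TIR mode runs the generated code and feeds the result back to the model, rather than scoring a single completion. Here is a rough sketch of such a loop, under stated assumptions: `generate` is a hypothetical callable that continues the transcript and stops after a closing code fence, and the `output` fence used for feeding results back is my guess at the format. Check the linked tir_math.py for what Qwen-Agent actually does.

```python
import re
import subprocess
import sys

FENCE = "`" * 3  # built at runtime to avoid nesting literal fences in this example
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_last_code_block(text):
    """Return the body of the last python code block in text, or None."""
    blocks = CODE_BLOCK.findall(text)
    return blocks[-1] if blocks else None


def run_python(code, timeout=10):
    """Run code in a fresh interpreter; return stdout, or stderr on failure."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "TimeoutError: execution exceeded the time limit"
    result = proc.stdout if proc.returncode == 0 else proc.stderr
    return result.strip()


def tir_solve(generate, prompt, max_rounds=4):
    """Alternate between generation and code execution (hypothetical loop)."""
    transcript = prompt
    for _ in range(max_rounds):
        completion = generate(transcript)
        transcript += completion
        code = extract_last_code_block(completion)
        if code is None:
            break  # no program in this turn, treat it as the final answer
        # Assumed feedback format: an "output" fence holding the execution result.
        transcript += "\n" + FENCE + "output\n" + run_python(code) + "\n" + FENCE + "\n"
    return transcript
```

With a loop like this, a hallucinated call such as the complete_square one above would come back to the model as an AttributeError traceback, giving it a chance to correct itself in the next round. Note that subprocess isolation alone is not a sandbox; untrusted model code should run in a restricted environment.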
