Discrepancy between the Ant experiment in the paper and the code. #11

HesNobi · 2021-07-24T06:30:25Z

Hi,
Thank you very much for the open research and the code. I've implemented your environment for my research (with appropriate references). After running SAC on the CustomeAnt environment, I found that the reward does not reorient the agent: the Ant still walks sideways. I also cannot find, anywhere in the code, the objective that would produce the behavior shown in Figure 5 of the paper.

I am looking forward to your kind reply,

Thanks,

Code with the discrepancy:
The reward computed in the following code contains no term that encourages reorientation.

def _step(self, a):
    # Velocity along the first qvel coordinate, read before the simulation step.
    vel = self.model.data.qvel.flat[0]
    forward_reward = vel
    self.do_simulation(a, self.frame_skip)

    # Penalties on action magnitude and on clipped external contact forces.
    ctrl_cost = .01 * np.square(a).sum()
    contact_cost = 0.5 * 1e-3 * np.sum(
        np.square(np.clip(self.model.data.cfrc_ext, -1, 1)))

    # The only posture-related term: a penalty when the torso height drops below 0.2.
    state = self.state_vector()
    flipped = not (state[2] >= 0.2)
    flipped_rew = -1 if flipped else 0
    reward = forward_reward - ctrl_cost - contact_cost + flipped_rew

    # The episode ends only on the timestep limit.
    self.timesteps += 1
    done = self.timesteps >= self.max_timesteps

    ob = self._get_obs()
    return ob, reward, done, dict(
        reward_forward=forward_reward,
        reward_ctrl=-ctrl_cost,
        reward_contact=-contact_cost,
        reward_flipped=flipped_rew)
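
To make the point easier to check without a simulator, here is a minimal sketch that restates the same reward as a standalone function. The function name `reward_terms` and its argument names are hypothetical and are not part of the repository; the arithmetic is copied from the snippet above. The inputs are a velocity, an action, the contact forces, and the torso height, so nothing describing the Ant's heading enters the reward.

```python
import numpy as np


def reward_terms(vel, action, cfrc_ext, z_height):
    """Standalone restatement of the reward in the _step snippet above.

    Hypothetical helper for illustration only. Note that no argument
    carries the agent's orientation or heading.
    """
    forward_reward = vel
    ctrl_cost = 0.01 * np.square(action).sum()
    contact_cost = 0.5 * 1e-3 * np.sum(np.square(np.clip(cfrc_ext, -1, 1)))
    # Same test as in the snippet: penalize when the height is not >= 0.2.
    flipped_rew = -1 if not (z_height >= 0.2) else 0
    total = forward_reward - ctrl_cost - contact_cost + flipped_rew
    return total, dict(
        reward_forward=forward_reward,
        reward_ctrl=-ctrl_cost,
        reward_contact=-contact_cost,
        reward_flipped=flipped_rew)


# Example call with made-up inputs (8 actuators, zero contact forces).
total, info = reward_terms(
    vel=1.0, action=np.zeros(8), cfrc_ext=np.zeros((14, 6)), z_height=0.5)
```

Under this restatement, two states that differ only in heading but share the same velocity, action, contact forces, and height receive the same reward, which would be consistent with the sideways walking I observe.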
