Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

curiosity #53

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

curiosity #53

wants to merge 1 commit into from

Conversation

zhchaoo
Copy link

@zhchaoo zhchaoo commented Jul 8, 2021

experiment curiosity, use icm to add intrinsic rewards.
Deepak Pathak, Pulkit Agrawal, Alexei A. Efros and Trevor Darrell. Curiosity-driven Exploration by Self-supervised Prediction. In ICML 2017.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@MrSyee MrSyee requested review from MrSyee and Curt-Park July 9, 2021 05:40
@MrSyee
Copy link
Collaborator

MrSyee commented Jul 9, 2021

Hi, @zhchaoo
It is great job! I think it'll be better if you make a few change.

  1. The graphs of losses for ICM in result part is needed for showing difference between naive DQN and curiosity.
  2. In render part, the result gif video is needed, not an error message.

@Curt-Park What do you think? Curiosity is not rainbow series, so it is different from the concept of this tutorial. But it is also one of algorithms using DQN.

@Curt-Park
Copy link
Owner

@MrSyee I think this can be added as an extra topic.
@zhchaoo Thanks for your contribution. Please check @MrSyee 's comment.

@@ -0,0 +1,723 @@
{
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the most important part in this tutorial.

Currently, it doesn't look having self-contained information so that people understand it enough by this.

How about adding more information like mathematical formulation? or the difference from the naive DQN?


Reply via ReviewNB

@@ -0,0 +1,723 @@
{
Copy link
Owner

@Curt-Park Curt-Park Jul 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #38.            # Forward prediction: predict next state feature, given current state feature and action (one-hot)

Suggestion (Google Docstyle):

"""Forward prediction.

predict next state feature, given current state feature and action (one-hot).
"""

Reply via ReviewNB

@@ -0,0 +1,723 @@
{
Copy link
Owner

@Curt-Park Curt-Park Jul 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #39.            pred_s_next = F.relu(self.pred_module1( torch.cat([feature_x, a_vec.float()], dim = -1).detach()))

suggestion (Black Style):

pred_s_next = F.relu(
self.pred_module1(torch.cat([feature_x, a_vec.float()], dim =-1).detach())

)

Reply via ReviewNB

@@ -0,0 +1,723 @@
{
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #59.            self.use_extrinsic = use_extrinsic

Need to add description in the docstring


Reply via ReviewNB

@@ -0,0 +1,723 @@
{
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #60.            self.intrinsic_scale = intrinsic_scale

Need to add description in the docstring


Reply via ReviewNB

@@ -0,0 +1,723 @@
{
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #73.            self.icm = ICM(obs_dim, action_dim).to(self.device)

Need to add description in the docstring


Reply via ReviewNB

@@ -0,0 +1,723 @@
{
Copy link
Owner

@Curt-Park Curt-Park Jul 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #201.            a_vec = F.one_hot(action, num_classes = self.env.action_space.n).reshape(-1,self.env.action_space.n) # convert action from int to one-hot format

Line is too long. Suggestion (Black style):

# convert action from int to one-hot format

a_vec = F.one_hot(
action, num_classes=self.env.action_space.n
).reshape(-1, self.env.action_space.n)

Reply via ReviewNB

@@ -0,0 +1,723 @@
{
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to show the result


Reply via ReviewNB

@@ -0,0 +1,723 @@
{
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #37.        def pred(self, feature_x, a_vec):

Need to add type annotation:

feature_x: torch.Tensor, a_vec: torch.Tensor

Reply via ReviewNB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants