Cart Pole example in documentation #43

Merged · 2 commits merged into main from doc-gym-ext on Feb 11, 2024

Conversation

Wotaker (Collaborator) commented Feb 9, 2024:

Editorial changes to the examples section in docs

TODO:

  • Delete recommender system example ✅
  • Add Cart Pole example ✅
  • Add MAB example

Integration with Gymnasium
**************************

`OpenAI Gymnasium <https://gymnasium.farama.org/>`_, also known as Gym, is a popular toolkit for developing and comparing reinforcement learning algorithms by providing a standardized interface for environments. Gym offers a variety of environments, from simple classic control tasks like balancing a pole, which is described below in detail, to complex games like Atari and MuJoCo. It even supports creating custom environments, making it a versatile tool for all things reinforcement learning research.
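
To illustrate the standardized interface mentioned above, here is a minimal, self-contained sketch (assuming only the `gymnasium` package) that creates an environment, resets it, and performs a single step:

```python
import gymnasium as gym

# Every Gymnasium environment exposes the same reset/step interface.
env = gym.make("CartPole-v1")

observation, info = env.reset(seed=42)  # start a new episode
action = env.action_space.sample()      # pick a random valid action
observation, reward, terminated, truncated, info = env.step(action)

env.close()
```
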
Owner:

  1. "also known" -> "formerly known"
  2. I think we should talk about environments directly and avoid the term "RL algorithms" to avoid confusion (like reviewers who wondered what the difference is between Reinforced-lib and gymnasium).

Wotaker (Collaborator, Author) · Feb 11, 2024:

I've changed these two points. Thanks


The Cart Pole environment is a classic control task in which the goal is to balance a pole on a cart. The environment is described by a 4-dimensional state space, which consists of the cart's position, the cart's velocity, the pole's angle, and the pole's angular velocity. The agent can take one of two actions: push the cart to the left or push the cart to the right. The episode ends when the pole tilts beyond a certain angle or the cart moves outside the environment's boundaries. The goal is to keep the pole balanced for as long as possible.
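
A short sketch (again assuming only the `gymnasium` package) confirms the 4-dimensional state space and the two discrete actions described above:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")

# 4-dimensional state: cart position, cart velocity, pole angle, pole angular velocity.
print(env.observation_space.shape)  # (4,)

# Two discrete actions: 0 = push the cart to the left, 1 = push it to the right.
print(env.action_space)             # Discrete(2)
```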

The following example demonstrates how to train a reinforcement learning agent using Reinforced-lib and OpenAI Gym. The agent uses the Deep Q-Network (DQN) algorithm to learn how to balance the pole. The DQN algorithm is implemented in Reinforced-lib and the Cart Pole environment is provided by Gym.
Owner:

In the documentation we have "Deep Q-Learning (DQN)", here it is "Deep Q-Network (DQN)". We have to make up our mind and change the description here or there so as not to be misleading.

Contributor:

I think the easiest solution (i.e., requiring least explaining) would be to only refer to the "Deep Q-Network (DQN) algorithm".

Owner:

I was reading about it today to find out what the correct form is and I found this comparison - I think it best illustrates the difference. Now, however, I don't know whether it is more appropriate to refer to an algorithm or a model in the context of our library?

Wotaker (Collaborator, Author) · Feb 11, 2024:

I do not think it is that important. My solution is to leave it as it is already described in the agents' description, which is Deep Q-learning (DQN). I know it is kind of fishy, but it is all over the internet. I was having a headache about it some time in the past :)

Collaborator (Author):

Moreover, we explicitly explain the abbreviation both here and in the DQN agent class description, so I think it is OK, since we tell the user what we mean by DQN. Let's not be part of the Mathematical Inquisition.

logger_types=[StdoutLogger, TensorboardLogger]
)

We than start the training loop, where we iterate over the number of epochs and for each epoch we run the agent in the environment. We start by resetting the environment and sampling the agent's initial action. Than we run the agent int he environment by updating the environment state with the action and sampling the next action. We continue this loop until the environment reaches a terminal state. We log the length of the epoch and continue to the next epoch.
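
For reference, the training loop described in this paragraph can be sketched as follows. The `RandomAgent` class and its `sample` method are hypothetical placeholders standing in for the trained agent's interface (not the exact Reinforced-lib API); only the `gymnasium` calls are taken from the library itself:

```python
import gymnasium as gym

# Stand-in for the trained agent; 'sample' is a hypothetical placeholder
# interface, implemented here as a random policy so the sketch runs on its own.
class RandomAgent:
    def __init__(self, action_space):
        self.action_space = action_space

    def sample(self, env_state):
        return self.action_space.sample()

env = gym.make("CartPole-v1")
agent = RandomAgent(env.action_space)
num_epochs = 10  # assumed value, for illustration only

for epoch in range(num_epochs):
    # Reset the environment and sample the agent's initial action.
    env_state, info = env.reset()
    action = agent.sample(env_state)
    terminated = truncated = False
    epoch_len = 0

    # Run the agent until the environment reaches a terminal state.
    while not (terminated or truncated):
        env_state, reward, terminated, truncated, info = env.step(action)
        action = agent.sample(env_state)
        epoch_len += 1

    # Log the length of the epoch before moving on to the next one.
    print(f"Epoch {epoch}: length = {epoch_len}")

env.close()
```
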
m-wojnar (Owner) · Feb 9, 2024:

Suggested change: "Than we run the agent int he environment by updating the environment state with the action" → "Then we run the agent in the environment by performing the action".

Contributor:

than != then

"We then start the training loop"

"Then, we run the agent in the environment"
or
"Next, we run the agent in the environment"

Owner:

Yes, of course - I missed this typo and did not correct it in the comment.

Collaborator (Author):

Thanks, I've corrected these spelling mistakes. Next time I will double check with language tools.

Wotaker (Collaborator, Author) commented Feb 11, 2024:

I have also replaced Gym with Gymnasium where possible to further emphasise the transition away from OpenAI Gym.

@Wotaker Wotaker marked this pull request as ready for review February 11, 2024 11:21
@Wotaker Wotaker merged commit e92d0c6 into main Feb 11, 2024
5 checks passed
@Wotaker Wotaker deleted the doc-gym-ext branch February 11, 2024 11:25