Cart Pole example in documentation #43
Conversation
docs/source/examples.rst
Outdated
Integration with Gymnasium
**************************

`OpenAI Gymnasium <https://gymnasium.farama.org/>`_, also known as Gym, is a popular toolkit for developing and comparing reinforcement learning algorithms by providing a standardized interface for environments. Gym offers a variety of environments, from simple classic control tasks like balancing a pole, which is described below in detail, to complex games like Atari and MuJoCo. It even supports creating custom environments, making it a versatile tool for reinforcement learning research.
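For readers unfamiliar with the Gymnasium interface, a minimal interaction with the Cart Pole environment (shown here with a random policy, independent of Reinforced-lib) looks roughly like this:

.. code-block:: python

    import gymnasium as gym

    # Create the Cart Pole environment and obtain the initial observation.
    env = gym.make("CartPole-v1")
    observation, info = env.reset(seed=42)

    terminated, truncated = False, False
    while not (terminated or truncated):
        # A random policy stands in for the agent: 0 = push left, 1 = push right.
        action = env.action_space.sample()
        observation, reward, terminated, truncated, info = env.step(action)

    env.close()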
- "also known" -> "formerly known"
- I think we should talk about environments directly and avoid the term "RL algorithms" to avoid confusion (like reviewers who wondered what the difference is between Reinforced-lib and gymnasium).
I've changed these two points. Thanks
docs/source/examples.rst
Outdated
The Cart Pole environment is a classic control task in which the goal is to balance a pole on a cart. The environment is described by a 4-dimensional state space, which consists of the cart's position, the cart's velocity, the pole's angle, and the pole's angular velocity. The agent can take one of two actions: push the cart to the left or push the cart to the right. The episode ends when the pole falls below a certain angle or the cart moves outside of the environment's boundaries. The goal is to keep the pole balanced for as long as possible.

The following example demonstrates how to train a reinforcement learning agent using Reinforced-lib and OpenAI Gym. The agent uses the Deep Q-Network (DQN) algorithm to learn how to balance the pole. The DQN algorithm is implemented in Reinforced-lib and the Cart Pole environment is provided by Gym.
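As background for the naming discussion that follows: both "Deep Q-Learning" and "Deep Q-Network" refer to the same standard method, in which a neural network :math:`Q(s, a; \theta)` is trained to minimize the temporal-difference loss (textbook formulation, not specific to the Reinforced-lib implementation):

.. math::

    L(\theta) = \mathbb{E}_{(s, a, r, s')} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]

where :math:`\theta^{-}` are the parameters of a periodically updated copy of the network (the target network) and :math:`\gamma` is the discount factor.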
In the documentation we have "Deep Q-Learning (DQN)", here it is "Deep Q-Network (DQN)". We have to make up our mind and change the description here or there so as not to be misleading.
I think the easiest solution (i.e., requiring least explaining) would be to only refer to the "Deep Q-Network (DQN) algorithm".
I was reading about it today to find out what the correct form is and I found this comparison - I think it best illustrates the difference. Now, however, I don't know whether it is more appropriate to refer to an algorithm or a model in the context of our library?
I do not think it is that important. My solution is to leave it as it is already described in the agents' documentation, which is Deep Q-Learning (DQN). I know it is kind of fishy, but it is all over the internet. I had a headache about it some time in the past :)
Moreover, we explicitly explain the abbreviation both here and in the DQN agent class description, so I think it is fine that we tell the user what we mean by DQN. Let's not be part of the Mathematical Inquisition.
docs/source/examples.rst
Outdated
    logger_types=[StdoutLogger, TensorboardLogger]
)

We than start the training loop, where we iterate over the number of epochs and for each epoch we run the agent in the environment. We start by resetting the environment and sampling the agent's initial action. Than we run the agent int he environment by updating the environment state with the action and sampling the next action. We continue this loop until the environment reaches a terminal state. We log the length of the epoch and continue to the next epoch.
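For orientation, the training loop described in this paragraph has roughly the following shape. The ``rl`` object stands for the ``RLib`` instance configured above (with ``logger_types=[StdoutLogger, TensorboardLogger]``); the exact arguments of ``sample()`` and the epoch count are assumptions made for illustration, and the code in ``docs/source/examples.rst`` remains authoritative:

.. code-block:: python

    import gymnasium as gym

    # `rl` is the RLib instance constructed earlier (see the snippet above);
    # it is treated as given here.
    env = gym.make("CartPole-v1")
    num_epochs = 300  # assumed value, for illustration only

    for epoch in range(num_epochs):
        # Reset the environment and sample the agent's initial action.
        env_state, _ = env.reset()
        action = rl.sample(env_state, 0.0, False)  # assumed signature: (state, reward, terminal)
        terminal, epoch_len = False, 0

        while not terminal:
            # Update the environment state with the action and sample the next action.
            env_state, reward, terminated, truncated, _ = env.step(action)
            terminal = terminated or truncated
            action = rl.sample(env_state, reward, terminal)
            epoch_len += 1

        # Log the length of the epoch (a plain print here; the actual example
        # reports it through the configured loggers).
        print(f"epoch {epoch}: {epoch_len} steps")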
Suggested wording: "Then we run the agent in the environment by performing the action" (instead of "Than we run the agent int he environment by updating the environment state with the action").
than != then
"We then start the training loop"
"Then, we run the agent in the environment"
or
"Next, we run the agent in the environment"
Yes, of course - I missed this typo and did not correct it in the comment.
Thanks, I've corrected these spelling mistakes. Next time I will double-check with language tools.
I have also replaced Gym with Gymnasium where possible to further emphasise the transition from Gym to Gymnasium.
Editorial changes to the examples section in docs
TODO: