Update docs
* Documentation now mentions PettingZoo instead of Gymnasium.
* Updated examples to follow the PettingZoo API (e.g., `obs, _ = env.reset()`).
* Removed obsolete parts of documentation.
* Fixed a few typos.
* Improved appearance of some paragraphs.
rchaput committed Jul 18, 2024
1 parent 16070b2 commit bb52e75
Showing 13 changed files with 121 additions and 163 deletions.
67 changes: 32 additions & 35 deletions docs/source/adding_model.rst
@@ -2,7 +2,7 @@ Adding a new model
==================

One of the principal goals of this simulator is to be able to compare various
-learning algorithms (similarly to Gymnasium's environments).
+learning algorithms (similarly to PettingZoo's environments).
This page describes how to implement another learning algorithm (i.e., *model*).

Models interact with the :py:class:`SmartGrid <smartgrid.environment.SmartGrid>`
@@ -13,7 +13,7 @@ through the *interaction loop*:
from smartgrid import make_basic_smartgrid
env = make_basic_smartgrid()
-obs = env.reset()
+obs, _ = env.reset()
max_step = 10 # Can also be 10_000, ...
for step in range(max_step):
    actions = model.forward(obs) # Add your model here!
@@ -49,38 +49,35 @@ used for different agents.
    self.env = env

def forward(obs):
-    # `obs` is a dict containing:
-    # - `global`: an instance of GlobalObservation;
-    # - `local`: a list of instances of LocalObservations, one per agent.
-    # To reconstruct the observations per agent, a for loop can be used:
-    obs_per_agent = [
-        np.concatenate((
-            obs['local'][i],
-            obs['global'],
-        ))
-        for i in range(self.env.n_agent)
-    ]
-    # Then, each element of `obs_per_agent` can be used for the specific agent.
-    # Here, we simply use random.
-    agent_actions = []
-    for i in range(self.env.n_agent):
-        # We need the number of dimensions of the action. It should be 6, but
-        # it's better to avoid hard-coding it.
-        agent_action_space = self.env.action_space[i]
+    # `obs` is a dict mapping each agent name to its observations.
+    # Agent observations are namedtuples that can be printed for
+    # easier human readability and debugging, or transformed to
+    # numpy arrays (with `np.asarray`) for easier handling by Neural
+    # Networks.
+    # The env expects a dict mapping each agent name to its desired action.
+    # Here, we simply create a random action for each agent, with Numpy.
+    agent_actions = {}
+    for agent_name in self.env.agents:
+        # `obs[agent_name]` are the agent's observations
+        # We need the action's number of dimensions. It should be 6,
+        # but the SmartGrid can be extended and so it's better to avoid
+        # hard-coding it.
+        agent_action_space = self.env.action_space(agent_name)
        agent_action_nb_dimensions = agent_action_space.shape[0]
        action = np.random.random(agent_action_nb_dimensions)
        # `action` is a ndarray of 6 values in [0,1].
-        # Most learning algorithms will handle values in [0,1], but the
-        # SmartGrid env actually expects actions in a different space,
-        # depending on the agent's profile. We can use `interpolate`
-        # to make the transformation.
+        # Most learning algorithms will handle values in [0, 1], but the
+        # SmartGrid env may expect actions in a different space, depending
+        # on the agent's profile. We can use `interpolate` to transform.
        action = interpolate(
            value=action,
            old_bounds=[(0,1)] * agent_action_nb_dimensions,
            new_bounds=list(zip(agent_action_space.low, agent_action_space.high))
        )
-        agent_actions.append(action)
-    # At this point, `agent_actions` is a list of actions (ndarrays), one
+        agent_actions[agent_name] = action
+    # At this point, `agent_actions` is a dict of actions (ndarrays), one
    # element for each agent.
    return agent_actions
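
As the new comments note, each agent's observations can be converted to a numpy
array with ``np.asarray`` before feeding a neural network. A minimal sketch of
such a preprocessing step (this helper is not part of the commit; its name is
hypothetical):

.. code-block:: Python

    import numpy as np

    def observations_to_arrays(env, obs):
        # `obs` maps each agent name to a namedtuple-like observation;
        # `np.asarray` flattens it into a vector usable by a neural network.
        return {
            agent_name: np.asarray(obs[agent_name])
            for agent_name in env.agents
        }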
@@ -104,15 +101,15 @@ but we will illustrate the ``backward`` method anyway:
# (...) code from previous section

def backward(self, new_obs, rewards):
-    # `new_obs` has the same shape as `obs` in `forward`: `global` and `local`.
-    new_obs_per_agent = [
-        np.concatenate((
-            new_obs['local'][i],
-            new_obs['global'],
-        ))
-        for i in range(self.env.n_agent)
-    ]
-    # `rewards` will be usually a list of scalar values, one per agent
+    for agent_name in self.env.agents:
+        # `new_obs` is a dict of observations, one element for each agent.
+        agent_obs = new_obs[agent_name]
+        # `rewards` is also a dict; each element can be:
+        # - a scalar (single value) if the SmartGrid env has a single reward
+        #   function (single-objective);
+        # - a dict mapping reward names to their values, if the env has
+        #   multiple reward functions (multi-objective).
+        agent_reward = rewards[agent_name]

.. warning::
    If you do not use a :py:class:`~smartgrid.wrappers.reward_aggregator.RewardAggregator`
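
Since ``rewards`` may hold either scalars or dicts of named objectives, a
single-objective learner needs a scalarization step. A minimal sketch, assuming
a plain average is an acceptable aggregation (the helper is ours, not part of
the docs):

.. code-block:: Python

    def scalarize(agent_reward):
        # Multi-objective case: average the named reward values.
        if isinstance(agent_reward, dict):
            return sum(agent_reward.values()) / len(agent_reward)
        # Single-objective case: already a scalar.
        return agent_reward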
2 changes: 1 addition & 1 deletion docs/source/argumentation.rst
@@ -21,7 +21,7 @@ You can use argumentation:
Using the existing argumentation reward functions
-------------------------------------------------

-You can import these reward functions from the:py:mod:`smartgrid.reward.argumentation`
+You can import these reward functions from the :py:mod:`smartgrid.rewards.argumentation`
package; accessing this package *requires* the `AJAR`_ library, which you can
install with ``pip install git+https://github.com/ethicsai/ajar.git@v1.0.0``.
Trying to import anything from this package without having `AJAR`_ will raise
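
Code that should degrade gracefully when `AJAR`_ is absent can guard the
import; a hedged sketch (the fallback behaviour is an assumption, not
prescribed by the docs):

.. code-block:: Python

    try:
        from smartgrid.rewards import argumentation
    except ImportError:
        # AJAR is not installed; see the pip command above.
        argumentation = None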
68 changes: 19 additions & 49 deletions docs/source/custom_scenario.rst
@@ -64,45 +64,15 @@ profiles to instantiate :py:class:`~smartgrid.agents.agent.Agent`\ s.
.. note::
    If the package was installed through ``pip`` instead of cloning the repository,
    accessing the files through a relative path will not work. Instead, the files
-    must be accessed from the installed package itself. In this case, the
-    :py:mod:`importlib.resources` module can be used.
-
-    To access files from an installed package:
-
-    .. code-block:: Python
-
-        converter = DataOpenEIConversion()
-        # Before Python 3.9:
-        from importlib_resources import path
-        # `path` returns a context manager that must be used in a `with`.
-        # The first argument is the path of the dataset, using `.` instead of `/`.
-        # The `data/` folder is moved within the `smartgrid` package when installing.
-        # The second argument is the name of the requested file, within the dataset.
-        with path('smartgrid.data.openei', 'profile_office_annually.npz') as f:
-            converter.load(
-                'Office',
-                f,
-                comfort.neutral_comfort_profile
-            )
-        # Since Python3.9:
-        from importlib_resources import files, as_file
-        # `as_file` returns a context manager that must be used in a `with`.
-        # You may use the `smartgrid` module directly as an argument, or `'smartgrid'`
-        # (i.e., a string).
-        with as_file(files(smartgrid).joinpath('data/openei/profile_office_annually.npz')) as f:
-            converter.load(
-                'Office',
-                f,
-                comfort.neutral_comfort_profile
-            )
+    must be accessed from the installed package itself.

To simplify getting the path to data files, the :py:func:`~smartgrid.make_env.find_profile_data`
function may be used, although it has some limitations. In particular, it
only works with a single level of nesting (e.g., ``data/dataset/sub-dataset/file``
-will not work), and it relies on the :py:func:`importlib.resources.path` function,
-which is deprecated since Python3.11 (but still usable, for now).
+will not work). Yet, this function will work whether you have cloned the
+repository (as long as the current working directory is at the project root),
+or installed as a package; it is the recommended way to specify which data file
+to use.

.. code-block:: Python
@@ -313,14 +283,14 @@ can be used instead. To use *multi-objective* learning algorithms, which
receive several rewards each step, simply avoid wrapping the base environment.

When the environment is wrapped, the base environment can be obtained through
-the :py:obj:`~gymnasium.Wrapper.unwrapped` property. Gymnasium
-wrappers should allow access to any (public) attribute automatically:
+the :py:obj:`~gymnasium.Wrapper.unwrapped` property. The wrapper allows access
+to any public attribute of the environment automatically:

.. code-block:: Python
smartgrid = env.unwrapped
-n_agent = env.n_agent # Note that `n_agent` is not defined in the wrapper!
-assert n_agent == smartgrid.n_agent
+num_agents = env.num_agents # Note that `num_agents` is not defined in the wrapper!
+assert num_agents == smartgrid.num_agents
The interaction loop
^^^^^^^^^^^^^^^^^^^^
@@ -333,13 +303,13 @@ can be used:
.. code-block:: Python
done = False
-obs_n = env.reset()
+obs_n, _ = env.reset()
while not done:
    # Implement your decision algorithm here
-    actions = [
-        agent.profile.action_space.sample()
-        for agent in env.agents
-    ]
+    actions = {
+        agent_name: env.action_space(agent_name).sample()
+        for agent_name in env.agents
+    }
    obs_n, rewards_n, terminated_n, truncated_n, info_n = env.step(actions)
    done = all(terminated_n.values()) or all(truncated_n.values())
env.close()
@@ -349,13 +319,13 @@ Otherwise, the env termination must be handled by the interaction loop itself:
.. code-block:: Python
max_step = 50
-obs_n = env.reset()
+obs_n, _ = env.reset()
for _ in range(max_step):
    # Implement your decision algorithm here
-    actions = [
-        agent.profile.action_space.sample()
-        for agent in env.agents
-    ]
+    actions = {
+        agent_name: env.action_space(agent_name).sample()
+        for agent_name in env.agents
+    }
    # Note that we do not need the `terminated` nor `truncated` values here.
    obs_n, rewards_n, _, _, info_n = env.step(actions)
env.close()
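
For completeness, a hedged sketch of loading a data file through
``find_profile_data`` (its exact signature is collapsed in this diff; we assume
it takes the dataset name and file name, as in the removed ``importlib``
example, and that ``DataOpenEIConversion`` and ``comfort`` are imported as
earlier on this page):

.. code-block:: Python

    from smartgrid.make_env import find_profile_data

    converter = DataOpenEIConversion()
    converter.load(
        'Office',
        find_profile_data('openei', 'profile_office_annually.npz'),
        comfort.neutral_comfort_profile
    )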
54 changes: 31 additions & 23 deletions docs/source/extending/observations.rst
@@ -22,7 +22,7 @@ GlobalObservation
-----------------

Creating a completely new way to compute observations is easy: simply define
-a new class (ideally a :py:func:`collections.namedtuple`), and implement its
+a new :py:func:`dataclasses.dataclass`, and implement its
:py:meth:`~.GlobalObservation.compute` class method (not instance method!), as
well as :py:meth:`~.GlobalObservation.reset`.

Expand All @@ -31,11 +31,14 @@ For example, let us create a global observation class that only contains the

.. code-block:: Python
-from collections import namedtuple
+import dataclasses
from smartgrid.observation.base_observation import BaseObservation

-fields = ['hour']
-class OnlyHourGlobalObservation(namedtuple('OnlyHourGlobalObservation', fields)):
+@dataclasses.dataclass(frozen=True)
+class OnlyHourGlobalObservation(BaseObservation):
+    # Dataclasses require defining their attributes, which helps readability.
+    hour: float

    @classmethod
    def compute(cls, world):
@@ -46,38 +49,38 @@ For example, let us create a global observation class that only contains the
    def reset(cls):
        pass
-It is a little bit trickier to retain the existing fields of the global
-observation, because of the way Python handles namedtuples. For another example,
-let us create new *global* observations that include the current day in addition
-to the existing fields.
+The existing global observation fields can also be retained, by extending the
+:py:class:`~smartgrid.observation.global_observation.GlobalObservation` dataclass.
+For another example, let us create new *global* observations that include the
+current day in addition to the existing fields.

.. code-block:: Python
-from smartgrid.observation import GlobalObservation
-from collections import namedtuple
+import dataclasses
+from smartgrid.observation.base_observation import GlobalObservation

-# `GlobalObservation._fields` is a tuple, we cannot concatenate a list to it.
-fields = ('day',) + GlobalObservation._fields
-class GlobalObservationAndDay(namedtuple('GlobalObservationAndDay', fields)):
+@dataclasses.dataclass(frozen=True)
+class GlobalObservationAndDay(GlobalObservation):
+    # Dataclasses require defining their attributes, which helps readability.
+    # These attributes are added to the ones defined in parent classes.
+    day: float
    @classmethod
    def compute(cls, world):
        obs = GlobalObservation.compute(world)
-        # `obs` is an instance (tuple) of GlobalObservation that contains
-        # all the other fields we want.
+        # `obs` is an instance of GlobalObservation containing all other fields.
        # We need to compute `day` now.
        day = world.current_step // 24
        # Now, we need to combine `day` with the other fields. To avoid
        # potential errors in the order of arguments, we will use keyworded
        # arguments (transforming `obs` into a dict and using the `**` operator).
-        existing_fields = obs._asdict()
+        existing_fields = obs.asdict()
        return cls(day=day, **existing_fields)

    @classmethod
    def reset(cls):
-        GlobalObservation.reset()
+        super().reset()
LocalObservation
----------------
@@ -88,12 +91,14 @@ difference between the agents' comfort and the average of others' comfort.

.. code-block:: Python
-from collections import namedtuple
import numpy as np
+import dataclasses
from smartgrid.observation.base_observation import BaseObservation

-fields = ['comfort_diff']
-class ComfortDiffLocalObservation(namedtuple('ComfortDiffLocalObservation', fields)):
+@dataclasses.dataclass(frozen=True)
+class ComfortDiffLocalObservation(BaseObservation):
+    # Dataclasses require defining their attributes, which helps readability.
+    comfort_diff: float

    @classmethod
    def compute(cls, world, agent):
@@ -109,6 +114,9 @@ difference between the agents' comfort and the average of others' comfort.
        # But it is provided, to allow for more complex local observations.
        pass
+Similarly to global observations, existing fields can be retained by inheriting
+from :py:class:`~smartgrid.observation.local_observation.LocalObservation`
+rather than :py:class:`~smartgrid.observation.base_observation.BaseObservation`.
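
A hedged sketch of that pattern, mirroring ``GlobalObservationAndDay`` above
(the comfort computation is collapsed in this diff, so it is left as a
placeholder here; the class name is ours):

.. code-block:: Python

    import dataclasses
    from smartgrid.observation.local_observation import LocalObservation

    @dataclasses.dataclass(frozen=True)
    class RicherLocalObservation(LocalObservation):
        comfort_diff: float

        @classmethod
        def compute(cls, world, agent):
            obs = LocalObservation.compute(world, agent)
            comfort_diff = ...  # placeholder: compute as in the example above
            return cls(comfort_diff=comfort_diff, **obs.asdict())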

ObservationManager
------------------
@@ -133,7 +141,7 @@ For example, assuming that we want to use our ``GlobalObservationAndDay``:
    global_observation=GlobalObservationAndDay
)
-Both *global* and *local* observations can be overriden at the same time, by
+Both *global* and *local* observations can be overridden at the same time, by
specifying both arguments:

.. code-block:: Python
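
The collapsed code block presumably passes both arguments; a hedged
reconstruction (the import path of ``ObservationManager`` is an assumption):

.. code-block:: Python

    from smartgrid.observation import ObservationManager

    manager = ObservationManager(
        local_observation=ComfortDiffLocalObservation,
        global_observation=GlobalObservationAndDay
    )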
2 changes: 1 addition & 1 deletion docs/source/extending/rewards.rst
@@ -57,7 +57,7 @@ to gain money by rewarding the difference with the previous step.
        super().__init__()
        self.previous_payoffs = {}

-    def calculate(self, world, agents):
+    def calculate(self, world, agent):
        # Get (or use default) the payoff at the last step.
        previous_payoff = self.previous_payoffs.get(agent)
        if previous_payoff is None:
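
Assembled from the fragments above, a hedged sketch of the complete reward
(the class name, base class import, and the ``agent.payoff`` attribute are
assumptions, since those parts are collapsed in this diff):

.. code-block:: Python

    from smartgrid.rewards import Reward  # import path assumed

    class MoneyGainReward(Reward):
        def __init__(self):
            super().__init__()
            self.previous_payoffs = {}

        def calculate(self, world, agent):
            # Get (or use default) the payoff at the last step.
            previous_payoff = self.previous_payoffs.get(agent)
            if previous_payoff is None:
                previous_payoff = agent.payoff  # attribute name assumed
            # Reward the gain relative to the previous step.
            reward = agent.payoff - previous_payoff
            self.previous_payoffs[agent] = agent.payoff
            return reward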
5 changes: 3 additions & 2 deletions docs/source/index.rst
@@ -2,8 +2,9 @@ Documentation of |project_name|
===============================

This project aims to provide a (simplified) multi-agent simulator of a
-**Smart Grid**, using the `Gymnasium <https://gymnasium.farama.org/>`_
-(formerly OpenAI Gym) framework.
+**Smart Grid**, using the `PettingZoo <https://pettingzoo.farama.org/>`_
+(a multi-agent equivalent to `Gymnasium <https://gymnasium.farama.org/>`_)
+framework.

This simulator has a strong focus on **ethical considerations**: in this
environment, the learning agents must decide how to consume and distribute
(The remaining 7 changed files are not rendered in this view.)