
Commit

notebook
LegrandNico committed Oct 13, 2023
1 parent 2e53198 commit 43d9652
Showing 2 changed files with 85 additions and 95 deletions.
107 changes: 51 additions & 56 deletions docs/source/notebooks/0-Theory.ipynb
@@ -41,7 +41,7 @@
},
{
"cell_type": "code",
"execution_count": 58,
"execution_count": 2,
"id": "054c77b0-855c-46bf-b861-96d430d90d56",
"metadata": {
"tags": [
@@ -87,7 +87,7 @@
"id": "44be6e84-78df-4c3a-b577-3d0b732f9ef9",
"metadata": {},
"source": [
"### Adding a drift to the random walk\n",
"#### Adding a drift to the random walk\n",
"\n",
"The Gaussian random walk can be further parametrized by adding a drift over time. This value, often noted $\\rho$, will be added at each time step:\n",
"\n",
@@ -100,7 +100,7 @@
},
{
"cell_type": "code",
"execution_count": 59,
"execution_count": 3,
"id": "a4cac17f-6ce5-41a3-8960-a781dc95e6b7",
"metadata": {},
"outputs": [
@@ -143,21 +143,21 @@
"tags": []
},
"source": [
"### Autoregressive processes\n",
"#### Autoregressive processes\n",
"\n",
"We can also assume that the generative process follows an [autoregressive model](https://en.wikipedia.org/wiki/Autoregressive_model), in which case the value of the next iteration is weighted by a coefficient and called by an intercept, often note $\\phi$ and $m$ (respectively) in the Matlab toolbox.\n",
"\n",
"\n",
"$$\n",
"x_1^{(k)} \\sim \\mathcal{N}(m + \\phi * x_1^{(k-1)}, \\sigma^2)\n",
"x_1^{(k)} \\sim \\mathcal{N}(x_1^{(k-1)} + \\phi(m - x_1^{(k-1)}), \\sigma^2)\n",
"$$\n",
"\n",
"We repeat the same simulation below using $\\phi = .4$ and $m = 12.0$."
]
},
{
"cell_type": "code",
"execution_count": 64,
"execution_count": 4,
"id": "85c9972d-3e71-4737-9465-9f0f3d54b8ce",
"metadata": {
"editable": true,
@@ -189,7 +189,7 @@
"x_1 = [0.0]\n",
"for i in range(200):\n",
" x_1.append(\n",
" np.random.normal(phi * (m - x_1[i-1]) + x_1[i-1], scale=1) \n",
" np.random.normal(x_1[i-1] + phi * (m - x_1[i-1]) , scale=1) \n",
" )\n",
"\n",
"plt.figure(figsize=(12, 3))\n",
@@ -262,7 +262,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 5,
"id": "1da6f085-191c-4b9d-b425-10a0f29d9fec",
"metadata": {
"tags": [
@@ -335,7 +335,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 6,
"id": "a44fa090-e2b0-4029-969c-595e0083d09a",
"metadata": {
"tags": [
@@ -400,7 +400,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 7,
"id": "f31781b8-d0f5-4007-befc-439ccdfb83a5",
"metadata": {
"editable": true,
@@ -464,18 +464,26 @@
"sns.despine()"
]
},
{
"cell_type": "markdown",
"id": "cb7d2736-c868-46e3-90e2-e6bc37f61fe5",
"metadata": {},
"source": [
"## Dynamic beliefs updating\n",
"### The Hierarchical Gaussian Filter in a network of predictive nodes\n",
"The coding examples introduced above illustrated generative models that can simulate data forward from a given volatility structure, with key parameters stochastically fluctuating. Based on these principles, any given latent state in the world can be modelled as having a volatility parent state, a value parent state, both, or none. When the node is an orphan, it evolves as a Gaussian random walk around its previous value with a fixed step size. Consequently, when inferring the evolution of these states, the exact belief update equations (which include the computation of new predictions, posterior values, and prediction errors, and represent an approximate inversion of this generative model, see {cite:p}`2011:mathys` depend on the nature of the coupling of a given state with its parent and children states. In particular, the nodes that implement the belief updates will communicate with their value parents via value prediction errors, or **VAPE**s, and volatility prediction errors, or **VOPE**s, with their volatility parents. Hierarchical Gaussian Filters use this as a model of the environment to make sense of new observations, also referred to as the sensory part of the HGF, or the filtering part. In this situation, new observations are coming in at the network's root and the model updates the belief structure accordingly (from bottom to top nodes). It is therefore straightforward to describe the standard two-level and three-level Hierarchical Gaussian Filters for continuous and binary inputs as the combination of value and volatility couplings (see {ref}`hgf-fig`)"
]
},
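As a concrete illustration of what a volatility parent does in this generative model, here is a minimal sketch (not pyhgf code; the tonic volatilities are arbitrary and the coupling strength is assumed to be 1) in which a higher node $x_2$ sets the step size of a lower node $x_1$:

```python
import numpy as np

np.random.seed(123)

omega_1, omega_2 = -2.0, -2.0  # arbitrary tonic volatilities (log-scale step sizes)
x_1, x_2 = [0.0], [0.0]

for _ in range(200):
    # the volatility parent evolves as a plain Gaussian random walk
    x_2.append(np.random.normal(x_2[-1], scale=np.sqrt(np.exp(omega_2))))
    # the child's step size is modulated by the current value of its volatility parent
    x_1.append(np.random.normal(x_1[-1], scale=np.sqrt(np.exp(omega_1 + x_2[-1]))))
```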
{
"cell_type": "markdown",
"id": "1e76a017-b0a0-4d2a-a931-c760b14970df",
"metadata": {},
"source": [
"Based on these principles, any given state in the world can be modelled as having a volatility parent state, a value parent state, both, or none. When the node is an orphan, it evolves as a Gaussian random walk around its previous value with a fixed step size. Consequently, when inferring on the evolution of these states, the exact belief update equations (which include the computation of new predictions, posterior values, and prediction errors, and represent an approximate inversion of this generative model, see {cite:p}`2011:mathys` depend on the nature of the coupling of a given state with its parent and children states. In particular, the nodes that implement the belief updates will communicate with their value parents via value prediction errors, or **VAPE**s, and via volatility prediction errors, or **VOPE**s, with their volatility parents.\n",
"\n",
"```{figure} ../images/hgf.png\n",
"---\n",
"name: hgf-fig\n",
"---\n",
"The two-level and three-level Hierarchical Gaussian Filters for binary or continuous inputs, as described in {cite:p}`2014:mathys,2011:mathys`. The binary HGF has the particularity that it uses a sigmoid transform in the input node to convert continuous values into binary probabilities. For both models, volatility coupling is depicted with dashed lines, and value coupling with straight lines. The three-level HGF has one volatility layer more than the two-level HGF.\n",
"The two-level and three-level Hierarchical Gaussian Filters for binary or continuous inputs, as described in {cite:p}`2014:mathys,2011:mathys`. The binary HGF has the particularity that it uses a sigmoid transform in the input node to convert continuous values into binary probabilities. For both models, volatility coupling is depicted with dashed lines, and value coupling with straight lines. The three-level HGF has one volatility layer more than the two-level HGF, which is used to estimate higher-order uncertainty.\n",
"```"
]
},
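To make the caption's point about the sigmoid transform concrete, here is a minimal sketch (with arbitrary values, not the notebook's code) of how a continuous first-level value is mapped onto the probability of a binary observation:

```python
import numpy as np


def sigmoid(x):
    """Map a continuous (logit-scale) value onto a probability in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-x))


for mu in (-4.0, 0.0, 2.0):  # arbitrary continuous values
    print(f"x = {mu}: probability of observing 1 = {sigmoid(mu):.3f}")
```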
@@ -485,32 +493,23 @@
"metadata": {},
"source": [
"```{hint}\n",
"Hierarchical Gaussian Filters are inspired by other simpler models for Bayesian filtering and reinforcement learning. These models can be seen for example as generalisation of the [Kalman Filter](https://en.wikipedia.org/wiki/Kalman_filter) or the [Rescorla-Wagner model](https://en.wikipedia.org/wiki/Rescorla%E2%80%93Wagner_model). Specifically:\n",
"\n",
"* A one-level HGF for continuous input can be seen as a [Kalman Filter](https://en.wikipedia.org/wiki/Kalman_filter). \n",
"* A two-level binary HGF can be seen as a [Rescorla-Wagner](https://en.wikipedia.org/wiki/Rescorla%E2%80%93Wagner_model) model with an adaptive learning rate that depends on the precision of the belief.\n",
"```"
]
},
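To unpack the second bullet of the hint above, here is a minimal sketch of a standard Rescorla-Wagner update (the toy outcomes and the fixed learning rate of 0.2 are arbitrary); in a two-level binary HGF, the learning rate would not be fixed but recomputed on every trial from the precision of the current belief:

```python
import numpy as np

outcomes = np.array([1, 1, 0, 1, 1, 1, 0, 0, 1, 1])  # toy binary observations

learning_rate = 0.2  # fixed in Rescorla-Wagner; precision-dependent in the binary HGF
value = 0.5          # initial belief about the probability of observing 1

for o in outcomes:
    prediction_error = o - value
    value += learning_rate * prediction_error  # move the belief towards the outcome

print(f"final belief: {value:.3f}")
```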
{
"cell_type": "markdown",
"id": "cb7d2736-c868-46e3-90e2-e6bc37f61fe5",
"metadata": {},
"source": [
"## The propagation of prediction and prediction errors in a network of probabilistic nodes\n",
"\n",
"The coding examples introduced above illustrated generative models that can simulate data forward from a given volatility structure, with key parameters stochastically fluctuating. Hierarchical Gaussian Filters use this as a model of the environment to make sense of new observations, also referred to as the sensory part of the HGF, or the filtering part. In this situation, new observations are coming in at the root of the network and the model updates the belief structure accordingly (from bottom to top nodes)."
]
},
{
"cell_type": "markdown",
"id": "c5ac10e6-e40f-43cd-8c89-17f76e34fb57",
"metadata": {
"tags": []
},
"source": [
"In its first description, {cite:p}2011:mathys derived a set of simple, one-step update equations that represent changes in beliefs about the hidden states (i.e. the sufficient statistics of the nodes) specified in the generative model. The update equations for volatility and value coupling in the generalized hierarchical Gaussian filter have been described in {cite:p}`weber:2023`. For each state, a belief is held (and updated for every new input) by the agent and described as a Gaussian distribution, fully characterized by its mean and its inverse variance, or precision,\n",
"on a given trial (this is the notation we have been using in the previous examples). We conceptualize each belief as a node in a network, where belief updates involve computations within nodes as well as message passing between nodes. \n",
"### The propagation of prediction and prediction errors\n",
"\n",
"The computations triggered by any observation at each time point can be ordered in time as shown in the [belief update algorithm](#belief-update)."
"Having described the model as a specific configuration of predictive nodes offer many advantages, especially in term of modularity for the user. However, the model itself is not limited to the description of the generative process that we covered in the previous examples. The most interesting, and also the more complex, part of the modelling consists of the capability for the network to update the hierarchical structure in a Bayesian optimal way as new observations are presented. These steps are defined by a set of simple, one-step update equations that represent changes in beliefs about the hidden states (i.e. the sufficient statistics of the nodes) specified in the generative model. These equations were first described in {cite:t}`2011:mathys`, and the update equations for volatility and value coupling in the generalized Hierarchical Gaussian filter (on which most of the update functions in [pyhgf](https://github.com/ilabcode/pyhgf) are based) have been described in {cite:p}`weber:2023`. The exact computations in each step especially depend on the nature of the coupling (via {term}`VAPE`s vs. {term}`VOPE`s) between the parent and children nodes. It is beyond the scope of this tutorial to dive into the derivation of these steps and we refer the interested reader to the mentioned papers. Here, we provide a general overview of the dynamic of the update sequence that supports belief updating. The computations triggered by any observation at each time point can be ordered in time as shown in the [belief update algorithm](#belief-update)."
]
},
{
@@ -521,27 +520,25 @@
"```{note} Belief update\n",
":name: belief-update\n",
"\n",
"For $i$ a {term}`node` in a probabilistic network at time $k$, with children at $i-1$ and parent at $i+1$\n",
"\n",
"1. {term}`Prediction`\n",
" Compute $\\mathrm{prediction}^{(k)}_i$ \n",
" ← receive $\\mathrm{PE}^{(k)}_{i-1}$ from $\\mathrm{node}_{i-1}$\n",
"\n",
"2. Update \n",
" compute $\\mathrm{posterior}^{(k)}_i$ \n",
" **given** $\\mathrm{PE}^{(k)}_{i-1}$ and $\\mathrm{prediction}^{(k)}_i$ \n",
" → send $\\mathrm{posterior}^{(k)}_i$ to $\\mathrm{node}_{i-1}$\n",
"\n",
"3. {term}`prediction error` \n",
" compute $\\mathrm{PE}^{(k)}_i$ \n",
" **given** $\\mathrm{prediction}^{(k)}_i$ and $\\mathrm{posterior}^{(k)}_i$ \n",
" → send $\\mathrm{PE}^{(k)}_i$ to $\\mathrm{node}_{i+1}$ \n",
" ← receive $\\mathrm{posterior}^{(k)}_{i+1}$ from $\\mathrm{node}_{i+1}$ \n",
"\n",
"4. {term}`Prediction` \n",
" compute $\\mathrm{prediction}^{(k+1)}_i$ \n",
" **given** $\\mathrm{posterior}^{(k)}_i$ and $\\mathrm{posterior}^{(k)}_{i+1}$ \n",
"\n",
"Let's consider a simple network containing $x_{node}$ be a {term}`node`, defined at time $k$, with children nodes defined at $x_{children}$ and parent at $x_{parent}$. The standard approach to update this network upon the presentation of a new observation is:\n",
"\n",
"1. {term}`Prediction` step\n",
" For $n$ in the network, starting from the leaves to the roots: \n",
" Given the time elapsed since the last update \n",
" Given the the posterior value of the node $n$ \n",
" Given the prediction from the parent nodes \n",
" - Compute the *expected_mean* and *expected precision*\n",
"\n",
"2. Beliefs propagation step \n",
"For $n$ in the network, starting from the roots to the leaves: \n",
" 1. {term}`Update` step\n",
" Given the prediction errors received from the child nodes \n",
" - Compute the new sufficient statistics for node $n$\n",
" 2. {term}`prediction error`\n",
" Given the new posterior from the update step \n",
" Given the expectation from the prediction step\n",
" - Compute a new prediction error ({term}`VAPE` or {term}`VOPE`\n",
" - Send it to the parent node\n",
"```"
]
},
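The note above can be read as a two-pass algorithm. The following is a schematic sketch of that ordering for a simple chain of value-coupled continuous nodes, written in plain Python; it is not the pyhgf implementation, it ignores volatility coupling and drift, assumes a coupling strength of 1 and an input precision of 1, and uses the simplest precision-weighted form of the value-coupling updates:

```python
import numpy as np

# A chain of value-coupled continuous nodes: node 0 receives the observations,
# node 1 is its value parent, node 2 is the value parent of node 1.
nodes = [{"mean": 0.0, "precision": 1.0, "tonic_volatility": -2.0} for _ in range(3)]


def prediction(node):
    """Prediction step: expected mean and precision before the new observation."""
    node["expected_mean"] = node["mean"]
    node["expected_precision"] = 1.0 / (
        1.0 / node["precision"] + np.exp(node["tonic_volatility"])
    )


def update(node, prediction_error, child_expected_precision):
    """Update step: precision-weighted integration of the child's prediction error."""
    node["precision"] = node["expected_precision"] + child_expected_precision
    node["mean"] = node["expected_mean"] + (
        child_expected_precision / node["precision"]
    ) * prediction_error


def filter_observation(nodes, observation):
    # 1. prediction step, computed for every node in the network
    for node in nodes:
        prediction(node)

    # 2. belief propagation: the observation enters at the bottom of the chain...
    update(nodes[0], observation - nodes[0]["expected_mean"], 1.0)

    # ...and value prediction errors (VAPEs) are then passed from children to parents
    for child, parent in zip(nodes[:-1], nodes[1:]):
        vape = child["mean"] - child["expected_mean"]
        update(parent, vape, child["expected_precision"])


for u in np.random.normal(5.0, 1.0, size=50):
    filter_observation(nodes, u)

print([round(node["mean"], 2) for node in nodes])
```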
@@ -556,10 +553,8 @@
"tags": []
},
"source": [
"The exact computations in each step depend on the nature of the coupling (via {term}`VAPE`s vs. {term}`VOPE`s) between the parent and children nodes.\n",
"\n",
"```{important}\n",
"We have placed the {term}`Prediction` step at the end of the update loop. This is because usually, we think about the beginning of a timepoint trial as starting with receiving a new input, and of a prediction as being present before that input is received (this is especially relevant to model time points as a trial in an experiment). However, in some variants of the HGF the prediction also depends on the time that has passed in between trials, which is something that can only be evaluated once the new input arrives - hence the additional computation of the (current) prediction at the beginning of the trial. Conceptually, it makes most sense to think of the prediction as happening continuously between trials. For implementational purposes, it is however most convenient to only compute the prediction once the new input (and with it its arrival time) enters. This ensures both that the posterior means of parent nodes have had enough time to be sent back to their children for preparation for the new input, and that the arrival time of the new input can be taken into account appropriately.\n",
"We have placed the {term}`Prediction` step at the beginning of the update loop to account that the prediction depends on the time that has passed in between trials, which is something that can only be evaluated once the new input arrives. This is because we usually think about the beginning of a trial/time point as starting with receiving a new input and of a prediction as being present before that input is received. For implementational purposes, it is most convenient to only compute the prediction once the new input (and with it its arrival time) enters. However, it makes most sense to think of the prediction as happening continuously between the time points, but this is not something that is tracked continuously by the model.\n",
"```"
]
},
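As a small illustration of the point above, the expected precision of a continuous node can be written as a function of the time elapsed since the last observation; this sketch assumes the step variance scales linearly with the interval, and the tonic volatility and the intervals are arbitrary values:

```python
import numpy as np

omega = -2.0               # arbitrary tonic volatility (log scale)
posterior_precision = 1.0  # precision of the belief after the previous update

for time_interval in (0.5, 1.0, 4.0):  # arbitrary times elapsed between observations
    # the longer the wait, the more variance the random walk accumulates,
    # and the less precise the prediction about the next input
    expected_precision = 1.0 / (1.0 / posterior_precision + time_interval * np.exp(omega))
    print(f"elapsed time {time_interval}: expected precision = {expected_precision:.3f}")
```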
@@ -603,27 +598,27 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 8,
"id": "b29ec7fb-2a4b-4d64-ac66-426f3bc9b6d5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Last updated: Wed Aug 09 2023\n",
"Last updated: Fri Oct 13 2023\n",
"\n",
"Python implementation: CPython\n",
"Python version : 3.9.16\n",
"IPython version : 8.14.0\n",
"\n",
"pyhgf : 0.0.4\n",
"jax : 0.4.13\n",
"jaxlib: 0.4.13\n",
"pyhgf : 0.0.9\n",
"jax : 0.4.14\n",
"jaxlib: 0.4.14\n",
"\n",
"seaborn : 0.12.2\n",
"matplotlib: 3.7.1\n",
"numpy : 1.22.0\n",
"seaborn : 0.12.2\n",
"\n",
"Watermark: 2.4.3\n",
"\n"