build based on c8097c4
Documenter.jl committed Oct 7, 2024
1 parent ded74a0 commit 9a56957
Showing 8 changed files with 12 additions and 12 deletions.
2 changes: 1 addition & 1 deletion previews/PR284/.documenter-siteinfo.json
@@ -1 +1 @@
{"documenter":{"julia_version":"1.10.5","generation_timestamp":"2024-10-07T09:26:19","documenter_version":"1.7.0"}}
{"documenter":{"julia_version":"1.10.5","generation_timestamp":"2024-10-07T09:29:42","documenter_version":"1.7.0"}}
2 changes: 1 addition & 1 deletion previews/PR284/algorithmic_differentiation/index.html
@@ -19,4 +19,4 @@
D f [x] (\dot{x}) &= [(D \mathcal{l} [g(x)]) \circ (D g [x])](\dot{x}) \nonumber \\
&= \langle \bar{y}, D g [x] (\dot{x}) \rangle \nonumber \\
&= \langle D g [x]^\ast (\bar{y}), \dot{x} \rangle, \nonumber
\end{align}\]

from which we conclude that $D g [x]^\ast (\bar{y})$ is the gradient of the composition $l \circ g$ at $x$.

The consequence is that we can always view the computation performed by reverse-mode AD as computing the gradient of the composition of the function in question and an inner product with the argument to the adjoint.

The above shows that if $\mathcal{Y} = \RR$ and $g$ is the function we wish to compute the gradient of, we can simply set $\bar{y} = 1$ and compute $D g [x]^\ast (\bar{y})$ to obtain the gradient of $g$ at $x$.
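As a concrete illustration (a minimal hand-written sketch, not Mooncake's API): for $g(x) = \sum_n x_n^2$ we have $D g [x](\dot{x}) = \langle 2x, \dot{x} \rangle$, so the adjoint is $D g [x]^\ast (\bar{y}) = 2 x \bar{y}$, and seeding it with $\bar{y} = 1$ returns the gradient:

```julia
# Minimal sketch: g maps ℝ^n to ℝ, so D g [x]^* maps ℝ back to ℝ^n.
g(x) = sum(abs2, x)

# Hand-written adjoint of the derivative of g at x.
adjoint_g(x, ȳ) = 2 .* x .* ȳ

x = [1.0, 2.0, 3.0]
grad = adjoint_g(x, 1.0)  # seed ȳ = 1; returns [2.0, 4.0, 6.0], the gradient of g at x
```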
# Summary

This document explains the core mathematical foundations of AD. It explains separately *what* it does, and *how* it goes about it. Some basic examples are given which show how these mathematical foundations can be applied to differentiate functions of matrices, and Julia `function`s.

Subsequent sections will build on these foundations, to provide a more general explanation of what AD looks like for a Julia programme.

# Asides

### *How* does Forwards-Mode AD work?

Forwards-mode AD achieves this by breaking down $f$ into the composition $f = f_N \circ \dots \circ f_1$, where each $f_n$ is a simple function whose derivative (function) $D f_n [x_n]$ we know for any given $x_n$. By the chain rule, we have that

\[D f [x] (\dot{x}) = D f_N [x_N] \circ \dots \circ D f_1 [x_1] (\dot{x})\]

which suggests the following algorithm:

1. let $x_1 = x$, $\dot{x}_1 = \dot{x}$, and $n = 1$
2. let $\dot{x}_{n+1} = D f_n [x_n] (\dot{x}_n)$
3. let $x_{n+1} = f_n(x_n)$
4. let $n = n + 1$
5. if $n = N+1$ then return $\dot{x}_{N+1}$, otherwise go to 2.

When each function $f_n$ maps between Euclidean spaces, the applications of derivatives $D f_n [x_n] (\dot{x}_n)$ are given by $J_n \dot{x}_n$, where $J_n$ is the Jacobian of $f_n$ at $x_n$.
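The following is a minimal Julia sketch of this algorithm, assuming (purely for illustration) that each $f_n$ is supplied together with a hand-written derivative function; it is not Mooncake's implementation:

```julia
# Push the primal x and tangent ẋ through f = f_N ∘ ⋯ ∘ f_1, given the fₙ and
# functions computing (x, ẋ) -> D fₙ [x](ẋ).
function forwards_mode(fs, dfs, x, ẋ)
    for (f, df) in zip(fs, dfs)
        ẋ = df(x, ẋ)  # step 2: uses the current xₙ, so update the tangent first
        x = f(x)      # step 3: xₙ₊₁ = fₙ(xₙ)
    end
    return x, ẋ       # f(x₁) and D f [x₁](ẋ₁)
end

# Example: f = f₂ ∘ f₁ with f₁ = sin and f₂(x) = x², so D f [x](ẋ) = 2 sin(x) cos(x) ẋ.
fs  = (sin, x -> x^2)
dfs = ((x, ẋ) -> cos(x) * ẋ, (x, ẋ) -> 2x * ẋ)
y, ẏ = forwards_mode(fs, dfs, 1.0, 1.0)  # ẏ ≈ 2 * sin(1.0) * cos(1.0)
```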
[1] M. Giles. *An extended collection of matrix derivative results for forward and reverse mode automatic differentiation*. Unpublished (2008).

[2] T. P. Minka. *Old and new matrix algebra useful for statistics*. See www.stat.cmu.edu/minka/papers/matrix.html (2000).

[^note_for_geometers]: In AD we only really need to discuss differentiable functions between vector spaces that are isomorphic to Euclidean space. Consequently, a variety of considerations which are usually required in differential geometry are not required here. Notably, the tangent space is assumed to be the same everywhere, and to be the same as the domain of the function. Avoiding these additional considerations helps keep the mathematics as simple as possible.

The only change to this file is the page colophon timestamp:

- This document was generated with Documenter.jl version 1.7.0 on Monday 7 October 2024 09:26. Using Julia version 1.10.5.
+ This document was generated with Documenter.jl version 1.7.0 on Monday 7 October 2024 09:29. Using Julia version 1.10.5.