Merge pull request #578 from LuxDL/ap/docs
Fix numbering in the docs
avik-pal authored Apr 9, 2024
2 parents a22199e + 0431aab commit abdbf4b
Showing 4 changed files with 31 additions and 30 deletions.
1 change: 1 addition & 0 deletions .buildkite/pipeline.yml
@@ -206,6 +206,7 @@ steps:
cuda: "*"
artifact_paths:
- "tutorial_deps/*"
- "docs/build/**/*"
env:
DATADEPS_ALWAYS_ACCEPT: true
JULIA_DEBUG: "Documenter"
4 changes: 2 additions & 2 deletions docs/src/index.md
@@ -5,8 +5,8 @@ layout: home
hero:
name: LuxDL Docs
text: Elegant & Performant Deep Learning in JuliaLang
tagline: A Pure Julia Deep Learning Framework putting Correctness and Performance First
text: Elegant & Performant Scientific Machine Learning in JuliaLang
tagline: A Pure Julia Deep Learning Framework designed for Scientific Machine Learning
actions:
- theme: brand
text: Tutorials
38 changes: 19 additions & 19 deletions docs/src/manual/distributed_utils.md
@@ -10,16 +10,16 @@ DDP Training using `Lux.DistributedUtils` is a spiritual successor to

## Guide to Integrating DistributedUtils into your code

- 1. Initialize the respective backend with [`DistributedUtils.initialize`](@ref), by passing
- in a backend type. It is important that you pass in the type, i.e. `NCCLBackend` and not
- the object `NCCLBackend()`.
+ * Initialize the respective backend with [`DistributedUtils.initialize`](@ref), by passing
+ in a backend type. It is important that you pass in the type, i.e. `NCCLBackend` and not
+ the object `NCCLBackend()`.

```julia
DistributedUtils.initialize(NCCLBackend)
```

- 2. Obtain the backend via [`DistributedUtils.get_distributed_backend`](@ref) by passing in
- the type of the backend (same note as last point applies here again).
+ * Obtain the backend via [`DistributedUtils.get_distributed_backend`](@ref) by passing in
+ the type of the backend (same note as last point applies here again).

```julia
backend = DistributedUtils.get_distributed_backend(NCCLBackend)
@@ -28,36 +28,36 @@ backend = DistributedUtils.get_distributed_backend(NCCLBackend)
It is important that you use this function instead of directly constructing the backend,
since there are certain internal states that need to be synchronized.

- 3. Next synchronize the parameters and states of the model. This is done by calling
- [`DistributedUtils.synchronize!!`](@ref) with the backend and the respective input.
+ * Next synchronize the parameters and states of the model. This is done by calling
+ [`DistributedUtils.synchronize!!`](@ref) with the backend and the respective input.

```julia
ps = DistributedUtils.synchronize!!(backend, ps)
st = DistributedUtils.synchronize!!(backend, st)
```

- 4. To split the data uniformly across the processes use
- [`DistributedUtils.DistributedDataContainer`](@ref). Alternatively, one can manually
- split the data. For the provided container to work
- [`MLUtils.jl`](https://github.com/JuliaML/MLUtils.jl) must be installed and loaded.
+ * To split the data uniformly across the processes use
+ [`DistributedUtils.DistributedDataContainer`](@ref). Alternatively, one can manually
+ split the data. For the provided container to work
+ [`MLUtils.jl`](https://github.com/JuliaML/MLUtils.jl) must be installed and loaded.

```julia
data = DistributedUtils.DistributedDataContainer(backend, data)
```
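
For a sense of how the wrapped container is consumed, here is a minimal sketch assuming an `MLUtils.DataLoader` over a dataset of `(x, y)` tuples; the batch size and shuffling are illustrative choices, not requirements:

```julia
using MLUtils

# `data` was wrapped in DistributedUtils.DistributedDataContainer above,
# so each process iterates only over its own shard of the dataset.
loader = DataLoader(data; batchsize=32, shuffle=true)
for (x, y) in loader
    # run the local forward/backward pass on this shard's batch
end
```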

- 5. Wrap the optimizer in [`DistributedUtils.DistributedOptimizer`](@ref) to ensure that the
- optimizer is correctly synchronized across all processes before parameter updates. After
- initializing the state of the optimizer, synchronize the state across all processes.
+ * Wrap the optimizer in [`DistributedUtils.DistributedOptimizer`](@ref) to ensure that the
+ optimizer is correctly synchronized across all processes before parameter updates. After
+ initializing the state of the optimizer, synchronize the state across all processes.

```julia
opt = DistributedUtils.DistributedOptimizer(backend, opt)
opt_state = Optimisers.setup(opt, ps)
opt_state = DistributedUtils.synchronize!!(backend, opt_state)
- ```
+ ```

- 6. Finally change all logging and serialization code to trigger on
- `local_rank(backend) == 0`. This ensures that only the master process logs and serializes
- the model.
+ * Finally change all logging and serialization code to trigger on
+ `local_rank(backend) == 0`. This ensures that only the master process logs and serializes
+ the model.
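
A minimal sketch of such a guard, reusing `backend`, `ps`, and `st` from the earlier steps; the `JLD2` checkpoint call is just one illustrative choice of serialization mechanism:

```julia
using JLD2  # assumed here purely for the checkpoint example

if DistributedUtils.local_rank(backend) == 0
    # Only the master process (rank 0) logs and serializes; all ranks hold
    # synchronized parameters, so writing once is sufficient.
    @info "Saving checkpoint"
    jldsave("checkpoint.jld2"; ps, st)  # hypothetical file name
end
```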

## [GPU-Aware MPI](@id gpu-aware-mpi)

@@ -108,4 +108,4 @@ And that's pretty much it!
1. Currently we don't run tests with CUDA or ROCM aware MPI, use those features at your own
risk. We are working on adding tests for these features.
2. AMDGPU support is mostly experimental and causes deadlocks in certain situations, this is
- being investigated. If you have a minimal reproducer for this, please open an issue.
+ being investigated. If you have a minimal reproducer for this, please open an issue.
18 changes: 9 additions & 9 deletions docs/src/tutorials/index.md
@@ -6,7 +6,7 @@ layout: page
<script setup>
import { VPTeamPage, VPTeamPageTitle, VPTeamMembers, VPTeamPageSection } from 'vitepress/theme'
- const githubSvg = '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 640 512"><path d="M392.8 1.2c-17-4.9-34.7 5-39.6 22l-128 448c-4.9 17 5 34.7 22 39.6s34.7-5 39.6-22l128-448c4.9-17-5-34.7-22-39.6zm80.6 120.1c-12.5 12.5-12.5 32.8 0 45.3L562.7 256l-89.4 89.4c-12.5 12.5-12.5 32.8 0 45.3s32.8 12.5 45.3 0l112-112c12.5-12.5 12.5-32.8 0-45.3l-112-112c-12.5-12.5-32.8-12.5-45.3 0zm-306.7 0c-12.5-12.5-32.8-12.5-45.3 0l-112 112c-12.5 12.5-12.5 32.8 0 45.3l112 112c12.5 12.5 32.8 12.5 45.3 0s12.5-32.8 0-45.3L77.3 256l89.4-89.4c12.5-12.5 12.5-32.8 0-45.3z"/></svg>';
+ const codeSvg = '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 640 512"><path d="M392.8 1.2c-17-4.9-34.7 5-39.6 22l-128 448c-4.9 17 5 34.7 22 39.6s34.7-5 39.6-22l128-448c4.9-17-5-34.7-22-39.6zm80.6 120.1c-12.5 12.5-12.5 32.8 0 45.3L562.7 256l-89.4 89.4c-12.5 12.5-12.5 32.8 0 45.3s32.8 12.5 45.3 0l112-112c12.5-12.5 12.5-32.8 0-45.3l-112-112c-12.5-12.5-32.8-12.5-45.3 0zm-306.7 0c-12.5-12.5-32.8-12.5-45.3 0l-112 112c-12.5 12.5-12.5 32.8 0 45.3l112 112c12.5 12.5 32.8 12.5 45.3 0s12.5-32.8 0-45.3L77.3 256l89.4-89.4c12.5-12.5 12.5-32.8 0-45.3z"/></svg>';
const beginners = [
{
@@ -16,7 +16,7 @@ const beginners = [
links: [
{
icon: {
- svg: githubSvg,
+ svg: codeSvg,
},
link: 'beginner/1_Basics' }
]
@@ -28,7 +28,7 @@ const beginners = [
links: [
{
icon: {
- svg: githubSvg,
+ svg: codeSvg,
},
link: 'beginner/2_PolynomialFitting' }
]
@@ -40,7 +40,7 @@ const beginners = [
links: [
{
icon: {
- svg: githubSvg,
+ svg: codeSvg,
},
link: 'beginner/3_SimpleRNN' }
]
@@ -52,7 +52,7 @@ const beginners = [
links: [
{
icon: {
- svg: githubSvg,
+ svg: codeSvg,
},
link: 'beginner/4_SimpleChains' }
]
@@ -67,7 +67,7 @@ const intermediate = [
links: [
{
icon: {
- svg: githubSvg,
+ svg: codeSvg,
},
link: 'intermediate/1_NeuralODE' }
]
@@ -79,7 +79,7 @@ const intermediate = [
links: [
{
icon: {
- svg: githubSvg,
+ svg: codeSvg,
},
link: 'intermediate/2_BayesianNN' }
]
@@ -92,7 +92,7 @@ const intermediate = [
links: [
{
icon: {
- svg: githubSvg,
+ svg: codeSvg,
},
link: 'intermediate/3_HyperNet' }
]
@@ -107,7 +107,7 @@ const advanced = [
links: [
{
icon: {
- svg: githubSvg,
+ svg: codeSvg,
},
link: 'advanced/1_GravitationalWaveForm' }
]

1 comment on commit abdbf4b

@github-actions
Contributor


Benchmark Results

| Benchmark suite | Current: abdbf4b | Previous: a22199e | Ratio |
|:---|:---|:---|:---|
| Dense(2 => 2)/cpu/reverse/ReverseDiff (compiled)/(2, 128) | 3256 ns | 3291.125 ns | 0.99 |
| Dense(2 => 2)/cpu/reverse/Zygote/(2, 128) | 7608.416666666667 ns | 7594.333333333333 ns | 1.00 |
| Dense(2 => 2)/cpu/reverse/Tracker/(2, 128) | 14687 ns | 14357 ns | 1.02 |
| Dense(2 => 2)/cpu/reverse/ReverseDiff/(2, 128) | 9830.4 ns | 9856.6 ns | 1.00 |
| Dense(2 => 2)/cpu/reverse/Flux/(2, 128) | 8746.333333333334 ns | 8706 ns | 1.00 |
| Dense(2 => 2)/cpu/reverse/SimpleChains/(2, 128) | 4170 ns | 4151.111111111111 ns | 1.00 |
| Dense(2 => 2)/cpu/forward/NamedTuple/(2, 128) | 2007.8 ns | 2038.8 ns | 0.98 |
| Dense(2 => 2)/cpu/forward/ComponentArray/(2, 128) | 1660.3197278911564 ns | 1652.5448275862068 ns | 1.00 |
| Dense(2 => 2)/cpu/forward/Flux/(2, 128) | 1839.8048780487804 ns | 1800.2586206896551 ns | 1.02 |
| Dense(2 => 2)/cpu/forward/SimpleChains/(2, 128) | 179.2422969187675 ns | 179.61335187760778 ns | 1.00 |
| Dense(20 => 20)/cpu/reverse/ReverseDiff (compiled)/(20, 128) | 17402 ns | 17353 ns | 1.00 |
| Dense(20 => 20)/cpu/reverse/Zygote/(20, 128) | 18545 ns | 18605 ns | 1.00 |
| Dense(20 => 20)/cpu/reverse/Tracker/(20, 128) | 35837 ns | 35346 ns | 1.01 |
| Dense(20 => 20)/cpu/reverse/ReverseDiff/(20, 128) | 28814 ns | 28643 ns | 1.01 |
| Dense(20 => 20)/cpu/reverse/Flux/(20, 128) | 19641.5 ns | 19647 ns | 1.00 |
| Dense(20 => 20)/cpu/reverse/SimpleChains/(20, 128) | 16220 ns | 16050 ns | 1.01 |
| Dense(20 => 20)/cpu/forward/NamedTuple/(20, 128) | 4826.285714285715 ns | 4761.857142857143 ns | 1.01 |
| Dense(20 => 20)/cpu/forward/ComponentArray/(20, 128) | 4874.857142857143 ns | 4787.571428571428 ns | 1.02 |
| Dense(20 => 20)/cpu/forward/Flux/(20, 128) | 4870.5 ns | 4819 ns | 1.01 |
| Dense(20 => 20)/cpu/forward/SimpleChains/(20, 128) | 1659.1 ns | 1663.1 ns | 1.00 |
| Conv((3, 3), 3 => 3)/cpu/reverse/ReverseDiff (compiled)/(64, 64, 3, 128) | 40919886 ns | 47826882 ns | 0.86 |
| Conv((3, 3), 3 => 3)/cpu/reverse/Zygote/(64, 64, 3, 128) | 105438282.5 ns | 107846874 ns | 0.98 |
| Conv((3, 3), 3 => 3)/cpu/reverse/Tracker/(64, 64, 3, 128) | 82547287 ns | 111447729 ns | 0.74 |
| Conv((3, 3), 3 => 3)/cpu/reverse/ReverseDiff/(64, 64, 3, 128) | 105389107 ns | 107598984 ns | 0.98 |
| Conv((3, 3), 3 => 3)/cpu/reverse/Flux/(64, 64, 3, 128) | 101432513 ns | 107543357 ns | 0.94 |
| Conv((3, 3), 3 => 3)/cpu/reverse/SimpleChains/(64, 64, 3, 128) | 12101555.5 ns | 12081138 ns | 1.00 |
| Conv((3, 3), 3 => 3)/cpu/forward/NamedTuple/(64, 64, 3, 128) | 12139914.5 ns | 18396029 ns | 0.66 |
| Conv((3, 3), 3 => 3)/cpu/forward/ComponentArray/(64, 64, 3, 128) | 18273296.5 ns | 18335367 ns | 1.00 |
| Conv((3, 3), 3 => 3)/cpu/forward/Flux/(64, 64, 3, 128) | 17955192 ns | 18282107.5 ns | 0.98 |
| Conv((3, 3), 3 => 3)/cpu/forward/SimpleChains/(64, 64, 3, 128) | 6406788 ns | 6393940.5 ns | 1.00 |
| vgg16/cpu/reverse/Zygote/(32, 32, 3, 1) | 103984169.5 ns | 104037238.5 ns | 1.00 |
| vgg16/cpu/reverse/Zygote/(32, 32, 3, 16) | 842558211 ns | 745853053 ns | 1.13 |
| vgg16/cpu/reverse/Zygote/(32, 32, 3, 64) | 3036962183 ns | 2862501835 ns | 1.06 |
| vgg16/cpu/reverse/Tracker/(32, 32, 3, 1) | 158105613 ns | 160006855 ns | 0.99 |
| vgg16/cpu/reverse/Tracker/(32, 32, 3, 16) | 1091539870.5 ns | 1149640042 ns | 0.95 |
| vgg16/cpu/reverse/Tracker/(32, 32, 3, 64) | 4156720070 ns | 4328457284 ns | 0.96 |
| vgg16/cpu/reverse/Flux/(32, 32, 3, 1) | 87153400 ns | 89766870 ns | 0.97 |
| vgg16/cpu/reverse/Flux/(32, 32, 3, 16) | 677891576.5 ns | 706555892 ns | 0.96 |
| vgg16/cpu/reverse/Flux/(32, 32, 3, 64) | 3087144996 ns | 3162713324 ns | 0.98 |
| vgg16/cpu/forward/NamedTuple/(32, 32, 3, 1) | 25057694 ns | 24988726 ns | 1.00 |
| vgg16/cpu/forward/NamedTuple/(32, 32, 3, 16) | 235060201.5 ns | 248321851.5 ns | 0.95 |
| vgg16/cpu/forward/NamedTuple/(32, 32, 3, 64) | 850027203 ns | 984732186 ns | 0.86 |
| vgg16/cpu/forward/ComponentArray/(32, 32, 3, 1) | 26539452 ns | 26635110 ns | 1.00 |
| vgg16/cpu/forward/ComponentArray/(32, 32, 3, 16) | 222485100 ns | 252240083 ns | 0.88 |
| vgg16/cpu/forward/ComponentArray/(32, 32, 3, 64) | 843941878 ns | 889103407 ns | 0.95 |
| vgg16/cpu/forward/Flux/(32, 32, 3, 1) | 23351068 ns | 30345424 ns | 0.77 |
| vgg16/cpu/forward/Flux/(32, 32, 3, 16) | 185662518 ns | 233111808 ns | 0.80 |
| vgg16/cpu/forward/Flux/(32, 32, 3, 64) | 816281223 ns | 889458306 ns | 0.92 |
| Conv((3, 3), 64 => 64)/cpu/reverse/ReverseDiff (compiled)/(64, 64, 64, 128) | 1134116904.5 ns | 1035434449.5 ns | 1.10 |
| Conv((3, 3), 64 => 64)/cpu/reverse/Zygote/(64, 64, 64, 128) | 1821706018 ns | 1862385198.5 ns | 0.98 |
| Conv((3, 3), 64 => 64)/cpu/reverse/Tracker/(64, 64, 64, 128) | 2165104540 ns | 2246770805 ns | 0.96 |
| Conv((3, 3), 64 => 64)/cpu/reverse/ReverseDiff/(64, 64, 64, 128) | 2350684531 ns | 2397483572.5 ns | 0.98 |
| Conv((3, 3), 64 => 64)/cpu/reverse/Flux/(64, 64, 64, 128) | 1833091167 ns | 1923770490.5 ns | 0.95 |
| Conv((3, 3), 64 => 64)/cpu/forward/NamedTuple/(64, 64, 64, 128) | 359054624 ns | 375137750.5 ns | 0.96 |
| Conv((3, 3), 64 => 64)/cpu/forward/ComponentArray/(64, 64, 64, 128) | 458776503 ns | 392101451.5 ns | 1.17 |
| Conv((3, 3), 64 => 64)/cpu/forward/Flux/(64, 64, 64, 128) | 353291618 ns | 379737329.5 ns | 0.93 |
| Conv((3, 3), 1 => 1)/cpu/reverse/ReverseDiff (compiled)/(64, 64, 1, 128) | 11907971 ns | 11869505 ns | 1.00 |
| Conv((3, 3), 1 => 1)/cpu/reverse/Zygote/(64, 64, 1, 128) | 18075976.5 ns | 17927171 ns | 1.01 |
| Conv((3, 3), 1 => 1)/cpu/reverse/Tracker/(64, 64, 1, 128) | 19244630 ns | 19152475.5 ns | 1.00 |
| Conv((3, 3), 1 => 1)/cpu/reverse/ReverseDiff/(64, 64, 1, 128) | 23929082 ns | 23886574 ns | 1.00 |
| Conv((3, 3), 1 => 1)/cpu/reverse/Flux/(64, 64, 1, 128) | 18071045 ns | 18004082 ns | 1.00 |
| Conv((3, 3), 1 => 1)/cpu/reverse/SimpleChains/(64, 64, 1, 128) | 1162609 ns | 1160001 ns | 1.00 |
| Conv((3, 3), 1 => 1)/cpu/forward/NamedTuple/(64, 64, 1, 128) | 2078115 ns | 2065263 ns | 1.01 |
| Conv((3, 3), 1 => 1)/cpu/forward/ComponentArray/(64, 64, 1, 128) | 2088017.5 ns | 2074119 ns | 1.01 |
| Conv((3, 3), 1 => 1)/cpu/forward/Flux/(64, 64, 1, 128) | 2077539 ns | 2062288 ns | 1.01 |
| Conv((3, 3), 1 => 1)/cpu/forward/SimpleChains/(64, 64, 1, 128) | 197480 ns | 204473 ns | 0.97 |
| Dense(200 => 200)/cpu/reverse/ReverseDiff (compiled)/(200, 128) | 299121 ns | 297618 ns | 1.01 |
| Dense(200 => 200)/cpu/reverse/Zygote/(200, 128) | 274264 ns | 274229.5 ns | 1.00 |
| Dense(200 => 200)/cpu/reverse/Tracker/(200, 128) | 366778 ns | 362660 ns | 1.01 |
| Dense(200 => 200)/cpu/reverse/ReverseDiff/(200, 128) | 413700.5 ns | 410510 ns | 1.01 |
| Dense(200 => 200)/cpu/reverse/Flux/(200, 128) | 275296 ns | 274355 ns | 1.00 |
| Dense(200 => 200)/cpu/reverse/SimpleChains/(200, 128) | 395864 ns | 396274 ns | 1.00 |
| Dense(200 => 200)/cpu/forward/NamedTuple/(200, 128) | 88767 ns | 88927 ns | 1.00 |
| Dense(200 => 200)/cpu/forward/ComponentArray/(200, 128) | 89553 ns | 89829 ns | 1.00 |
| Dense(200 => 200)/cpu/forward/Flux/(200, 128) | 87284 ns | 87194 ns | 1.00 |
| Dense(200 => 200)/cpu/forward/SimpleChains/(200, 128) | 104536 ns | 104937 ns | 1.00 |
| Conv((3, 3), 16 => 16)/cpu/reverse/ReverseDiff (compiled)/(64, 64, 16, 128) | 197678304 ns | 207730596 ns | 0.95 |
| Conv((3, 3), 16 => 16)/cpu/reverse/Zygote/(64, 64, 16, 128) | 349707282 ns | 417547431 ns | 0.84 |
| Conv((3, 3), 16 => 16)/cpu/reverse/Tracker/(64, 64, 16, 128) | 394767610 ns | 446864478.5 ns | 0.88 |
| Conv((3, 3), 16 => 16)/cpu/reverse/ReverseDiff/(64, 64, 16, 128) | 477350913 ns | 476380395 ns | 1.00 |
| Conv((3, 3), 16 => 16)/cpu/reverse/Flux/(64, 64, 16, 128) | 371954865 ns | 412031574 ns | 0.90 |
| Conv((3, 3), 16 => 16)/cpu/reverse/SimpleChains/(64, 64, 16, 128) | 335078971.5 ns | 348975577 ns | 0.96 |
| Conv((3, 3), 16 => 16)/cpu/forward/NamedTuple/(64, 64, 16, 128) | 53540565 ns | 65585813 ns | 0.82 |
| Conv((3, 3), 16 => 16)/cpu/forward/ComponentArray/(64, 64, 16, 128) | 49765921.5 ns | 71096377 ns | 0.70 |
| Conv((3, 3), 16 => 16)/cpu/forward/Flux/(64, 64, 16, 128) | 49896033.5 ns | 71135179 ns | 0.70 |
| Conv((3, 3), 16 => 16)/cpu/forward/SimpleChains/(64, 64, 16, 128) | 28103680 ns | 28355509 ns | 0.99 |
| Dense(2000 => 2000)/cpu/reverse/ReverseDiff (compiled)/(2000, 128) | 19642944.5 ns | 19456862 ns | 1.01 |
| Dense(2000 => 2000)/cpu/reverse/Zygote/(2000, 128) | 19748348.5 ns | 19787020 ns | 1.00 |
| Dense(2000 => 2000)/cpu/reverse/Tracker/(2000, 128) | 23593738 ns | 23607252 ns | 1.00 |
| Dense(2000 => 2000)/cpu/reverse/ReverseDiff/(2000, 128) | 24233000.5 ns | 24257374 ns | 1.00 |
| Dense(2000 => 2000)/cpu/reverse/Flux/(2000, 128) | 19740162 ns | 19779678.5 ns | 1.00 |
| Dense(2000 => 2000)/cpu/forward/NamedTuple/(2000, 128) | 6615632.5 ns | 6627207 ns | 1.00 |
| Dense(2000 => 2000)/cpu/forward/ComponentArray/(2000, 128) | 6593397.5 ns | 6596001 ns | 1.00 |
| Dense(2000 => 2000)/cpu/forward/Flux/(2000, 128) | 6506655 ns | 6574174.5 ns | 0.99 |

This comment was automatically generated by a workflow using github-action-benchmark.
