added disclaimer to architecture cascading pattern
PicoCreator authored Sep 19, 2023
1 parent 44e5af1 commit 3ba94cf
Showing 1 changed file with 3 additions and 1 deletion.
docs/advance/architecture.md
@@ -34,7 +34,9 @@ The follow gif, illustrates the parallel cascading nature over approximately 3 l

  Effectively, allowing the RNN network to run like a transformer network, when rolled out side by side. Where it can be trained "like a transformer" and "executed like an RNN" (the best of both worlds)

- All of this is achieved by using a combination of token shifting, channel and time mixing, to comptue the next layer / hidden state.
+ All of this is achieved by using a combination of token shifting, channel and time mixing, to replace LSTM and compute the next layer / hidden state.
+
+ > Note: the cascading diagram is the theoretical optimum; in practice, some trainers and/or inference implementations may batch the cascade into chunks of tokens (32/64/128/256/512) to reduce VRAM lookups and usage, and improve overall performance.

  ## What is channel / time mixing, explain it in simple terms?

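To make the batching note in the diff above concrete, here is a minimal Python sketch of a chunked cascade. It is an illustrative assumption, not the actual RWKV implementation: every name in it (`process_chunk`, `layer_states`, `CHUNK`) is hypothetical, and the "mixing" is a placeholder for the real token-shift / time-mix / channel-mix math.

```python
import numpy as np

CHUNK = 128     # tokens processed per cascade step (32-512 per the note)
N_LAYERS = 4    # number of stacked mixing blocks
D_MODEL = 8     # tiny embedding width, for illustration only

def process_chunk(chunk, state):
    """Toy stand-in for one block's token-shift + mixing step.

    `chunk` is (chunk_len, D_MODEL); `state` holds the last token of the
    previous chunk, so the token shift can look one step back across the
    chunk boundary. Returns (output_chunk, new_state).
    """
    shifted = np.vstack([state[None, :], chunk[:-1]])  # token shift by one
    out = 0.5 * chunk + 0.5 * shifted                  # placeholder "mixing"
    return out, chunk[-1]

tokens = np.random.randn(1024, D_MODEL)                # fake embedded sequence
layer_states = [np.zeros(D_MODEL) for _ in range(N_LAYERS)]

outputs = []
for start in range(0, len(tokens), CHUNK):
    x = tokens[start:start + CHUNK]
    # Cascade the whole chunk through every layer before moving on, so each
    # layer's state is read and written once per chunk rather than once per
    # token (the VRAM-traffic saving the note refers to).
    for i in range(N_LAYERS):
        x, layer_states[i] = process_chunk(x, layer_states[i])
    outputs.append(x)

result = np.vstack(outputs)  # (1024, D_MODEL), same as token-by-token rollout
```

Because the token shift only ever looks one step back, carrying a single last-token state per layer is enough to make the chunked rollout match a strict token-by-token rollout, while touching each layer's state once per chunk instead of once per token.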
