How are weight updates for Analog training simulated? #431
-
Hi @dingandrew, bit stream lengths can all be adjusted separately for the write on A as well as the write on C (look for …). For mixed precision, one can, for example, enable stochastic rounding, set the desired bit resolution of the gradient, or transfer only one row/column at a time. I am not quite sure what you mean by "STR gain factor", though.
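To make the stochastic rounding mentioned above concrete, here is a minimal pure-Python sketch of rounding a gradient value to a limited resolution. It is a generic illustration and not aihwkit's code: the function name `stochastic_round` and the parameters `n_bins` and `max_abs` are hypothetical and chosen only for this example.

```python
import math
import random


def stochastic_round(value, n_bins, max_abs=1.0, rng=random):
    """Quantize `value` onto a grid of `n_bins` steps per sign in [-max_abs, max_abs].

    The value is rounded up with probability equal to its fractional
    position between two grid points, so the result is unbiased on average.
    All names here are hypothetical, not aihwkit parameters.
    """
    step = max_abs / n_bins
    clipped = max(-max_abs, min(max_abs, value))  # saturate out-of-range values
    scaled = clipped / step
    lower = math.floor(scaled)
    frac = scaled - lower                         # distance to the lower grid point
    rounded = lower + (1 if rng.random() < frac else 0)
    return max(-max_abs, min(max_abs, rounded * step))


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [stochastic_round(0.33, n_bins=4, rng=rng) for _ in range(10000)]
    # Individual samples land on grid points; their average approaches 0.33.
    print(sorted(set(samples)), sum(samples) / len(samples))
```

With deterministic rounding, a gradient smaller than half a grid step would always be rounded to zero; with stochastic rounding it still contributes in expectation, which is why the option matters at low bit resolution.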
-
Thank you for your response! Although, frankly, I am not sure what this parameter does.
-
Oh, I see, the STR gain is essentially just the learning rate. Yes, the implementation is very close to that paper. We did implement the stochastic pulse trains so that pulses are actually drawn for a given BL. There are also improvements in the code that dynamically vary the BL as needed (see …).
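As an illustration of drawing pulses for a given BL, here is a minimal pure-Python sketch of a stochastic pulse-coincidence update for a single weight. It is a generic textbook-style scheme under stated assumptions, not aihwkit's implementation: the function `pulsed_update`, the sign convention (the weight moves by `-lr * x * d` on average), and the ideal constant step `dw_min` are all assumptions made for this example.

```python
import math
import random


def pulsed_update(w, x, d, lr, dw_min, bl, rng=random):
    """Update one weight by counting coincidences of two stochastic pulse trains.

    Assumptions (hypothetical, for illustration only):
      * every coincidence changes the weight by exactly `dw_min`;
      * the pulse probabilities are scaled so that the expected update
        equals -lr * x * d, i.e. `lr` plays the role of the gain factor.
    """
    scale = math.sqrt(lr / (bl * dw_min))
    p_x = min(1.0, scale * abs(x))  # pulse probability on the input line
    p_d = min(1.0, scale * abs(d))  # pulse probability on the error line
    coincidences = 0
    for _ in range(bl):             # one time slot per pulse in the train
        x_pulse = rng.random() < p_x
        d_pulse = rng.random() < p_d
        if x_pulse and d_pulse:
            coincidences += 1
    sign = math.copysign(1.0, x * d)
    return w - sign * coincidences * dw_min


if __name__ == "__main__":
    rng = random.Random(0)
    new_w = pulsed_update(0.0, x=0.5, d=-0.4, lr=0.1, dw_min=0.001, bl=31, rng=rng)
    print(new_w)  # a multiple of dw_min, close to -lr * x * d on average
```

In this sketch the expected number of coincidences is `bl * p_x * p_d = lr * |x| * |d| / dw_min`, so a longer pulse train does not change the mean update but reduces its relative noise, as long as the probabilities do not saturate at 1.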
-
Hi @maljoras,
My understanding is that aihwkit implements three optimizers for weight updates in analog training: plain SGD, mixed precision, and Tiki-Taka. The papers describing these techniques introduce additional hardware units, such as the stochastic translator and a high-precision digital unit. My question is: can we modify these analog weight update algorithms and their hardware, for instance by changing the stochastic bit stream length, delta W min, or the STR gain factor?
Best Regards