How are weight updates for Analog training simulated? #431

dingandrew · 2022-04-11T21:51:45Z

dingandrew
Apr 11, 2022

My understanding is that aihwkit implements three optimizers for weight updates in analog training: plain SGD, mixed precision, and Tiki-Taka. The papers describing these techniques introduce additional hardware units, such as the stochastic translator and a high-precision digital unit. My question is: can we modify these analog weight update algorithms and their hardware, for instance by changing the stochastic bit stream length, delta W min, or the STR gain factor?

Best Regards

maljoras · 2022-04-12T12:56:01Z

maljoras
Apr 12, 2022
Maintainer

Hi @dingandrew ,
Indeed, we have these three general types, but also a plethora of variations on them, because the parameter settings are very flexible. In fact, we offer many more settings and adjustments than are described in the respective papers. For instance, for Tiki-Taka (where we have implemented versions 1 and 2), the transfer from matrix A to C is fully flexible, e.g. you can define the properties of the read and the write, or even use more than two matrices (like A, C, and D).

Bit stream lengths can all be adjusted separately for the write on A as well as the write on C (look for the rpu_config.device.transfer_update.desired_BL parameter in the case of the TikiTaka presets). You can also set, e.g., the transfer to perfect FP (rpu_config.device.transfer_update.is_perfect = True). Similarly, each crossbar array (A and C) is fully flexible. You can change their individual device properties, e.g. dw_min (rpu_config.device.unit_cell_devices[0].dw_min = 0.1 for changing the gradient write on A).
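As a rough illustration of the settings above, a configuration fragment could look like the following. It is only a sketch under some assumptions: aihwkit is installed, `TikiTakaReRamSBPreset` is used as an example preset, and the bit-length field is spelled `desired_bl` in the Python configuration classes (the thread writes `desired_BL`). The assumption that `rpu_config.update` governs the gradient write on A is mine as well, so please check the attribute names against your installed version.

```python
# Config fragment, not a full training script. Requires aihwkit.
from aihwkit.simulator.presets import TikiTakaReRamSBPreset

rpu_config = TikiTakaReRamSBPreset()

# Pulse-train length for the transfer write onto C.
# Assumption: the Python field is spelled `desired_bl`.
rpu_config.device.transfer_update.desired_bl = 1

# Pulse-train length for the gradient write onto A.
# Assumption: this write is governed by `rpu_config.update`.
rpu_config.update.desired_bl = 10

# Minimal update step of the first device (matrix A), as in the thread.
rpu_config.device.unit_cell_devices[0].dw_min = 0.1

# Printing the config is a quick way to confirm the values took effect.
print(rpu_config)
```

Because these configuration objects are plain Python objects, a misspelled attribute name may be accepted silently without changing the simulation, which is why printing the final configuration is worthwhile.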

For mixed precision, one can, for example, enable stochastic rounding, set the desired bit resolution of the gradient, or transfer only one row or column at a time.
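To make the mixed-precision options more concrete, here is a small pure-Python sketch of two of the ideas involved: unbiased stochastic rounding to a fixed resolution, and a digital accumulator that only writes whole steps to the analog array. This is not aihwkit's implementation; the function names, `step`, and `granularity` are hypothetical and chosen only for illustration.

```python
import math
import random


def stochastic_round(value, step, rng):
    """Round value to a multiple of step. Rounds up with probability equal
    to the fractional remainder, so the result is unbiased on average."""
    scaled = value / step
    lower = math.floor(scaled)
    if rng.random() < scaled - lower:
        lower += 1
    return lower * step


def accumulate_and_transfer(chi, grad, granularity):
    """Add grad to the digital accumulator chi. Return (new_chi, n_pulses),
    where n_pulses whole steps would be written to the analog array."""
    chi += grad
    n_pulses = math.trunc(chi / granularity)
    chi -= n_pulses * granularity
    return chi, n_pulses


rng = random.Random(0)

# Stochastic rounding of 0.3 to a grid of 0.25 yields 0.25 or 0.5.
rounded = [stochastic_round(0.3, 0.25, rng) for _ in range(20000)]
mean_rounded = sum(rounded) / len(rounded)

# Small gradients accumulate digitally until they exceed one step.
chi, total_pulses = 0.0, 0
for _ in range(5):
    chi, n = accumulate_and_transfer(chi, 0.07, 0.1)
    total_pulses += n

print(mean_rounded, total_pulses, chi)
```

The average of the rounded values stays close to the original value, which is the point of stochastic rounding: low resolution adds noise but no systematic bias.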

I am not quite sure what you mean by "STR gain factor" though.

1 reply

nano-matte Jan 18, 2023

Related: the default weight programming (in all three variants, as far as I can tell) uses pulse trains of constant amplitude to program the RPUs, which of course has the benefit of running in constant time. However, in the literature many memristors are potentiated or depressed with pulse trains of varying amplitude or duration to obtain more linear behavior. Is there already a way to emulate this kind of weight programming, or do you consider these methods too far from being viable approaches?

dingandrew · 2022-04-13T01:32:50Z

dingandrew
Apr 13, 2022
Author

Thank you for your response!

From the Gokmen et al. paper:

Although, frankly, I am not sure what this parameter does.

0 replies

maljoras · 2022-04-13T13:47:42Z

maljoras
Apr 13, 2022
Maintainer

Oh, I see, the STR gain is essentially just the learning rate. Yes, the implementation is very close to that paper. We did indeed implement the stochastic pulse trains, so that pulses are actually drawn for a given BL. There are also improvements in the code that dynamically vary the BL as needed (see update_bl_management).
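The link between the gain and the learning rate can be illustrated with a small pure-Python sketch of a coincidence-based stochastic update for a single weight. It is a simplified model under stated assumptions, not the aihwkit code: the gain is chosen as sqrt(lr / (BL * dw_min)), both lines share the same gain, the device step is an ideal constant dw_min, and the descent sign convention is left out. The function name is hypothetical.

```python
import math
import random


def stochastic_pulse_update(x, d, lr, dw_min, bl, rng):
    """Return the weight change from one stochastic pulsed update."""
    # Gain chosen so that the expected change equals lr * x * d.
    gain = math.sqrt(lr / (bl * dw_min))
    p_x = min(abs(x) * gain, 1.0)  # pulse probability on the row line
    p_d = min(abs(d) * gain, 1.0)  # pulse probability on the column line
    coincidences = 0
    for _ in range(bl):
        # The device changes only when both lines pulse in the same slot.
        if rng.random() < p_x and rng.random() < p_d:
            coincidences += 1
    sign = 1.0 if x * d >= 0 else -1.0
    return sign * coincidences * dw_min


rng = random.Random(0)
lr, dw_min, bl = 0.1, 0.001, 31
x, d = 0.5, 0.4

samples = [stochastic_pulse_update(x, d, lr, dw_min, bl, rng)
           for _ in range(20000)]
mean_dw = sum(samples) / len(samples)

print(mean_dw, lr * x * d)
```

Each individual update is a whole number of dw_min steps, but the average over many updates approaches lr * x * d as long as the pulse probabilities are not clipped at 1. In this model, changing the gain is equivalent to changing the learning rate.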

0 replies
