Inconsistent aihwkit model inference output when given the same input #504
Description

When given the same input, the aihwkit model produces inconsistent output. I was trying to port a vanilla Transformer from the torch framework to the aihwkit framework and was in the middle of porting the Transformer's encoder block. In some sense I succeeded: as long as no pretrained weights are loaded into the torch model, the relative L2 error between the torch model's output and the aihwkit model's output is less than 0.5% (shown in the attached screenshot). But when I load the torch model with its pretrained weights and then do exactly the same things as before (copy the torch model's weights into the corresponding aihwkit model and run inference on the same input), the output of the aihwkit model becomes totally erratic. The full code is too long to post, so I have selected the important parts below and highlighted the part that leads to the inconsistency.

# transformer.encoder written in torch
en = Encoder(vocab_size=vocab_size,
             positional_encoding=positional_encoding,
             d_model=512,
             n_heads=8,
             d_queries=64,
             d_values=64,
             d_inner=d_inner,
             n_layers=n_layers,
             dropout=dropout).to(device)

# transformer.encoder written in aihwkit
a_en = a_Encoder(vocab_size=vocab_size,
                 positional_encoding=positional_encoding,
                 d_model=512,
                 n_heads=8,
                 d_queries=64,
                 d_values=64,
                 d_inner=d_inner,
                 n_layers=n_layers,
                 dropout=dropout).to(device)
# load weights from checkpoint for torch's transformer.encoder
# !!!! This loading step is what introduces the inconsistency between the outputs of
# torch's transformer.encoder and aihwkit's transformer.encoder.
# If the pretrained weights are not loaded, the two outputs match.
#---------kkuhn-block------------------------------ # !!!!!!!!!
checkpoint = torch.load("averaged_transformer_checkpoint.pth.tar", map_location=device)
en.load_state_dict(checkpoint['model'].encoder.state_dict())
#---------kkuhn-block------------------------------
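# (Hypothetical diagnostic, not part of the original script.) Since loading the pretrained
# checkpoint is what triggers the mismatch, it may help to inspect the scale of the loaded
# weights, which later have to fit the simulated conductance range of the analog layers:
for name, param in en.state_dict().items():
    print(name, float(param.abs().max()))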
# transfer weights from torch's transformer.encoder to aihwkit's transformer.encoder
a_en.copy_weights(en)
# set torch's transformer.encoder and aihwkit's transformer.encoder to eval mode
en.eval()
a_en.eval()
# inference
input = (torch.tensor([[4265, 4065, 3786, 4643, 3811, 19516, 3942, 4065, 3786, 20521,
                        3811, 17399]], device='cuda:0'),
         torch.tensor([12], device='cuda:0'))
output = en(*input)
a_output = a_en(*input)
# compare the output of torch's transformer.encoder and aihwkit's transformer.encoder
output_norm = torch.norm(output, p='fro').numpy(force=True)
a_error = torch.norm(a_output - output, p='fro').numpy(force=True)
a_error_ave = a_error / output_norm
print("a_error_ave: ", a_error_ave)
For reference, this is the copy_weights method of the aihwkit encoder used above:

def copy_weights(self, encoder):
    encoder_state = encoder.state_dict()
    self.embedding = encoder.embedding
    self.positional_encoding = encoder.positional_encoding
    self.layer_norm.weight = nn.Parameter(encoder_state['layer_norm.weight'])
    self.layer_norm.bias = nn.Parameter(encoder_state['layer_norm.bias'])
    for layer_ind in range(self.n_layers):
        attn = 'encoder_layers.' + str(layer_ind) + '.0.'  # multi-head attention sublayer
        ffn = 'encoder_layers.' + str(layer_ind) + '.1.'   # position-wise feed-forward sublayer
        self.encoder_layers[layer_ind][0].set_weights(encoder_state[attn + 'cast_queries.weight'],
                                                      encoder_state[attn + 'cast_queries.bias'],
                                                      encoder_state[attn + 'cast_keys_values.weight'],
                                                      encoder_state[attn + 'cast_keys_values.bias'],
                                                      encoder_state[attn + 'cast_output.weight'],
                                                      encoder_state[attn + 'cast_output.bias'],
                                                      encoder_state[attn + 'layer_norm.weight'],
                                                      encoder_state[attn + 'layer_norm.bias'])
        self.encoder_layers[layer_ind][1].set_weights(encoder_state[ffn + 'fc1.weight'],
                                                      encoder_state[ffn + 'fc1.bias'],
                                                      encoder_state[ffn + 'fc2.weight'],
                                                      encoder_state[ffn + 'fc2.bias'],
                                                      encoder_state[ffn + 'layer_norm.weight'],
                                                      encoder_state[ffn + 'layer_norm.bias'])
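After copying, it can also be worth reading the analog weights back and comparing them with the checkpoint values, to see whether the deviation already exists right after programming (for example due to clipping into the conductance range) or only appears at inference time. A minimal sketch, assuming the analog sublayers expose their underlying aihwkit AnalogLinear modules (the attribute names below, such as cast_queries, are hypothetical and depend on how a_Encoder is written):

```python
from aihwkit.nn import AnalogLinear

def check_transfer(analog_layer: AnalogLinear, ref_weight, ref_bias=None):
    """Read the programmed analog weights back and report the deviation from the reference."""
    weight, bias = analog_layer.get_weights()
    print("max |dW|:", (weight.cpu() - ref_weight.cpu()).abs().max().item())
    if bias is not None and ref_bias is not None:
        print("max |db|:", (bias.cpu() - ref_bias.cpu()).abs().max().item())

# Usage sketch (hypothetical attribute name):
# check_transfer(a_en.encoder_layers[0][0].cast_queries,
#                en.state_dict()['encoder_layers.0.0.cast_queries.weight'],
#                en.state_dict()['encoder_layers.0.0.cast_queries.bias'])
```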
How to reproduce

If this is not enough information to see the real problem, please tell me and I can provide more.

Expected behavior

Consistent output for the same input.

Other information
Replies: 2 comments
Hi @dongwhfdyer, many thanks for raising an issue.

The AIHWKIT is a toolkit for simulating analog "noisy" in-memory computing hardware. The hardware to be simulated can be configured using the RPUConfig. By default, a repeated forward pass, that is, evaluation in "eval mode" on this analog hardware, will be noisy and non-ideal. Thus it is entirely expected that the output is different from run to run, even in "eval" mode. You can change the noisiness of the forward pass by adjusting the rpu_config.forward attributes, which are of class IOParameters (see here). In particular, if you set rpu_config.forward.is_perfect = True, you turn off all noise and non-idealities, so that you should see identical output.

It is also important how the transformer weights are mapped onto memristive elements, that is, how one sets the output scales to match the conductance range. These parameters are controlled by the …
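A minimal sketch of the configuration suggested above (InferenceRPUConfig and AnalogLinear are standard aihwkit building blocks; how the rpu_config is threaded through a_Encoder depends on your own module code and is an assumption here):

```python
from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import InferenceRPUConfig

# Configure a forward pass without any noise or non-idealities.
rpu_config = InferenceRPUConfig()
rpu_config.forward.is_perfect = True

# Example analog layer built with this configuration.
layer = AnalogLinear(512, 512, bias=True, rpu_config=rpu_config)
```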
Also note that there is a …