Inconsistent aihwkit model inference output when given the same input #504
Description

When given the same input, the aihwkit model produces inconsistent output. I was trying to port a vanilla Transformer from the torch framework to the aihwkit framework and was in the middle of porting the Transformer's encoder block. In some sense I succeeded: as long as no pretrained weights are loaded into the torch model, the relative L2 error between the torch model's output and the aihwkit model's output is less than 0.5% (shown in the attached screenshot). But when I load the torch model with its pretrained weights and then do exactly the same things as before (copy the torch model's weights into the corresponding aihwkit model and run inference on the same input), the output of the aihwkit model becomes totally erratic. The full code is too long to post, so I have selected the important parts below and highlighted the part that leads to the inconsistency.

# transformer.encoder written in torch
en = Encoder(vocab_size=vocab_size,
             positional_encoding=positional_encoding,
             d_model=512,
             n_heads=8,
             d_queries=64,
             d_values=64,
             d_inner=d_inner,
             n_layers=n_layers,
             dropout=dropout).to(device)

# transformer.encoder written in aihwkit
a_en = a_Encoder(vocab_size=vocab_size,
                 positional_encoding=positional_encoding,
                 d_model=512,
                 n_heads=8,
                 d_queries=64,
                 d_values=64,
                 d_inner=d_inner,
                 n_layers=n_layers,
                 dropout=dropout).to(device)
# load weights from checkpoint for torch's transformer.encoder
# !!!! This loading step is what introduces the inconsistency between the outputs of
# torch's transformer.encoder and aihwkit's transformer.encoder.
# If the pretrained weights are not loaded, the two outputs match.
#---------kkuhn-block------------------------------ # !!!!!!!!!
checkpoint = torch.load("averaged_transformer_checkpoint.pth.tar", map_location=device)
en.load_state_dict(checkpoint['model'].encoder.state_dict())
#---------kkuhn-block------------------------------
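# (Hypothetical diagnostic, not part of the original script.) Since loading the pretrained
# checkpoint is what triggers the mismatch, it may help to inspect the scale of the loaded
# weights, which later have to fit the simulated conductance range of the analog layers:
for name, param in en.state_dict().items():
    print(name, float(param.abs().max()))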
# transfer weights from torch's transformer.encoder to aihwkit's transformer.encoder
a_en.copy_weights(en)
# set torch's transformer.encoder and aihwkit's transformer.encoder to eval mode
en.eval()
a_en.eval()
# inference
input = (torch.tensor([[4265, 4065, 3786, 4643, 3811, 19516, 3942, 4065, 3786, 20521,
                        3811, 17399]], device='cuda:0'),
         torch.tensor([12], device='cuda:0'))
output = en(*input)
a_output = a_en(*input)
# compare the output of torch's transformer.encoder and aihwkit's transformer.encoder
output_norm = torch.norm(output, p='fro').numpy(force=True)
a_error = torch.norm(a_output - output, p='fro').numpy(force=True)
a_error_ave = a_error / output_norm
print("a_error_ave: ", a_error_ave)
For reference, this is the copy_weights method of the aihwkit encoder used above:

def copy_weights(self, encoder):
    encoder_state = encoder.state_dict()
    self.embedding = encoder.embedding
    self.positional_encoding = encoder.positional_encoding
    self.layer_norm.weight = nn.Parameter(encoder_state['layer_norm.weight'])
    self.layer_norm.bias = nn.Parameter(encoder_state['layer_norm.bias'])
    for layer_ind in range(self.n_layers):
        attn = 'encoder_layers.' + str(layer_ind) + '.0.'  # multi-head attention sublayer
        ffn = 'encoder_layers.' + str(layer_ind) + '.1.'   # position-wise feed-forward sublayer
        self.encoder_layers[layer_ind][0].set_weights(encoder_state[attn + 'cast_queries.weight'],
                                                      encoder_state[attn + 'cast_queries.bias'],
                                                      encoder_state[attn + 'cast_keys_values.weight'],
                                                      encoder_state[attn + 'cast_keys_values.bias'],
                                                      encoder_state[attn + 'cast_output.weight'],
                                                      encoder_state[attn + 'cast_output.bias'],
                                                      encoder_state[attn + 'layer_norm.weight'],
                                                      encoder_state[attn + 'layer_norm.bias'])
        self.encoder_layers[layer_ind][1].set_weights(encoder_state[ffn + 'fc1.weight'],
                                                      encoder_state[ffn + 'fc1.bias'],
                                                      encoder_state[ffn + 'fc2.weight'],
                                                      encoder_state[ffn + 'fc2.bias'],
                                                      encoder_state[ffn + 'layer_norm.weight'],
                                                      encoder_state[ffn + 'layer_norm.bias'])
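After copying, it can also be worth reading the analog weights back and comparing them with the checkpoint values, to see whether the deviation already exists right after programming (for example due to clipping into the conductance range) or only appears at inference time. A minimal sketch, assuming the analog sublayers expose their underlying aihwkit AnalogLinear modules (the attribute names below, such as cast_queries, are hypothetical and depend on how a_Encoder is written):

```python
from aihwkit.nn import AnalogLinear

def check_transfer(analog_layer: AnalogLinear, ref_weight, ref_bias=None):
    """Read the programmed analog weights back and report the deviation from the reference."""
    weight, bias = analog_layer.get_weights()
    print("max |dW|:", (weight.cpu() - ref_weight.cpu()).abs().max().item())
    if bias is not None and ref_bias is not None:
        print("max |db|:", (bias.cpu() - ref_bias.cpu()).abs().max().item())

# Usage sketch (hypothetical attribute name):
# check_transfer(a_en.encoder_layers[0][0].cast_queries,
#                en.state_dict()['encoder_layers.0.0.cast_queries.weight'],
#                en.state_dict()['encoder_layers.0.0.cast_queries.bias'])
```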
How to reproduce

If this is not enough information to see the real problem, please tell me and I can provide more.

Expected behavior

Consistent output for the same input.

Other information
Replies: 2 comments
Hi @dongwhfdyer, many thanks for raising an issue.

The AIHWKIT is a toolkit for simulating analog "noisy" in-memory computing hardware. The hardware to be simulated can be configured using the RPUConfig. By default, a repeated forward pass, that is, evaluation in "eval mode" on this analog hardware, will be noisy and non-ideal. Thus it is entirely expected that the output is different from run to run, even in "eval" mode. You can change the noisiness of the forward pass by adjusting the rpu_config.forward attributes, which are of class IOParameters (see here). In particular, if you set rpu_config.forward.is_perfect = True, you turn off all noise and non-idealities, so that you should see identical output.

It is also important how the transformer weights are mapped onto memristive elements, that is, how one sets the output scales to match the conductance range. These parameters are controlled by the …
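A minimal sketch of the configuration suggested above (InferenceRPUConfig and AnalogLinear are standard aihwkit building blocks; how the rpu_config is threaded through a_Encoder depends on your own module code and is an assumption here):

```python
from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import InferenceRPUConfig

# Configure a forward pass without any noise or non-idealities.
rpu_config = InferenceRPUConfig()
rpu_config.forward.is_perfect = True

# Example analog layer built with this configuration.
layer = AnalogLinear(512, 512, bias=True, rpu_config=rpu_config)
```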
Also note that there is a …