Model inference speed is slower than training speed #3786
-
Hi, I want to create an EncoderDecoder model and use the encoder part later. The code design looks like this:

```python
import flax.linen as nn

class Encoder(nn.Module):
    ...

class Decoder(nn.Module):
    ...

class EncoderDecoder(nn.Module):
    @nn.compact
    def __call__(self, x):
        y = Encoder()(x)
        y = Decoder()(y)
        ...
        return y
```

The model state is saved using the checkpoint method described here. While training, the model runs much faster. But when I loaded the module, extracted the encoder part, and put it in another loop, I could only get about 6 it/s. This is how I used the encoder:

```python
for step, batch in tqdm(...):
    y = encoder.apply(encode_state["params"], batch, ...)
```

I checked the GPU monitor, and my GPU is busy, which suggests the model is running on the GPU.
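For reference, loading the checkpoint and pulling out the encoder sub-tree might look roughly like this (a sketch, assuming the `flax.training.checkpoints` API and Flax's default submodule naming, where the first unnamed `Encoder` instance is stored under `"Encoder_0"`; `ckpt_dir` and `encoder_variables` are illustrative names, with `encoder_variables` playing the role of `encode_state["params"]` in the loop above):

```python
from flax.training import checkpoints

# Restore the raw checkpoint as a nested dict
# (target=None returns the stored pytree as plain dicts).
restored = checkpoints.restore_checkpoint(ckpt_dir="./ckpts", target=None)

# Flax names unnamed submodules ClassName_<index>, so the encoder's
# parameters live under "Encoder_0" in the EncoderDecoder param tree.
encoder = Encoder()
encoder_variables = {"params": restored["params"]["Encoder_0"]}

# Later: y = encoder.apply(encoder_variables, batch)
```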
-
nvm... This post helped: huggingface/transformers#15581

So, I should `jit` the `encoder.apply` function and use the jitted version in the loop.
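Concretely, the fix looks something like this (an untested sketch; `batches` stands in for the data iterator, and `encoder`/`encode_state` are the objects from the question). Without `jax.jit`, each `encoder.apply` call dispatches its ops one by one instead of running a single compiled program, while the training step was presumably already jitted, which explains the speed gap:

```python
import jax

# Compile encoder.apply once, outside the loop, and reuse it.
encode = jax.jit(encoder.apply)

for step, batch in enumerate(batches):
    y = encode(encode_state["params"], batch)
```

Note that `jax.jit` recompiles whenever the input shapes change, so the speedup only holds if the batch shapes stay fixed across iterations.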