Questions about the acceptable sequence length of borzoi #72

HelloWorldLTY · 2024-10-21T01:04:35Z

Hi, I noticed that the Borzoi model in gReLU uses 512 as its input sequence length. This differs from the default Borzoi setting (https://www.biorxiv.org/content/10.1101/2023.08.30.555582v1.full.pdf), which should be 524 kb.

Thanks a lot.

avantikalal · 2024-10-21T04:20:22Z

Hi @HelloWorldLTY, that doesn't sound right; our Borzoi model should also take 524 kb of input. Please see tutorial 1, where we use the model to make predictions on a 524 kb long sequence. Could you clarify where you found the number 512?

HelloWorldLTY · 2024-10-21T18:07:49Z

Hi, thanks for your quick reply.

I can run the Borzoi model with a 512 bp input:

which seems strange to me. But I use this approach:

model_params = {
    'model_type':'BorzoiPretrainedModel', # Type of model
    'n_tasks': 1, # Number of cell types to predict
    'crop_len':0, # No cropping of the model output
#     'n_transformers': 11, # Number of transformer layers; the published Enformer model has 11
}

train_params = {
    'task':'regression', # Regression task
    'lr':1e-4, # learning rate
    'logger': 'csv', # Logs will be written to a CSV file
    'batch_size': 4,
    'num_workers': 8,
    'devices': 0, # GPU index
    'save_dir': experiment,
    'optimizer': 'adam',
    'max_epochs': 50,
    'checkpoint': True, # Save checkpoints
    'loss': 'MSE'
}

import grelu.lightning
model = grelu.lightning.LightningModel(model_params=model_params, train_params=train_params)

If I set the input sequence length to 524288, I get an error (which I also raised a couple of weeks ago in the Genentech internal Slack):

File ~/.conda/envs/evo/lib/python3.11/site-packages/grelu/model/blocks.py:725, in UnetBlock.forward(self, x, y)
    723 x = self.conv(x)
    724 x = self.upsample(x)
--> 725 x = torch.add(x, self.channel_transform(y))
    726 x = self.sconv(x)
    727 return x

RuntimeError: The size of tensor a (5018) must match the size of tensor b (5019) at non-singleton dimension 2

avantikalal · 2024-10-26T00:24:49Z

This is because 'BorzoiPretrainedModel' is not the same thing as the actual Borzoi model. To load the actual Borzoi model with the architecture and weights trained by Linder et al., please follow the instructions in tutorial 1.

What you are doing here with BorzoiPretrainedModel is creating a new model that has the same convolutional and transformer layers as Borzoi, but whose final (head) layer you define yourself. As such, you are free to set whatever parameters you want, and because you have set crop_len=0 (no cropping of the model output), it will work with shorter inputs.
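
For readers who hit the same RuntimeError shown in the traceback above, here is a minimal sketch of one way a 5018 vs 5019 mismatch can arise in a U-Net style skip connection. It assumes stride-2 pooling that rounds down on the way in and 2x upsampling on the way out; that is an assumption about the architecture, not something confirmed in this thread. The helper names are hypothetical and are not part of gReLU.

```python
# Hypothetical helpers for illustration only; these are NOT gReLU functions.
# Assumption: the encoder pools with stride 2 and no padding (length rounds down),
# and the decoder upsamples by a factor of 2 before adding the skip tensor.

def pooled_len(length: int, stride: int = 2) -> int:
    """Sequence length after one pooling step that rounds down."""
    return length // stride


def upsampled_len(length: int, scale: int = 2) -> int:
    """Sequence length after one upsampling step."""
    return length * scale


def round_trip_len(length: int) -> int:
    """Length seen by the decoder after pooling once and upsampling once."""
    return upsampled_len(pooled_len(length))


def skip_matches(length: int) -> bool:
    """True if the upsampled tensor and the skip tensor would have equal length."""
    return round_trip_len(length) == length


skip_len = 5019                      # length of the skip tensor (tensor b in the error)
decoder_len = round_trip_len(skip_len)  # length of the upsampled tensor (tensor a)
print(skip_len, decoder_len, skip_matches(skip_len))
```

Under these assumptions, an odd length at the skip level loses one position in pooling, so the upsampled tensor ends up one element shorter than the skip tensor and `torch.add` fails. An even length at that level survives the round trip unchanged.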
