Questions about the acceptable sequence length of borzoi #72

Open
HelloWorldLTY opened this issue Oct 21, 2024 · 3 comments

Comments

@HelloWorldLTY

Hi, I noticed that the Borzoi model in gReLU appears to use an input sequence length of 512, which differs from the default setting of Borzoi (https://www.biorxiv.org/content/10.1101/2023.08.30.555582v1.full.pdf), where the input length should be 524 kb.

Thanks a lot.

@avantikalal
Collaborator

Hi @HelloWorldLTY, that doesn't sound right; our Borzoi model should also take a 524 kb input. Please see tutorial 1, where we use the model to make predictions on a 524 kb long sequence. Could you clarify where you found the number 512?

@HelloWorldLTY
Author

Hi, thanks for your quick reply.

  1. I can run the Borzoi model with a 512 bp input, which seems strange to me. This is how I set up the model:

model_params = {
    'model_type':'BorzoiPretrainedModel', # Type of model
    'n_tasks': 1, # Number of cell types to predict
    'crop_len':0, # No cropping of the model output
#     'n_transformers': 11, # Number of transformer layers; the published Enformer model has 11
}

train_params = {
    'task':'regression', # Regression task
    'lr':1e-4, # learning rate
    'logger': 'csv', # Logs will be written to a CSV file
    'batch_size': 4,
    'num_workers': 8,
    'devices': 0, # GPU index
    'save_dir': experiment, # Output directory; `experiment` is defined elsewhere
    'optimizer': 'adam',
    'max_epochs': 50,
    'checkpoint': True, # Save checkpoints
    'loss': 'MSE'
}

import grelu.lightning
model = grelu.lightning.LightningModel(model_params=model_params, train_params=train_params)
  2. If I set the input sequence length to 524288, I get an error (which I also raised a couple of weeks ago in the Genentech internal Slack):
File ~/.conda/envs/evo/lib/python3.11/site-packages/grelu/model/blocks.py:725, in UnetBlock.forward(self, x, y)
    723 x = self.conv(x)
    724 x = self.upsample(x)
--> 725 x = torch.add(x, self.channel_transform(y))
    726 x = self.sconv(x)
    727 return x

RuntimeError: The size of tensor a (5018) must match the size of tensor b (5019) at non-singleton dimension 2

@avantikalal
Copy link
Collaborator

This is because 'BorzoiPretrainedModel' is not the same thing as the actual Borzoi model. To load the actual Borzoi model with the architecture and weights trained by Linder et al., please follow the instructions in tutorial 1.

What you are doing here with BorzoiPretrainedModel is creating a new model that has the same convolutional and transformer layers as Borzoi, but with a final (head) layer that you define yourself. As such, you can set whatever parameters you want, and because you have set crop_len=0 (no cropping of the model output), it will work with shorter inputs.
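
For readers hitting the same RuntimeError, here is a minimal sketch of how a 5018 vs 5019 mismatch can arise in a U-net style block. It assumes, purely for illustration, that the downsampling path pools by a factor of 2 and drops any remainder, and that the upsampling path doubles the length. The helper names are hypothetical and this is not gReLU's actual code; it only shows that the numbers in the traceback are consistent with a skip connection of odd length.

```python
def pooled_length(n: int, factor: int = 2) -> int:
    """Length after non-overlapping pooling that drops any remainder (assumed behaviour)."""
    return n // factor


def upsampled_length(n: int, factor: int = 2) -> int:
    """Length after upsampling by an integer factor (assumed behaviour)."""
    return n * factor


def unet_lengths(skip_len: int) -> tuple[int, int]:
    """Return (upsampled length, skip-connection length) for one down/up step.

    The two values must be equal for an element-wise add such as
    torch.add(x, y) to succeed along the length dimension.
    """
    down = pooled_length(skip_len)
    up = upsampled_length(down)
    return up, skip_len


# An odd-length skip connection loses one position on the way down and up.
print(unet_lengths(5019))  # (5018, 5019)

# An even-length skip connection round-trips cleanly.
print(unet_lengths(5020))  # (5020, 5020)
```

Under these assumptions, the add only succeeds when every intermediate length on the downsampling path is even, so it may be worth checking the actual shape of the tensor passed to the model.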
