Multi-GPU "[LightningModule] object has no attribute [attribute]" #3100

andrewjong · 2020-08-22T02:51:54Z

What is your question?

When I try to use multiple GPUs with either "DP" or "DDP", I get errors of the form "[Module] object has no attribute [the attribute]". I store data on self to share it between methods. For example, because I want to visualize validation images, I store self.batch = batch in validation_step() so I can visualize it in validation_epoch_end(). But I get the error "[Module] object has no attribute 'batch'". The same happens for other attributes on self, e.g. self.val_dataset, which I create in prepare_data().

The error only occurs when using multiple GPUs. I suspect it could be a thread synchronization issue, but I don't know how to resolve it. Does anyone have an idea?

Code

Snipped Code Example

import pytorch_lightning as pl
import torch
from pytorch_lightning import Trainer


class BaseModel(pl.LightningModule):
    
    ...

    def prepare_data(self):
        ...
        self.val_dataset = make_validation_dataset(self.hparams)

    def val_dataloader(self):
        val_loader = torch.utils.data.DataLoader(self.val_dataset)
        return val_loader

    def validation_step(self, batch, idx):
        self.batch = batch  # save a single batch to visualize later
        result = self.training_step(batch, idx)
        return {"val_loss": result["loss"]}

    def validation_epoch_end(self, outputs):
        # after validation, visualize the last validation sample
        self.visualize(self.batch, "validation")

    def visualize(self, one_batch, train_or_val):
        ... # add stuff to TensorBoard


model = BaseModel()
trainer = Trainer(
    gpus="1,2,3,4",  # GPU indices, as a comma-separated string
    distributed_backend="dp" # same thing happens with "ddp"
)
trainer.fit(model)

The error is below; only the text in red is relevant:

What have you tried?

I tried both DP and DDP, and searched the Lightning docs and Google, but couldn't find anything specific to Lightning. I'm not sure whether it's related to needing ".module" on DataParallel-wrapped modules, but it would seem strange for Lightning to require that.

What's your environment?

OS: Linux
Packaging: conda
Lightning version: 0.8.5
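Here is a minimal, GPU-free sketch of the val_dataset failure in the snippet above. It assumes DDP-style execution, where each GPU runs its own process with its own copy of the module. It also assumes, as this thread later concludes, that prepare_data() runs only on rank 0. This is a plain-Python simulation, not Lightning code: each rank is modeled as a separate object, and SketchModule and run_rank are hypothetical names.

```python
class SketchModule:
    """Hypothetical stand-in for BaseModel; not a Lightning class."""

    def prepare_data(self):
        # State assigned here exists only in the object (process) that ran it.
        self.val_dataset = list(range(8))

    def val_dataloader(self):
        return self.val_dataset


def run_rank(rank):
    """Simulate one rank: every process builds its own module copy,
    but only rank 0 runs prepare_data()."""
    module = SketchModule()
    if rank == 0:
        module.prepare_data()
    try:
        return len(module.val_dataloader())
    except AttributeError as err:
        # Ranks that skipped prepare_data() never got the attribute.
        return str(err)


results = [run_rank(rank) for rank in range(2)]
print(results)
```

In this simulation rank 0 sees 8 items, while rank 1 gets an "object has no attribute 'val_dataset'" error. The symptom comes from state that only one process assigned, not from a race between threads.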

rohitgr7 · 2020-08-22T12:33:10Z

https://pytorch-lightning.readthedocs.io/en/latest/lightning-module.html#prepare-data
https://pytorch-lightning.readthedocs.io/en/latest/lightning-module.html#setup
Use setup() to assign attributes to self. It is called on every device.
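The sketch below applies that advice. Like the previous one, it is a plain-Python simulation rather than Lightning itself: each rank is a separate object, and SketchModule and run_rank are hypothetical.

- prepare_data() keeps to one-time work and assigns nothing to self.
- setup() assigns the attribute and is called on every rank.

The stage parameter mirrors the argument Lightning passes to setup(); the default of None is only for convenience here.

```python
class SketchModule:
    """Hypothetical module following the prepare_data/setup split."""

    def prepare_data(self):
        # Rank 0 only: one-time work such as downloading; no assignments to self.
        pass

    def setup(self, stage=None):
        # Every rank: assign the state that each process needs.
        self.val_dataset = list(range(8))

    def val_dataloader(self):
        return self.val_dataset


def run_rank(rank):
    """Simulate one rank: prepare_data() on rank 0, setup() everywhere."""
    module = SketchModule()
    if rank == 0:
        module.prepare_data()
    module.setup("fit")
    return len(module.val_dataloader())


print([run_rank(rank) for rank in range(4)])  # → [8, 8, 8, 8]
```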

andrewjong · 2020-08-22T16:40:06Z

Thank you @rohitgr7! What if I want to keep an attribute between method calls? My training_step() calls several other methods, and those methods store results on self for convenience. Should I avoid that for multi-GPU / multiprocessing?

rohitgr7 · 2020-08-22T16:57:50Z

I believe it should work, since all methods except prepare_data() are called in each separate subprocess during multi-GPU (multiprocessing) training.
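Under DDP, each process keeps its own module for the whole run. An attribute set in one hook should therefore still be there in later hooks of that process.

DP is different. The PyTorch nn.DataParallel documentation warns that the module is replicated on every forward pass and that updates made to the replicas during forward are lost. Assuming Lightning's DP mode runs validation_step on those replicas, self.batch = batch would not reach validation_epoch_end. That would be consistent with the 'batch' error in the question.

The sketch below mimics replication with copy.deepcopy; it is a simulation, not DataParallel. It also shows one replica-safe alternative: return what you need from the step and read it from outputs in the epoch-end hook. SketchModule and run_dp_like are hypothetical names.

```python
import copy


class SketchModule:
    """Hypothetical module; method names follow the question's snippet."""

    def validation_step(self, batch, idx):
        self.batch = batch  # stored on whichever object runs the step
        return {"val_loss": sum(batch) / len(batch), "batch": batch}

    def validation_epoch_end(self, outputs):
        # Replica-safe: take the last batch from the returned outputs, not self.
        return outputs[-1]["batch"]


def run_dp_like(module, batches):
    """Simulate DP-style replication: each step runs on a throwaway copy."""
    outputs = []
    for idx, batch in enumerate(batches):
        replica = copy.deepcopy(module)  # attributes set on it are discarded
        outputs.append(replica.validation_step(batch, idx))
    return outputs


module = SketchModule()
outputs = run_dp_like(module, [[1.0, 2.0], [3.0, 5.0]])
print(hasattr(module, "batch"))              # → False
print(module.validation_epoch_end(outputs))  # → [3.0, 5.0]
```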

andrewjong · 2020-08-22T17:05:06Z

Okay, I'll try it and report back, thanks! I see the docs also mention hooks that matter for DP/DDP2, for aggregating data at the end of each step. I'll try those as well.
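In Lightning's docs, the hooks for this are training_step_end() and validation_step_end(). Under DP/DDP2, each device runs the step on its own slice of the batch, and the *_step_end hook then receives the combined per-device outputs. The sketch below only mimics that flow in plain Python. The slicing, the list-based "gather", and the function run_dp_like are simplifying assumptions; Lightning itself gathers tensors across devices.

```python
def validation_step(batch_slice):
    """Per-device work: return a partial sum and a count, not a local mean."""
    return {"loss_sum": sum(batch_slice), "count": len(batch_slice)}


def validation_step_end(gathered):
    """Combine the per-device outputs into one value for the full batch."""
    total = sum(gathered["loss_sum"])
    count = sum(gathered["count"])
    return {"val_loss": total / count}


def run_dp_like(batch, num_devices):
    """Hypothetical driver: split the batch, run each slice, gather by key."""
    size = -(-len(batch) // num_devices)  # ceiling division
    slices = [batch[i:i + size] for i in range(0, len(batch), size)]
    per_device = [validation_step(s) for s in slices]
    gathered = {key: [out[key] for out in per_device] for key in per_device[0]}
    return validation_step_end(gathered)


print(run_dp_like([1.0, 2.0, 3.0, 4.0, 5.0], num_devices=2))  # → {'val_loss': 3.0}
```

Returning sums and counts rather than per-slice means keeps the result exact when slices are uneven. Averaging the two slice means here (2.0 and 4.5) would give 3.25 instead of 3.0.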

andrewjong · 2020-08-23T03:52:29Z

I got it. The attribute error was "no attribute named self.dataset", because I was setting that attribute in prepare_data(). I had missed the several warnings in the docs saying NOT to set state in prepare_data(), because it's only called on global_rank 0, and to do so in setup() instead. Once I did that, everything was resolved. Thanks all!
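The resolution can be sketched as a two-stage handoff, which is the split the Lightning docs describe:

- prepare_data() does one-time work whose results live on disk (download, preprocessing) and assigns nothing to self.
- setup() loads those results into attributes on every rank.

This is a plain-Python simulation, with ranks modeled as separate objects and a temporary directory standing in for a real dataset. SketchModule, run_rank, data_dir and val.json are hypothetical.

```python
import json
import os
import tempfile


class SketchModule:
    """Hypothetical module illustrating the prepare_data -> setup handoff."""

    def __init__(self, data_dir):
        # Set in __init__, so every rank's copy of the module has it.
        self.data_dir = data_dir

    def prepare_data(self):
        # Rank 0 only: write processed data to disk; no assignments to self.
        path = os.path.join(self.data_dir, "val.json")
        if not os.path.exists(path):
            with open(path, "w") as f:
                json.dump(list(range(8)), f)

    def setup(self, stage=None):
        # Every rank: load from disk and assign state.
        with open(os.path.join(self.data_dir, "val.json")) as f:
            self.val_dataset = json.load(f)


def run_rank(rank, data_dir):
    """Simulate one rank: prepare_data() on rank 0, setup() everywhere."""
    module = SketchModule(data_dir)
    if rank == 0:
        module.prepare_data()
    module.setup("fit")
    return len(module.val_dataset)


with tempfile.TemporaryDirectory() as data_dir:
    sizes = [run_rank(rank, data_dir) for rank in range(2)]
print(sizes)  # → [8, 8]
```

The loop runs rank 0 first, so the file exists before any setup() reads it. A real multi-process launcher has to provide that ordering itself, which this sketch assumes rather than shows.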
