Multi-GPU when using Torchtext iterator for data loading #226

aitor-garcia-p · 2019-09-16T10:08:23Z

aitor-garcia-p
Sep 16, 2019

Hi there,

I have just discovered pytorch-lightning few days ago and it seems awesome (congratulations!)
I have a question I cannot solve by reading the docs and examples.
Is is fully compatible with Torchtext?
I am trying to use a Torchtext iterator to load the data in batches, and I have managed to make it work for a single GPU, but when I add additional GPUs to the trainer:

trainer = Trainer(experiment=exp, gpus=[0, 1])

it breaks saying:

RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generic/THCTensorIndex.cu:397

I understand that the problem comes from the model and the data not being placed in the same GPU.
I am following the provided template, replacing the MNIST parts with my own data.
The way I load the training data is:

@pl.data_loader
    def tng_dataloader(self):
        print('tng data loader called')
        DEVICE = torch.device(next(self.parameters()).device if torch.cuda.is_available() else "cpu")
        print('Current device ', DEVICE)
        (train_iter,) = BucketIterator.splits(
            (self.train_data,),
            sort=False,
            batch_size=self.hparams.batch_size,
            shuffle=True,
            repeat=False,
            device=DEVICE  # what to do here? what will do Lightning w.r.t. device?
        )

        return train_iter

I use that little hack to get the current gpu device to parameterize the Torchtext BucketIterator, because if I leave the Torchtext iterator "device" field empty it defaults to cpu, and I get the corresponding complaint when the training starts with the model in the gpu:
RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'

But this hack does not work for more-than-one-gpu setting.
Am I missing something or am I doing something wrong?
I could also reimplement my data loading using regular Pytorch dataloaders as in the template, but I would like to know if I can stick to Torchtext and still get the multi-gpu goodies from Lightning :)

Thanks in advance!

williamFalcon · 2019-09-16T11:10:10Z

williamFalcon
Sep 16, 2019
Maintainer

hi! install from master and try again? i believe we pushed a fix for this on master. if not, i can look at it deeper

0 replies

williamFalcon · 2019-09-16T14:55:42Z

williamFalcon
Sep 16, 2019
Maintainer

@aitor-garcia-p actually, just released a new version with these fixes. Try again? if not we'll take a deeper look at it

0 replies

aitor-garcia-p · 2019-09-16T16:16:10Z

aitor-garcia-p
Sep 16, 2019
Author

Hi again,

After digging a bit (with my limited understanding), I see that in this function,

https://github.com/williamFalcon/pytorch-lightning/blob/4c61d1f30a29db8404606c6c933a5a2f3c0ae1ae/pytorch_lightning/trainer/trainer.py#L1077

if the "batch" parameter is a torchtext.data.Batch object (as it happens when using a Torchtext Iterator) the Trainer function transfer_batch_to_gpu will miss it despite having several conditionals.

I have made a test adding this additional condition:

elif getattr(batch, 'fields', None):
            for f_name in batch.fields:
                setattr(batch, f_name, self.transfer_batch_to_gpu(getattr(batch, f_name), gpu_id))
            return batch

(Or any other condition that catches a torchtext.data.Batch instance)
And it has started working nicely in single GPU with any additional hack to set the device on the iterators.

But still I cannot make multi-gpu working when the batches come from a torchtext iterator.
I instantiate the Trainer like this:

trainer = Trainer(experiment=exp, gpus=[0, 1], distributed_backend='ddp')

And it complains about the following:

   mp.spawn(self.ddp_train, nprocs=self.num_gpus, args=(model,))
  File "/DATA/agarciap_data/python_stuff/python_envs/venv_for_remotes/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 162, in spawn
    process.start()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/usr/local/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/local/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/local/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/local/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/usr/local/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/usr/local/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/local/lib/python3.7/pickle.py", line 662, in save_reduce
    save(state)
  File "/usr/local/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.7/pickle.py", line 856, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/local/lib/python3.7/pickle.py", line 882, in _batch_setitems
    save(v)
  File "/usr/local/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.7/pickle.py", line 786, in save_tuple
    save(element)
  File "/usr/local/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.7/pickle.py", line 771, in save_tuple
    save(element)
  File "/usr/local/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/local/lib/python3.7/pickle.py", line 662, in save_reduce
    save(state)
  File "/usr/local/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.7/pickle.py", line 856, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/local/lib/python3.7/pickle.py", line 882, in _batch_setitems
    save(v)
  File "/usr/local/lib/python3.7/pickle.py", line 524, in save
    rv = reduce(self.proto)
TypeError: 'generator' object is not callable

It seems that there is something with the torchtext iterator that prevents from a proper serialization for the distributed processes spawn.
Or there is something else missing. For example, I have noticed that for ddp the Pytorch Dataloaders needs a DistributedSampler, but I don't know how does the torchtext iterators deal with that.

0 replies

williamFalcon · 2019-09-17T11:59:52Z

williamFalcon
Sep 17, 2019
Maintainer

yeah, looks like torchtext can't be pickled and thus not used with DDP. But you should verify that on the torchtext issues. If that's true, then i'd recommend DP or we can try to come up with a work around

0 replies

williamFalcon · 2019-09-17T12:01:04Z

williamFalcon
Sep 17, 2019
Maintainer

Also, feel free to submit a PR with your changes so we can enable torchtext support

0 replies

ctlaltdefeat · 2019-11-17T20:26:53Z

ctlaltdefeat
Nov 17, 2019

Hey @williamFalcon ,
Any progress on this, or a preferred workaround?
The ubiquity of torchtext together with this issue make if difficult to do NLP/seq2seq.

0 replies

williamFalcon · 2019-11-17T20:30:12Z

williamFalcon
Nov 17, 2019
Maintainer

hey! sorry, been busy with deadlines but will look at it this week.

want to take a stab at a PR? can help you finish it once you submit it

0 replies

williamFalcon · 2019-12-04T12:16:18Z

williamFalcon
Dec 4, 2019
Maintainer

@ctlaltdefeat did you still want to submit this PR?
@jeffling @neggert anyone want to take a look at this?

0 replies

ctlaltdefeat · 2019-12-04T14:31:42Z

ctlaltdefeat
Dec 4, 2019

I've been busy too, and I think it may be more of an issue between DistributedDataParallel and torchtext than anything that lightning adds per se.

0 replies

jeffling · 2019-12-06T02:11:10Z

jeffling
Dec 6, 2019

That's correct, torchtext can't be pickled and you'll want to use DP.

Could you give a full stacktrace of the issue with DP? I'm not sure which step is emitting that error or if it's coming from dataloading or training.

0 replies

ctlaltdefeat · 2019-12-06T02:21:13Z

ctlaltdefeat
Dec 6, 2019

The issue with DP (for me) is that the inability to use mixed-precision training offsets the benefit of multi-GPU training.
I think tomorrow I'll try to work around the issue by either converting the torchtext object to a standard DataLoader or by somehow separating the torchtext object from the model that needs to be pickled.

0 replies

aced125 · 2019-12-31T02:04:52Z

aced125
Dec 31, 2019

Any recent updates on this issue?

0 replies

aced125 · 2020-01-09T18:26:07Z

aced125
Jan 9, 2020

I am trying to run a torchtext dataset, it works fine with single GPU, but fails to work on dp, ddp (ddp2 out of bounds for me as no slurm). I think ddp may be an issue with another library (wandb.com).

But for dp I am getting the same error as OP.

RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generic/THCTensorIndex.cu:397

0 replies

aced125 · 2020-01-09T19:09:57Z

aced125
Jan 9, 2020

@jeffling this is the error trace for DP with torchtext:


16:17:51
Starting Model

16:17:51
Traceback (most recent call last):

16:17:51
File "/Siamese_BERT_blogpost/train.py", line 107, in <module>

16:17:51
trainer.fit(model)

16:17:51
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 348, in fit

16:17:51
self.dp_train(model)

16:17:51
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/dp_mixin.py", line 104, in dp_train

16:17:51
self.run_pretrain_routine(model)

16:17:51
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 455, in run_pretrain_routine

16:17:51
self.evaluate(model, self.get_val_dataloaders(), self.nb_sanity_val_steps, self.testing)

16:17:51
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/evaluation_loop_mixin.py", line 50, in evaluate

16:17:51
test)

16:17:51
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/evaluation_loop_mixin.py", line 174, in evaluation_forward

16:17:51
output = model(*args)

16:17:51
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__

16:17:51
result = self.forward(*i

16:17:51
wandb: Waiting for W&B process to finish, PID 162

16:17:51
nput, **kwargs)

16:17:51
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/pt_overrides/override_data_parallel.py", line 65, in forward

16:17:51
outputs = self.parallel_apply(replicas, inputs, kwargs)

16:17:51
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/pt_overrides/override_data_parallel.py", line 69, in parallel_apply

16:17:51
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])

16:17:51
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/pt_overrides/override_data_parallel.py", line 199, in parallel_apply

16:17:51
raise output

16:17:51
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/pt_overrides/override_data_parallel.py", line 165, in _worker

16:17:51
output = module.validation_step(*input, **kwargs)

16:17:51
File "/Siamese_BERT_blogpost/wrapper.py", line 42, in validation_step

16:17:51
out = self.forward(batch)

16:17:51
File "/Siamese_BERT_blogpost/wrapper.py", line 35, in forward

16:17:51
return self.siamese(batch)

16:17:51
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py",wandb: Program failed with code 1. Press ctrl-c to abort syncing.

16:17:51
line 541, in __call__

16:17:51
result = self.forward(*input, **kwargs)

16:17:51
File "/Siamese_BERT_blogpost/models.py", line 46, in forward

16:17:51
premise = self.language_model(premise)[0]

16:17:51
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__

16:17:51
result = self.forward(*input, **kwargs)

16:17:51
File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py", line 735, in forward

16:17:51
embedding_output = self.embeddings(input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds)

16:17:51
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__

16:17:51
result = self.forward(*input, **kwargs)

16:17:51
File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py", line 186, in forward

16:17:51
inputs_embeds = self.word_embeddings(input_ids)

16:17:51
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__

16:17:51
result = self.forward(*input, **kwargs)

16:17:51
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/sparse.py", line 114, in forward

16:17:51
self.norm_type, self.scale_grad_by_freq, self.sparse)

16:17:51
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1484, in embedding

16:17:51
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

16:17:51
RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generic/THCTensorIndex.cu:400

I am running train.py from this repository: https://github.com/Genei-Ltd/Siamese_BERT_blogpost/blob/master/train.py

0 replies

jeffling · 2020-01-09T20:03:15Z

jeffling
Jan 9, 2020

@aced125 it looks like the batches aren't put into the right GPU. Could you look at the example code the OP put out regarding the hack with torchtext to place things into the right GPU?

It also looks like @ctlaltdefeat had this working with DP, but couldn't use DP due to other reasons. Any tips?

0 replies

aced125 · 2020-01-14T20:57:40Z

aced125
Jan 14, 2020

@jeffling I've given up on torchtext datasets to be honest. It was easy enough to switch to a torch.utils.data.DataLoader instead.

I am going to try PL on graph convolutions soon (will be using the pytorch-geometric library which also uses a custom DataLoader (which inherits from the torch DataLoader) so will let you know if that works well.

0 replies

zeeshansayyed · 2020-02-20T21:56:32Z

zeeshansayyed
Feb 20, 2020

Hello,

I am not able to get a simple toy example running using Torchtext iterators even on a single gpu. I am using a BucketIterator as follows:

train_data, valid_data, test_data = Multi30k.splits(exts=('.de', '.en'), fields=(SRC, TRG))
SRC.build_vocab(train_data, min_freq=2)
TRG.build_vocab(train_data, min_freq=2)
train_iter, valid_iter, test_iter = BucketIterator.splits((train_data, valid_data, test_data), batch_size=batch_size)

My trainer code is:

model = SegmenterModule(80, 76)
trainer = Trainer(gpus=1, max_nb_epochs=3, default_save_path='checkpoints')
trainer.fit(model)

But I get an error because the batch data is still on cpu and not moved to the gpu.

Stack trace:
RuntimeError: Expected object of device type cuda but got device type cpu for argument #3 'index' in call to _th_index_select

Can someone please help me figure out the problem or share a working example using torchtext iterators? Also, should I open a new issue with this problem or let this question be here?

0 replies

elkotito · 2020-03-25T12:51:44Z

elkotito
Mar 25, 2020

As some people mentioned here, I cannot make it work even for a single GPU. I debugged the code and it seems like Batch object generated by torchtext.data.Iterator doesn't follow the rules described here

https://github.com/PyTorchLightning/pytorch-lightning/blob/45d671a4a81788b9d97fd6b47763816926e58e95/pytorch_lightning/trainer/distrib_parts.py#L420

As a result the data are not moved to GPU and the code gives my exception:

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same

0 replies

Borda · 2020-03-26T13:50:44Z

Borda
Mar 26, 2020
Maintainer

@aitor-garcia-p @mateuszpieniak @jeffling let's close this one and continue discission how to improve the situation in #1245

0 replies

jazzypan · 2020-04-12T12:48:17Z

jazzypan
Apr 12, 2020

Hello,

I am not able to get a simple toy example running using Torchtext iterators even on a single gpu. I am using a BucketIterator as follows:
train_data, valid_data, test_data = Multi30k.splits(exts=('.de', '.en'), fields=(SRC, TRG))
SRC.build_vocab(train_data, min_freq=2)
TRG.build_vocab(train_data, min_freq=2)
train_iter, valid_iter, test_iter = BucketIterator.splits((train_data, valid_data, test_data), batch_size=batch_size)
My trainer code is:
model = SegmenterModule(80, 76)
trainer = Trainer(gpus=1, max_nb_epochs=3, default_save_path='checkpoints')
trainer.fit(model)
But I get an error because the batch data is still on cpu and not moved to the gpu.

Stack trace:
RuntimeError: Expected object of device type cuda but got device type cpu for argument #3 'index' in call to _th_index_select

Can someone please help me figure out the problem or share a working example using torchtext iterators? Also, should I open a new issue with this problem or let this question be here?

Have you find out the solution yet?

0 replies

GaoQ1 · 2020-06-17T08:52:25Z

GaoQ1
Jun 17, 2020

Hello,
I am not able to get a simple toy example running using Torchtext iterators even on a single gpu. I am using a BucketIterator as follows:
train_data, valid_data, test_data = Multi30k.splits(exts=('.de', '.en'), fields=(SRC, TRG))
SRC.build_vocab(train_data, min_freq=2)
TRG.build_vocab(train_data, min_freq=2)
train_iter, valid_iter, test_iter = BucketIterator.splits((train_data, valid_data, test_data), batch_size=batch_size)
My trainer code is:
model = SegmenterModule(80, 76)
trainer = Trainer(gpus=1, max_nb_epochs=3, default_save_path='checkpoints')
trainer.fit(model)
But I get an error because the batch data is still on cpu and not moved to the gpu.
Stack trace:
RuntimeError: Expected object of device type cuda but got device type cpu for argument #3 'index' in call to _th_index_select
Can someone please help me figure out the problem or share a working example using torchtext iterators? Also, should I open a new issue with this problem or let this question be here?
Have you find out the solution yet?

I also have the same problem.

0 replies

elkotito · 2020-06-17T11:33:18Z

elkotito
Jun 17, 2020

Currently, you need to transfer manually data to a GPU using torchtext. Take a look at my gist
https://gist.github.com/mateuszpieniak/f290b3a727db7e94b9da0bd3bd2e33c1 and the method transfer_batch_to_device.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multi-GPU when using Torchtext iterator for data loading #226

{{title}}

Replies: 22 comments

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Multi-GPU when using Torchtext iterator for data loading #226

Replies: 22 comments

williamFalcon Sep 16, 2019 Maintainer

williamFalcon Sep 16, 2019 Maintainer

aitor-garcia-p Sep 16, 2019 Author

williamFalcon Sep 17, 2019 Maintainer

williamFalcon Sep 17, 2019 Maintainer

williamFalcon Nov 17, 2019 Maintainer

williamFalcon Dec 4, 2019 Maintainer

Borda Mar 26, 2020 Maintainer

williamFalcon
Sep 16, 2019
Maintainer

williamFalcon
Sep 16, 2019
Maintainer

aitor-garcia-p
Sep 16, 2019
Author

williamFalcon
Sep 17, 2019
Maintainer

williamFalcon
Sep 17, 2019
Maintainer

williamFalcon
Nov 17, 2019
Maintainer

williamFalcon
Dec 4, 2019
Maintainer

Borda
Mar 26, 2020
Maintainer