Tutorial fails on the GPU part with 'cunn' - training with SGD #19

Open
salman1993 opened this issue Oct 20, 2016 · 5 comments
salman1993 commented Oct 20, 2016

Hey, I was just trying the tutorial and the last part doesn't seem to work. Here is the error I got. Can someone help me?

I already tried reinstalling 'nn' and 'cunn', and even reinstalled Torch altogether! By the way, can someone also tell me why we do that?

Channel 1, Mean: 125.83175029297
Channel 1, Standard Deviation: 63.143400842609
Channel 2, Mean: 123.26066621094
Channel 2, Standard Deviation: 62.369209019002
Channel 3, Mean: 114.03068681641
Channel 3, Standard Deviation: 66.965808411114
# StochasticGradient: training
# current error = 2.2234263599277
# current error = 1.88329374547
# current error = 1.6842083223224
# current error = 1.5661180615187
# current error = 1.4682321660876
# StochasticGradient: you have reached the maximum number of iterations
# training error = 1.4682321660876
/home/s43moham/torch/install/bin/luajit: /home/s43moham/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
/home/s43moham/torch/install/share/lua/5.1/nn/THNN.lua:110: bad argument #3 to 'v' (cannot convert 'struct THCudaTensor *' to 'struct THDoubleTensor *')
stack traceback:
    [C]: in function 'v'
    /home/s43moham/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'SpatialConvolutionMM_updateOutput'
    ...am/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:96: in function <...am/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:92>
    [C]: in function 'xpcall'
    /home/s43moham/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    ...e/s43moham/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    torchCudaTutorial.lua:82: in main chunk
    [C]: in function 'dofile'
    ...oham/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
    [C]: in function 'error'
    /home/s43moham/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
    ...e/s43moham/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    torchCudaTutorial.lua:82: in main chunk
    [C]: in function 'dofile'
    ...oham/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670
@mhmtsarigul

You are passing a non-CUDA tensor to a CUDA module. Convert it to a CUDA tensor first.
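For example, a minimal sketch of what that conversion looks like, assuming the network has already been moved with net:cuda() and input is the DoubleTensor that triggered the error (names here are illustrative, not taken from the tutorial):

require 'cunn'
input = input:cuda()          -- torch.DoubleTensor -> torch.CudaTensor
output = net:forward(input)   -- module and input now both live on the GPU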

@BenMacKenzie

I think the tutorial neglects to convert testset.data to a CUDA tensor:

testset.data = testset.data:cuda()
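A quick way to confirm that the conversion took effect (a hedged sketch; testset follows the tutorial's naming):

print(testset.data:type())    -- should print 'torch.CudaTensor' after the line above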

@davidstutz

Although it's a bit late: I got this kind of error when not converting the criterion to CUDA:

criterion = criterion:cuda()

Afterwards, after making sure that the inputs and outputs are CUDA tensors, everything worked fine. Thought I'd share since the issue is still open!
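Concretely, a minimal sketch of the full set of conversions the tutorial needs (variable names follow the tutorial; treat this as an illustrative sketch rather than the canonical fix):

require 'cunn'

net = net:cuda()                      -- model parameters on the GPU
criterion = criterion:cuda()          -- loss module on the GPU as well
trainset.data = trainset.data:cuda()  -- inputs fed during training
testset.data = testset.data:cuda()    -- inputs fed during evaluation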

@minkymorgan

@davidstutz - thank you! That was exactly what I needed. In my case:
'nn.SequencerCriterion(nn.ClassNLLCriterion()):cuda()'


ssdutHB commented Jul 17, 2018

From my perspective, the issue is caused by the position of the :cuda() call. The :double() call in the tutorial not only converts the tensor to double precision but also moves it back to the CPU, so the :cuda() conversion of the training and test data must be placed after the :double() call.
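In other words, a minimal sketch of the ordering (names follow the tutorial; the wrong order is shown commented out for contrast):

-- wrong: :double() would move the data back to the CPU and reintroduce the type mismatch
-- trainset.data = trainset.data:cuda():double()

-- right: convert the type on the CPU first, then move to the GPU last
trainset.data = trainset.data:double():cuda()
testset.data  = testset.data:double():cuda()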
