Issues w/ softmax procedure #554
Could you provide a small reproducible example?
```nim
import std/strformat
# by default, the visible softmax signature is proc (input: Tensor[softmax.T]): Tensor[softmax.T]
import arraymancer except softmax
# this is the softmax we need: softmax*[TT](a: Variable[TT]): Variable[TT]
import arraymancer/nn/activation/softmax

let (N, D_in, H, D_out) = (64, 1000, 100, 10)

let ctx = newContext Tensor[float32]

let
  x = ctx.variable(randomTensor[float32](N, D_in, 1'f32))
  y = randomTensor[float32](N, D_out, 1'f32)

network ctx, TwoLayersNet:
  layers:
    fc1: Linear(D_in, H)
    fc2: Linear(H, D_out)
  forward x:
    x.fc1.relu.fc2.softmax

let
  model = ctx.init(TwoLayersNet)
  optim = model.optimizerSGD(learning_rate = 1e-4'f32)

for t in 0 ..< 500:
  let
    y_pred = model.forward(x)
    loss = y_pred.mse_loss(y)

  echo &"Epoch {t}: loss {loss.value[0]}"

  loss.backprop()
  optim.update()
```
@Vindaar the above is the "simple 2 layer" example, modified simply to add `softmax` in the forward pass. It produces the same error as above.
Thanks! I took the liberty of updating your comment and turning it into an actual code snippet. Will check it out.
Thanks Vindaar for your fast response and the comment edit, lol. I'm still getting used to GitHub markdown.
I've modified the backward proc like so:

```nim
proc softmax_backward_ag[TT](self: Gate[TT], payload: Payload[TT]): SmallDiffs[TT] =
  let self = SoftmaxActivation[TT](self)  # convert from the base Gate type
  let gradient = payload.variable.grad
  result = newDiffs[TT](1)
  result[0] = gradient.softmax_backward(self.cache)
```

This matches the pattern I found while looking at how the other backward gates are implemented. However, now I get a different error. After a search through the docs, I see we don't have a `softmax_backward` proc.
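For context, the textbook Jacobian-vector product for a row-wise softmax can be written with plain tensor ops. A minimal, hypothetical sketch of such a `softmax_backward` (not part of Arraymancer; it assumes the broadcasting operators `*.` / `-.`, that `sum(_, axis = 1)` keeps the reduced axis, and whether this is the right fit for the autograd gate is exactly what's discussed below):

```nim
import arraymancer

# Hypothetical sketch, named to match the call `gradient.softmax_backward(self.cache)` above.
# Standard row-wise softmax backward: dL/dx_i = s_i * (dL/ds_i - sum_j dL/ds_j * s_j),
# where s is the cached softmax output from the forward pass.
proc softmax_backward[T](gradient, cached_softmax: Tensor[T]): Tensor[T] =
  let weighted = sum(gradient *. cached_softmax, axis = 1)  # per-row sum, shape [N, 1]
  result = cached_softmax *. (gradient -. weighted)
```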
@Vindaar please review when you can
@Vindaar my good sir, can we please get this implemented? lol
Can you please ping me about this on matrix/discord on the weekend, if I haven't looked into this by then?
Ok, I just had a look at it. As you've mentioned yourself, the practical problem is that the backward pass for `softmax` isn't implemented. The gradient of the softmax is not a simple element-wise operation: every output depends on every input, so the Jacobian is a full matrix,

∂softmax_i/∂x_j = softmax_i · (δ_ij − softmax_j)

(sorry for somewhat sloppy notation). That's why one typically combines the softmax on the last layer directly with a cross-entropy loss, for which the gradient is easy to compute. I don't have the time & mental space atm to figure out how to implement this efficiently (if even possible?). If someone is willing to do so, feel free. Otherwise I'd just recommend doing what one normally does, i.e. use a combined `softmax_cross_entropy` loss.
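Applied to the two-layer example above, that recommendation would look roughly like this (a sketch only; it reuses the setup from the earlier snippet and assumes `y` holds one-hot / probability targets, which `softmax_cross_entropy` expects):

```nim
# Emit raw logits from the forward pass and fold the softmax into the loss.
network ctx, TwoLayersNet:
  layers:
    fc1: Linear(D_in, H)
    fc2: Linear(H, D_out)
  forward x:
    x.fc1.relu.fc2                            # no softmax here

let
  model = ctx.init(TwoLayersNet)
  optim = model.optimizerSGD(learning_rate = 1e-4'f32)

for t in 0 ..< 500:
  let
    y_pred = model.forward(x)                 # raw logits
    loss = y_pred.softmax_cross_entropy(y)    # softmax + cross-entropy combined
  loss.backprop()
  optim.update()
```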
Shit, thanks for looking into it Vindaar. I will take a look when I finally get the time and mental space as well, lol.
Firstly, I had to modify the arraymancer import to exclude the `softmax` procedure and then import it separately, like so:
```nim
import arraymancer except softmax
import arraymancer/nn/activation/softmax
```
If I don't do this, the only version I'm getting is this one:
```
nnp_softmax.softmax: proc (input: Tensor[softmax.T]): Tensor[softmax.T]
```
Which would throw an error because it accepts `Tensor`s, whereas I need the version that accepts `Variable[TT]`, defined here: https://github.com/mratsim/Arraymancer/blob/master/src/arraymancer/nn/activation/softmax.nim (because this is what my custom `forward` procedure needs).
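To illustrate the distinction (a small hypothetical check, not from the issue: with a plain `import arraymancer`, the `softmax` that resolves is the `Tensor`-level one, which works fine outside a network):

```nim
import arraymancer            # without `except softmax`

let t = randomTensor[float32](2, 3, 1'f32)
echo softmax(t)               # fine: Tensor in, Tensor out
# Inside a network's `forward` block, however, the value is a
# Variable[Tensor[float32]], so this overload doesn't apply there.
```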
When I did this (with the `except` workaround above), I'm STILL getting an error thrown, although it's something different:
```
/Users/salient/Desktop/CatsVDogs/main.nim(75, 26) template/generic instantiation of `forward` from here
/Users/salient/Desktop/CatsVDogs/main.nim(63, 9) template/generic instantiation of `softmax` from here
/Users/salient/.nimble/pkgs/arraymancer-0.7.11/arraymancer/nn/activation/softmax.nim(58, 11) template/generic instantiation of `softmax_cache` from here
/Users/salient/.nimble/pkgs/arraymancer-0.7.11/arraymancer/nn/activation/softmax.nim(42, 24) template/generic instantiation of `softmax_backward_ag` from here
/Users/salient/.nimble/pkgs/arraymancer-0.7.11/arraymancer/nn/activation/softmax.nim(23, 35) Error: type mismatch: got but expected 'SoftmaxActivation[Tensor[system.float32]]'
```
Any explanation for what this is and/or how to fix it?
I do see an open issue that looks somewhat relevant, but I'm not too sure: #472