First of all, I'd like to express my appreciation for the outstanding work you've done on this project. Your efforts are truly commendable!
I recently read the SigLIP paper, in which the authors propose replacing the conventional softmax-based contrastive loss with a pairwise sigmoid loss, and show that it can improve learning, particularly at smaller batch sizes. Given the benefits reported in the paper, I'm curious whether there are any plans to integrate this approach into the repository.
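For reference, the core of the loss is simple on a single device. Here's a minimal sketch of the pairwise sigmoid loss as described in the paper (the name `sigmoid_loss` is mine; it assumes L2-normalized features plus the learnable `logit_scale`/`logit_bias` from the paper):

```python
import torch
import torch.nn.functional as F

def sigmoid_loss(image_feats, text_feats, logit_scale, logit_bias):
    # Pairwise logits between every image and every text in the batch.
    logits = logit_scale * image_feats @ text_feats.t() + logit_bias
    n = logits.size(0)
    # Label +1 on the diagonal (matching pairs), -1 everywhere else.
    labels = 2 * torch.eye(n, device=logits.device) - 1
    # -log sigmoid(label * logit), summed over all pairs, averaged per image.
    return -F.logsigmoid(labels * logits).sum() / n
```

The harder part is the distributed/chunked formulation from the paper, which shifts features between devices instead of gathering them all.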
I had this about 80% of the way there, but hadn't debugged the isend/irecv code that does the neighbour shifting (the paper's authors use jax.lax.ppermute for this, which has no direct PyTorch primitive).
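For anyone curious, the shift itself is conceptually a ring permute; something like the following sketch plays the role of jax.lax.ppermute (`neighbour_shift` is a hypothetical helper, not the PR code; it assumes the default process group and same-shape tensors on every rank):

```python
import torch
import torch.distributed as dist

def neighbour_shift(tensor, direction=1):
    # Ring-shift: send our tensor `direction` ranks forward and receive
    # the tensor arriving from `direction` ranks behind us.
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    send_to = (rank + direction) % world_size
    recv_from = (rank - direction) % world_size
    recv_buf = torch.empty_like(tensor)
    ops = [
        dist.P2POp(dist.isend, tensor.contiguous(), send_to),
        dist.P2POp(dist.irecv, recv_buf, recv_from),
    ]
    # batch_isend_irecv pairs the sends/recvs so the ring doesn't deadlock.
    for req in dist.batch_isend_irecv(ops):
        req.wait()
    return recv_buf
```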
I'll see if I can get it into a state where it does something soonish... and then push a PR for others to look at, help polish/test, etc
@rom1504 @fabiozappo what I have is here #634. I tried working on the distributed part a bit more, but I'm not sure I've got it behaving properly... it seems to either not converge, or converge very poorly vs the non-distributed version... currently trying a new autograd.Function approach which exchanges the grads in the opposite direction...
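Roughly, the autograd.Function idea looks like this (a sketch only, not the code in the PR; it reuses the hypothetical `neighbour_shift` helper from the snippet above):

```python
import torch

class NeighbourExchange(torch.autograd.Function):
    # Forward shifts features one rank forward around the ring; backward
    # routes the incoming grad one rank back, so each rank's features
    # receive the gradient computed against them on the neighbouring rank.
    @staticmethod
    def forward(ctx, tensor):
        return neighbour_shift(tensor, direction=1)

    @staticmethod
    def backward(ctx, grad_output):
        # The grad w.r.t. our input was produced on the rank we sent to,
        # so pull it back by shifting in the opposite direction.
        return neighbour_shift(grad_output, direction=-1)
```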