
Request for Sigmoid Loss Integration: SigLip #618

Closed · fabiozappo opened this issue Sep 4, 2023 · 3 comments

fabiozappo commented Sep 4, 2023
Hi there,

First of all, I'd like to express my appreciation for the outstanding work you've done on this project. Your efforts are truly commendable!

I recently read the SigLIP paper, in which the authors propose replacing the conventional softmax-based contrastive loss with a pairwise sigmoid loss, reporting improved learning particularly at smaller batch sizes. Given the potential benefits highlighted in the paper, I'm curious whether there are any plans to integrate this approach into the repository.
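(For reference, a minimal single-device sketch of the pairwise sigmoid loss described in the paper is below; the function and parameter names such as logit_scale and logit_bias are illustrative assumptions, not this repository's API.)

```python
import torch
import torch.nn.functional as F

def siglip_loss(image_features, text_features, logit_scale, logit_bias):
    # image_features, text_features: L2-normalized embeddings of shape (B, D)
    # logit_scale: learnable temperature (already exponentiated); logit_bias: learnable bias
    logits = logit_scale * image_features @ text_features.T + logit_bias
    # +1 on the diagonal (matched image-text pairs), -1 everywhere else
    labels = 2 * torch.eye(logits.size(0), device=logits.device) - 1
    # -log sigmoid(label * logit), summed over all pairs and averaged per image
    return -F.logsigmoid(labels * logits).sum() / logits.size(0)
```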

rom1504 (Collaborator) commented Sep 4, 2023

@rwightman Could you describe what you tried on this?

rwightman (Collaborator) commented
I had this about 80% of the way there, but I hadn't debugged the isend/irecv code that does the neighbour shifting (the paper's authors use jax.lax.ppermute for this, which isn't available as a PyTorch primitive).

I'll see if I can get it into a state where it does something soonish... and then push a PR for others to look at and help polish/test, etc.
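(For context, a rough sketch of what the neighbour shifting could look like with torch.distributed point-to-point ops is below; the helper name and structure are assumptions for illustration, analogous to a jax.lax.ppermute ring shift, not the code from the PR.)

```python
import torch
import torch.distributed as dist

def neighbour_exchange(tensor):
    # Ring shift: each rank sends `tensor` to the next rank and receives the
    # previous rank's tensor (hypothetical helper; requires an initialized
    # process group).
    rank, world_size = dist.get_rank(), dist.get_world_size()
    send_to = (rank + 1) % world_size
    recv_from = (rank - 1) % world_size
    recv_buf = torch.empty_like(tensor)
    send_req = dist.isend(tensor.contiguous(), dst=send_to)
    recv_req = dist.irecv(recv_buf, src=recv_from)
    send_req.wait()
    recv_req.wait()
    return recv_buf
```

With a helper like this, each rank can compute the sigmoid loss against its local text features and then shift the text features around the ring world_size - 1 times, so every image chunk eventually sees every text chunk.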

rwightman mentioned this issue Sep 15, 2023
rwightman (Collaborator) commented Sep 15, 2023

@rom1504 @fabiozappo what I have is here: #634. I tried working on the distributed part a bit more, but I'm not sure it's behaving properly yet... at the least it seems to not converge, or to converge very poorly, versus the non-distributed version... currently trying a new autograd.Function approach which exchanges the grad in the opposite direction...
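(For readers following along, the opposite-direction gradient exchange mentioned above could be sketched with a custom autograd.Function like the one below; this is an assumption about the general shape of the approach, not the actual code in #634.)

```python
import torch
import torch.distributed as dist

class NeighbourExchange(torch.autograd.Function):
    # Forward sends the tensor one way around the ring; backward routes the
    # incoming gradient back the opposite way (illustrative sketch only).

    @staticmethod
    def forward(ctx, send_to, recv_from, tensor):
        ctx.send_to, ctx.recv_from = send_to, recv_from
        recv_buf = torch.empty_like(tensor)
        send_req = dist.isend(tensor.contiguous(), dst=send_to)
        recv_req = dist.irecv(recv_buf, src=recv_from)
        send_req.wait()
        recv_req.wait()
        return recv_buf

    @staticmethod
    def backward(ctx, grad_output):
        # Gradients flow back to the rank we received from and arrive from
        # the rank we sent to, i.e. they travel the reverse ring.
        grad_buf = torch.empty_like(grad_output)
        send_req = dist.isend(grad_output.contiguous(), dst=ctx.recv_from)
        recv_req = dist.irecv(grad_buf, src=ctx.send_to)
        send_req.wait()
        recv_req.wait()
        # No gradients for the integer rank arguments
        return None, None, grad_buf
```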
