SeqTransf & meanP #1
Comments
We propose a temporal encoder that models the temporal relationship between frames; it is enabled by setting "sim_header == seqTransf" (as shown in Figure 2).
Thank you for replying. Thx.
I agree with you. If seqTransf is randomly initialized (it is actually initialized from CLIP, as shown in line 116 of modules/modeling.py), it may lead to sub-optimal results. That is why CLIP4Clip + meanP is better than CLIP4Clip + seqTransf on most datasets. Therefore, in our paper, we recommend using the original CLIP to obtain frame-level visual features, as shown in line 298 of modules/modeling_xclip.py. The temporal encoder helps to obtain the global video-level visual representation.
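For readers following the thread, here is a minimal sketch of what a parameter-free meanP-style header amounts to: averaging the frame-level features into one video-level vector and comparing it with the text feature. It is not the code from modules/modeling.py. The function names (`mean_pool`, `l2_normalize`, `cosine_similarity`), the toy vectors, and the point at which normalization is applied are all assumptions made for illustration. A seqTransf-style header would instead pass the frame features through a learned temporal encoder before pooling, which is not shown here.

```python
import math

def mean_pool(frames, mask):
    """Average the unmasked frame features into one video-level vector.

    frames: list of per-frame feature vectors (lists of floats).
    mask:   list of 0/1 flags; 0 marks a padded frame to ignore.
    (Hypothetical helper, not taken from the repository.)
    """
    kept = [f for f, m in zip(frames, mask) if m]
    if not kept:
        raise ValueError("at least one frame must be unmasked")
    dim = len(kept[0])
    return [sum(f[i] for f in kept) / len(kept) for i in range(dim)]

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    """Dot product of the two L2-normalized vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

# Toy example: three 2-d "frame features", the last one is padding.
frames = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
mask = [1, 1, 0]
video = mean_pool(frames, mask)          # average of the first two frames
text = [1.0, 1.0]                        # stand-in for a text feature
score = cosine_similarity(video, text)   # video-to-text similarity
```

Because this pooling has no trainable weights, it cannot drift away from the pretrained CLIP features during fine-tuning, which is one way to read the point about initialization made above.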
Oh, thank you very much for your kind and fast reply. I didn't notice that line 116 of modules/modeling.py initializes seqTransf from CLIP. Q1. Q2. Q3. Thx, you're very kind.
I ran an experiment in another language (Indonesian) on MSVD using X-CLIP, and I found that X-CLIP + meanP performs best compared to the other settings. I haven't tried the English version, though. But my experiment indicates that X-CLIP + seqTransf, i.e., the proposed temporal encoder, doesn't always perform best on a dataset with different characteristics, such as MSVD-Indonesian. I will share my experiment results later.
Dear Author,
I really appreciate and am fascinated by your work, and I am thankful that you released your code.
I know that CLIP4Clip + meanP has the best performance among CLIP4Clip + meanP, seqTransf, seqLSTM, and tightTransf,
but I found that the sh files in your scripts always recommend seqTransf.
Is there any special reason why "sim_header == seqTransf" is the default setting?
I looked at your Table 2 on MSVD, where X-CLIP (ViT-B/32) records an R@1 score of 47.1.
Does that mean X-CLIP with seqTransf is better than the other modes (meanP, tightTransf)?
I cannot find which sim_header produced the scores in that table.
If X-CLIP + seqTransf is recommended anyway,
is there any special reason why seqTransf outperforms meanP, unlike in CLIP4Clip?
Sincerely,
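
For context on the R@1 number quoted above, the following is a rough sketch of how Recall@K is commonly computed from a text-to-video similarity matrix. It is not the evaluation code of this repository. The function name `recall_at_k` and the toy matrix are hypothetical, and the sketch assumes a one-to-one pairing in which query i matches video i, which is a simplification for datasets such as MSVD where one video has several captions.

```python
def recall_at_k(sim, k=1):
    """Percentage of queries whose correct video is ranked in the top k.

    sim[i][j] is the similarity between text query i and video j.
    Assumption: the correct video for query i is video i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank = number of videos scored strictly higher than the correct one.
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)

# Toy similarity matrix: queries 0 and 1 retrieve their own video first,
# query 2 ranks video 0 above its correct video 2.
sim = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.4],
    [0.7, 0.2, 0.5],
]
r1 = recall_at_k(sim, k=1)  # two of three queries succeed at rank 1
r2 = recall_at_k(sim, k=2)  # all three succeed within the top 2
```

Whichever sim_header is used only changes how `sim` is produced; the ranking metric itself stays the same.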