SeqTransf & meanP #1

celestialxevermore · 2022-10-05T09:17:32Z

Dear Author,

I really appreciate your work and find it fascinating. Thank you for releasing your code.

I understand that CLIP4Clip + meanP performs best among the CLIP4Clip variants (meanP, seqTransf, seqLSTM, and tightTransf).

However, I found that the sh scripts in your repository always recommend seqTransf.

Is there a particular reason why "sim_header == seqTransf" is the default setting?

I looked at Table 2 on MSVD, where X-CLIP (ViT-B/32) records an R@1 score of 47.1.
Does this mean that X-CLIP with seqTransf performs better than the other modes (meanP, tightTransf)?
I cannot find which sim_header produced the scores in that table.

If X-CLIP + seqTransf is the recommended setting,
is there a particular reason why seqTransf outperforms meanP, unlike in CLIP4Clip?

Sincerely,

xuguohai · 2022-10-05T11:35:45Z

We propose a temporal encoder to model temporal relationships, enabled by setting "sim_header == seqTransf" (as shown in Figure 2).
The ablation study of the temporal encoder is shown in Table 8.

celestialxevermore · 2022-10-05T12:34:01Z

Thank you for replying.
As far as I know, the temporal encoder (a Transformer) is randomly initialized, which can lead to sub-optimal results because the randomly initialized seqTransf weights harm the pretrained CLIP weights. Am I wrong? Do you have any thoughts on this?

Thx.

xuguohai · 2022-10-05T13:12:06Z

I agree with you. If seqTransf is randomly initialized (it is actually initialized from CLIP, as shown in line 116 of modules/modeling.py), it may lead to sub-optimal results. That is why CLIP4Clip + meanP is better than CLIP4Clip + seqTransf on most datasets.

Therefore, in our paper, we recommend using the original CLIP to obtain frame-level visual features, as shown in line 298 of modules/modeling_xclip.py. The temporal encoder helps to obtain the global video-level visual representation.
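
To make the difference between the two pooling modes concrete, here is a minimal, framework-free sketch. It is not the repository's code: the function names (`mean_pool`, `seq_pool`, `zero_temporal`) are hypothetical, and the residual addition of the original frame features is an assumption about how a seqTransf-style header combines the temporal encoder output with the untouched CLIP frame features. It only illustrates why keeping a copy of the original features means the temporal encoder refines the mean-pooled representation rather than replacing it.

```python
from typing import Callable, List

Frame = List[float]  # one frame-level embedding


def mean_pool(frames: List[Frame], mask: List[int]) -> Frame:
    """meanP-style pooling: average the embeddings of valid (mask == 1) frames."""
    valid = [f for f, m in zip(frames, mask) if m]
    if not valid:
        raise ValueError("at least one valid frame is required")
    dim = len(valid[0])
    return [sum(f[d] for f in valid) / len(valid) for d in range(dim)]


def seq_pool(
    frames: List[Frame],
    mask: List[int],
    temporal_fn: Callable[[List[Frame]], List[Frame]],
) -> Frame:
    """seqTransf-style pooling (sketch, assumed structure).

    The temporal module output is added to a copy of the original frame
    features (a residual connection) before masked mean pooling.
    """
    original = frames  # untouched frame-level features, kept for the residual
    contextual = temporal_fn(frames)  # stand-in for the temporal encoder
    fused = [
        [c + o for c, o in zip(ctx, org)]
        for ctx, org in zip(contextual, original)
    ]
    return mean_pool(fused, mask)


def zero_temporal(frames: List[Frame]) -> List[Frame]:
    """Hypothetical temporal module that contributes nothing."""
    return [[0.0] * len(f) for f in frames]


frames = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]  # the last frame is padding and is ignored

video_mean = mean_pool(frames, mask)
video_seq = seq_pool(frames, mask, zero_temporal)
print(video_mean, video_seq)
```

With a temporal module that outputs zeros, `seq_pool` returns exactly the `mean_pool` result, so under this assumed structure the temporal encoder can only add information on top of the original frame features.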

celestialxevermore · 2022-10-06T03:05:06Z

Oh, thank you very much for your kind and fast reply.

I didn't realize that line 116 of modules/modeling.py initializes seqTransf from CLIP.

Q1.
Then what about the Cross model in tightTransf?

Q2.
Also, as a novice in deep learning, I cannot understand exactly why line 298 of modules/modeling_xclip.py, which seems to simply copy visual_output, can be interpreted as using the original CLIP to obtain the frame-level visual features. My guess is that visual_output passes everything obtained from the CLIP parameters to visual_output_original.

Q3.
Then, if I build a new model using other layers such as seqTransf, seqLSTM, or tightTransf,
is there no need to freeze any layers? Is doing what you did in line 298 of modules/modeling_xclip.py enough to improve performance?
Could you explain this to me?

Thx. You're very kind.

willyfh · 2023-05-20T07:22:14Z

I ran an experiment on MSVD in another language (Indonesian) using X-CLIP, and found that X-CLIP + meanP performs best compared to the others. I haven't tried the English version, though. But my experiment indicates that X-CLIP + seqTransf, i.e., the proposed temporal encoder, doesn't always perform best on a dataset with different characteristics, such as MSVD-Indonesian. I will share my experiment results later.
