How is post-training conducted for the two tasks, multimodal understanding and image generation? Are they trained jointly, as in Show-O, or separately? Also, what is the approximate total number of training samples, and what is the ratio between the two tasks?