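I'd like to see a tutorial on multimodal data fusion: for example, extracting features from text, image, audio, and video data, then fusing those features with attention mechanisms or a Transformer architecture, and finally completing a downstream task such as classification or regression. Thanks.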
Answered by
Sm1les
Dec 11, 2024
Replies: 2 comments
This is essentially multimodal learning. You can start directly with the Qwen-VL fine-tuning example: https://github.com/datawhalechina/self-llm/blob/master/models/Qwen2-VL/06-Qwen2-VL-2B-Instruct%20Lora%20%E5%BE%AE%E8%B0%83%E6%A1%88%E4%BE%8B%20-%20LaTexOCR.md
Answer selected by
Sm1les
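For readers who want a feel for the pipeline described in the question (per-modality features, attention-based fusion, then a classification head), here is a minimal, dependency-free sketch. It is not taken from the linked Qwen2-VL tutorial. The function names `attention_fuse` and `classify`, the random placeholder features, the fixed query vector, and the 2-class weight matrix are all assumptions made for illustration; in a real model the features would come from pretrained encoders and the query and classifier weights would be learned.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, query):
    """Fuse per-modality vectors with scaled dot-product attention.

    features: dict of modality name -> feature vector (all the same length)
    query:    vector of that length (hypothetical; learned in a real model)
    Returns (fused_vector, attention_weights_by_modality).
    """
    dim = len(query)
    names = list(features)
    # One score per modality: dot(query, key) / sqrt(dim).
    scores = [
        sum(q * k for q, k in zip(query, features[name])) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # Fused vector is the attention-weighted sum of the modality vectors.
    fused = [
        sum(w * features[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


def classify(fused, class_weights):
    """Linear classification head followed by softmax (weights are placeholders)."""
    logits = [sum(w * x for w, x in zip(row, fused)) for row in class_weights]
    return softmax(logits)


# Placeholder inputs: random vectors stand in for real encoder outputs.
rng = random.Random(0)
dim = 4
features = {
    name: [rng.uniform(-1, 1) for _ in range(dim)]
    for name in ("text", "image", "audio")
}
query = [rng.uniform(-1, 1) for _ in range(dim)]
class_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(2)]

fused, weights = attention_fuse(features, query)
probs = classify(fused, class_weights)
print("attention weights per modality:", weights)
print("class probabilities:", probs)
```

The attention weights show how much each modality contributes to the fused vector. Swapping the single query for multi-head attention over token sequences is the usual next step, which is roughly what Transformer-based vision-language models such as the one in the linked tutorial do internally.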