Text Feature Unsupervised Clustering #16
Hello, I find your exploration very interesting. In our current findings, the DBSCAN results on the text encoder output of LLM2CLIP are indeed fairly mediocre (although perhaps not as poor as your evaluation suggests, since different DBSCAN parameters are needed). The visual side, however, performs quite well. For comparison, we also tested the pre-adapter LLM part of the LLM2CLIP text encoder, and its performance is actually quite good; after applying the adapter, though, performance seems to degrade significantly. We suspect that retrieval tasks and DBSCAN clustering do not necessarily have such a high level of consistency. What are your thoughts on this?
We would be happy to work with you to analyze similar phenomena, and perhaps we can assist you in conducting some experiments if needed.
Hello, sorry for taking so long to reply; I have moved to other projects due to a project rotation. I carefully reviewed the principles of DBSCAN, and I now think that DBSCAN clustering performance indeed does not necessarily align with image-text retrieval performance.
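This mismatch can be illustrated without any model: retrieval only depends on the relative ranking of distances, while DBSCAN depends on an absolute density threshold (eps), so the same embedding space can rank well yet cluster badly under a poorly chosen eps. A minimal synthetic sketch (all data below is made up for illustration):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two well-separated groups of unit-normalized "embeddings".
rng = np.random.default_rng(0)
a = rng.normal([1.0, 0.0, 0.0], 0.05, size=(20, 3))
b = rng.normal([0.0, 1.0, 0.0], 0.05, size=(20, 3))
X = np.vstack([a, b])
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize

# Ranking by cosine distance is identical in every run below, but the
# DBSCAN output swings from "all noise" to "one merged blob" with eps.
for eps in (0.0001, 0.1, 1.2):
    labels = DBSCAN(eps=eps, min_samples=3, metric="cosine").fit_predict(X)
    n_clusters = len(set(labels) - {-1})
    n_noise = int((labels == -1).sum())
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```

At a tiny eps most points have no neighbors and become noise; at a moderate eps the two groups are recovered; at a large eps the near-orthogonal groups merge into one cluster. This is one reason a fixed eps can make one encoder look much worse than another even when its nearest-neighbor ranking (and hence retrieval) is fine.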
Observations on Text Clustering
Thank you for your excellent work.
I am conducting an experiment on clustering text using DBSCAN with text features. However, after comparing the results from microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned and openai/clip-vit-base-patch16, I have noticed some peculiarities in the clustering outcomes. Here is a simplified example:
Text Samples:
DBSCAN Setup:
eps = 0.8
min_samples = 3
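A minimal sketch of this setup with scikit-learn is below. The `embed` function and the sample sentences are hypothetical stand-ins (the actual text samples are not shown here); in the real experiment `embed` would be each model's text encoder:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def embed(texts):
    # Hypothetical stand-in for a model's text encoder (e.g. the text
    # tower of openai/clip-vit-base-patch16): one vector per text.
    rng = np.random.default_rng(42)
    return rng.normal(size=(len(texts), 512))

texts = ["a photo of a cat", "a photo of a dog", "a stock market report"]
feats = embed(texts)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # unit-normalize

# Note: eps=0.8 means different things per metric. With metric="cosine",
# eps is a cosine-distance threshold in [0, 2]; with the default euclidean
# metric on unit vectors, the equivalent threshold is sqrt(2 * cosine_eps).
labels = DBSCAN(eps=0.8, min_samples=3, metric="cosine").fit_predict(feats)
print(labels)  # -1 marks noise points
```

One caveat when comparing two encoders this way: each model's feature space has its own distance scale, so a single fixed eps can favor one model. A per-model eps sweep (or a k-distance plot) is the usual way to pick eps before comparing clustering quality.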
Clustering Results:
Note: This is a simplified example for the purpose of this issue. My actual dataset is much more complex, and the performance of microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned appears to be significantly worse.
My Questions:
LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned is inferior?