Text Feature Unsupervised Clustering #16

Open · mightycatty opened this issue Nov 28, 2024 · 3 comments

mightycatty commented Nov 28, 2024

Observations on Text Clustering

Thank you for your excellent work.

I am running an experiment that clusters texts with DBSCAN on their text embeddings. However, after comparing results from microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned and openai/clip-vit-base-patch16, I have noticed some peculiarities in the clustering outcomes.

Here is a simplified example:

Text Samples:

this is a small apple
a small apple on the table
a rotten apple
a green apple
red apple

a running dog
dogs fighting each other
a dog is playing with a ball
cute dog

DBSCAN Setup:

  • eps = 0.8
  • min_samples = 3

Clustering Results:

  • microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned (1 cluster):
      • this is a small apple
      • a rotten apple
      • a green apple
      • red apple
  • openai/clip-vit-base-patch16 (2 clusters):
      • Cluster 1:
          • this is a small apple
          • a small apple on the table
          • a rotten apple
          • a green apple
          • red apple
      • Cluster 2:
          • a running dog
          • dogs fighting each other
          • a dog is playing with a ball
          • cute dog

Note: This is a simplified example for the purpose of this issue. My actual dataset is much more complex, and the performance of microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned appears to be significantly worse.
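For concreteness, here is a minimal sketch of how the comparison above might be reproduced with the openai/clip-vit-base-patch16 text tower (the LLM2CLIP text encoder would be swapped in analogously). The cosine metric, the Hugging Face loading code, and the unit normalization are my assumptions; the issue does not state them, and eps=0.8 means something different under cosine vs. Euclidean distance:

```python
# Minimal sketch: embed the sample texts with a CLIP text encoder and
# cluster them with DBSCAN. Assumes the cosine metric; the issue does
# not say which distance was used, so interpret eps=0.8 accordingly.
import torch
from sklearn.cluster import DBSCAN
from transformers import CLIPModel, CLIPProcessor

texts = [
    "this is a small apple", "a small apple on the table",
    "a rotten apple", "a green apple", "red apple",
    "a running dog", "dogs fighting each other",
    "a dog is playing with a ball", "cute dog",
]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

with torch.no_grad():
    inputs = processor(text=texts, return_tensors="pt", padding=True)
    feats = model.get_text_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize

labels = DBSCAN(eps=0.8, min_samples=3, metric="cosine").fit_predict(feats.numpy())
for text, label in zip(texts, labels):
    print(label, text)  # -1 marks points DBSCAN treats as noise
```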

My Questions:

  1. Is it expected that the text-clustering performance of LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned is inferior?
  2. I have also compared the image features, and they do indeed perform better than OpenCLIP's. If the first point is valid, why does LLM2CLIP's retrieval performance still exceed OpenCLIP's?

Yif-Yang (Collaborator) commented Dec 3, 2024

Hello, I find your exploration very interesting. In our own experiments, the DBSCAN results on the LLM2CLIP text-encoder output are indeed only middling (though perhaps not as poor as your evaluation suggests; different DBSCAN parameters are needed), while the visual side performs quite well. For comparison, we also tested the pre-adapter LLM part of the LLM2CLIP text encoder, and its performance is actually quite good; after the adapter is applied, however, performance seems to degrade significantly. We suspect that retrieval tasks and DBSCAN clustering may not be highly consistent with each other. What are your thoughts on this?
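On the point that different DBSCAN parameters are needed: one common heuristic is to pick eps from the k-distance curve of each embedding space separately, since different encoders produce embeddings at different density scales. A minimal sketch, assuming `feats` is a unit-normalized (n, d) array of text embeddings produced as in the earlier snippet:

```python
# Sketch of the standard k-distance heuristic for choosing DBSCAN's eps.
# The "elbow" of the sorted k-distance curve is a common eps candidate,
# and it will generally differ between embedding spaces.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def k_distance_curve(feats: np.ndarray, k: int = 3) -> np.ndarray:
    """Return the sorted cosine distance to each point's k-th neighbor."""
    nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(feats)
    dists, _ = nn.kneighbors(feats)
    return np.sort(dists[:, -1])  # column 0 is the point itself (distance 0)

# curve = k_distance_curve(feats, k=3)  # inspect where the curve bends;
# the same eps=0.8 need not be comparable across models.
```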

Yif-Yang (Collaborator) commented Dec 3, 2024

We would be happy to work with you to analyze similar phenomena, and perhaps we can assist you in conducting some experiments if needed.

mightycatty (Author) commented Dec 17, 2024

Hello, sorry for taking so long to reply. I have moved on to other projects due to a project rotation.

I carefully reviewed the principles of DBSCAN and now think that DBSCAN clustering performance indeed does not necessarily align with image-text retrieval performance.
Because DBSCAN is a density-based method, the text embeddings do not necessarily form dense clusters on the unit hypersphere, especially for an LLM, whose internal representation of text is far more complex than CLIP's.
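This intuition can be made concrete with a toy example: embeddings can support perfect nearest-neighbor retrieval while containing no dense region at all, in which case DBSCAN marks every point as noise. The construction below (all numbers invented purely for illustration) spreads paired text/image embeddings around the unit circle:

```python
# Toy illustration: perfect text-image retrieval with zero DBSCAN clusters.
# Retrieval only needs relative ranking; DBSCAN needs dense neighborhoods.
import numpy as np
from sklearn.cluster import DBSCAN

n = 8
angles = np.linspace(0, 2 * np.pi, n, endpoint=False)
texts = np.stack([np.cos(angles), np.sin(angles)], axis=1)             # unit circle
images = np.stack([np.cos(angles + 0.01), np.sin(angles + 0.01)], axis=1)

# Retrieval: every image's most similar text is its true partner.
sims = images @ texts.T
print("retrieval@1:", (sims.argmax(axis=1) == np.arange(n)).mean())    # -> 1.0

# Clustering: adjacent points sit ~0.29 apart in cosine distance, beyond
# eps, so no point is a core point and DBSCAN labels everything noise (-1).
labels = DBSCAN(eps=0.2, min_samples=3, metric="cosine").fit_predict(texts)
print("labels:", labels)                                               # -> all -1
```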
