Is there a way to do multi-label classification with CLIP? #975
Replies: 8 comments 1 reply
-
Not sure if it would work, but have you by any chance looked at using captions like
-
I am attempting this now, training on captions with multiple labels and then querying with single labels, and it works quite badly compared to any normal multi-label classifier.
If I figure this out, I will let you know.
-
Take a look at this paper: I struggled with this problem for a while and this approach is working for me.
-
@AmericanPresidentJimmyCarter did you find a way to improve the multi-label performance?
-
No, I just trained multi-label classifiers instead and those worked.
-
You can use some sort of anti-text or placeholder text to do multi-label classification. For example, if your objective is checking whether "red" is present in an image of a dress, then use: that will give you a probability distribution, and you take the zero index.
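A minimal sketch of this pairing trick, assuming we already have CLIP cosine similarities for a label prompt and a placeholder ("anti-text") prompt. The prompt strings, similarity values, and the logit scale of 100 are illustrative assumptions, not values from the thread:

```python
import math

def label_probability(sim_label, sim_anti, logit_scale=100.0):
    """Pair the label prompt with a placeholder ('anti-text') prompt and
    softmax the two scaled similarities; index 0 is P(label present)."""
    a = logit_scale * sim_label
    b = logit_scale * sim_anti
    m = max(a, b)  # subtract max for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)

# Hypothetical cosine similarities (not computed from a real model here):
# sim("a photo of a red dress") = 0.31, sim("a photo of a dress") = 0.27
p_red = label_probability(0.31, 0.27)
```

Note that this only contrasts two prompts per label, which is why the reply below points out it breaks down when the image contains neither.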
-
How does that work? If the image contains neither, your result will be essentially random. I think it only works if you already have a multi-label classifier to identify a dress in the first place.
-
Take a look at this AAAI paper (arXiv): MuMIC: Multimodal Embedding for Multi-label Image Classification with Tempered Sigmoid. I developed the model for Booking.com. It has been serving production traffic for years and outperforms standard multi-label fine-tuning methods in efficiency. If you don't want to train models and want to use CLIP out of the box, you can still use the tips in the paper's
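The core idea behind a per-label ("tempered") sigmoid, as opposed to a softmax that forces labels to compete, can be sketched as follows. The scale and bias values are illustrative assumptions, not MuMIC's trained parameters:

```python
import math

def multilabel_probs(sims, scale=10.0, bias=-1.0):
    """Map each image-text cosine similarity to an independent probability
    with a scaled sigmoid, so labels do not compete as they would under
    softmax. scale and bias are illustrative, not the paper's tuned values."""
    return [1.0 / (1.0 + math.exp(-(scale * s + bias))) for s in sims]

# Hypothetical similarities for three labels of one image:
probs = multilabel_probs([0.31, 0.12, 0.28])
```

Because each label gets its own probability, an image can score high for several labels at once, which is exactly what the softmax-over-classes zero-shot recipe cannot do.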
-
The concrete use case is as follows. I have the classes baby, child, teen, adult. My idea was to use the similarity between text and image features (for the text features I used the prompt 'there is at least one (c) in the photo', with c being one of the 4 classes).
I went through quite a lot of examples, but I am running into the issue that the similarity scores are often very different for a fixed class, and/or that co-occurring classes (like baby and child) end up with very similar scores. For similarity scores I use the cosine similarity multiplied by 2.5 to stretch the score into the interval [0, 1], as is done in the CLIPScore paper.
Setting a per-class threshold in that sense doesn't seem possible.
Does anyone have an idea for this? I feel quite stuck on how I should proceed.
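For reference, the rescaling described above can be sketched like this. The embeddings are random stand-ins for real CLIP outputs, and the clipping follows the CLIPScore convention of w * max(cos, 0) with w = 2.5:

```python
import math
import random

CLASSES = ["baby", "child", "teen", "adult"]
PROMPT = "there is at least one {} in the photo"  # the prompt template from the question

def cosine(u, v):
    """Plain cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore-style rescaling: w * max(cos, 0), clipped to [0, 1]."""
    return min(max(w * cosine(image_emb, text_emb), 0.0), 1.0)

# Hypothetical pre-computed 512-d embeddings (stand-ins for real CLIP outputs).
random.seed(0)
image_emb = [random.gauss(0, 1) for _ in range(512)]
text_embs = {c: [random.gauss(0, 1) for _ in range(512)] for c in CLASSES}

scores = {c: clip_score(image_emb, text_embs[c]) for c in CLASSES}
```

This makes the thresholding problem concrete: each class ends up with its own score scale, so a single global cutoff over `scores` will not separate present from absent classes reliably.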