diff --git a/docs/teaser.gif b/docs/teaser.gif
new file mode 100644
index 0000000..7bc6d1a
Binary files /dev/null and b/docs/teaser.gif differ
diff --git a/docs/teaser.jpg b/docs/teaser.jpg
new file mode 100644
index 0000000..97826a0
Binary files /dev/null and b/docs/teaser.jpg differ
diff --git a/index.html b/index.html
index 2d66ddf..1e86db5 100644
--- a/index.html
+++ b/index.html
@@ -30,10 +30,14 @@
Textual Inversion, a prompt learning method, learns a single embedding for a new "word" to represent image style and appearance, allowing it to be integrated into natural language sentences to generate novel synthesised images. However, identifying and integrating multiple object-level concepts within one scene remains a significant challenge even when embeddings for the individual concepts are attainable, as our empirical tests confirm. To address this challenge, we introduce a framework for Multi-Concept Prompt Learning (MCPL), where multiple new "words" are simultaneously learned from a single sentence-image pair. To enhance the accuracy of word-concept correlation, we propose three regularization techniques: Attention Masking (AttnMask) to concentrate learning on relevant areas; Prompts Contrastive Loss (PromptCL) to separate the embeddings of different concepts; and Bind adjective (Bind adj.) to associate new "words" with known words. We evaluate via image generation, editing, and attention visualization on diverse images. Extensive quantitative comparisons demonstrate that our method learns more semantically disentangled concepts with enhanced word-concept correlation. Additionally, we introduce a novel dataset and evaluation protocol tailored to this new task of learning object-level concepts.
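To make the two loss-based regularizers named in the abstract concrete, here is a minimal PyTorch sketch. All tensor shapes, the function names (`attn_masked_loss`, `prompt_contrastive_loss`), and the fixed attention threshold are illustrative assumptions; the simple pairwise-similarity penalty is a simplified stand-in for the paper's PromptCL objective, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Assumed shapes (illustration only): noise_pred / noise are (B, C, H, W)
# denoiser outputs; attn is the cross-attention map of one learnable
# "word", shape (B, H, W); embeds stacks the K new-concept embeddings, (K, D).

def attn_masked_loss(noise_pred, noise, attn, threshold=0.5):
    """AttnMask sketch: keep the denoising loss only where the new "word"
    attends, so learning concentrates on the concept's own region."""
    attn = attn / attn.amax(dim=(-2, -1), keepdim=True)  # normalise to [0, 1]
    mask = (attn > threshold).float().unsqueeze(1)       # (B, 1, H, W)
    return F.mse_loss(noise_pred * mask, noise * mask)

def prompt_contrastive_loss(embeds):
    """PromptCL sketch: push the embeddings of different concepts apart
    by penalising their average pairwise cosine similarity."""
    z = F.normalize(embeds, dim=-1)                      # (K, D) unit vectors
    sim = z @ z.t()                                      # (K, K) cosine matrix
    off_diag = ~torch.eye(len(z), dtype=torch.bool, device=z.device)
    return sim[off_diag].mean()                          # ignore self-similarity
```

In such a setup, both terms would typically be added to the standard diffusion reconstruction loss with tunable weights.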