Retrieve target image from composed image and text

Retrieval Model

Generative Model

This project implements a deep neural network that retrieves a target image given a source image and a text describing the desired modifications to that image. The image and the text are composed using the TIRG model, and the composed feature is then used to retrieve the target image. An experimental study that generates the target image using StackGAN is also included. The project was implemented as a course project for the Deep Learning course at the University of Victoria.

More details can be found in 'Report.pdf'.
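The compose-then-retrieve idea described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the code in this repository: it assumes a single linear layer per branch (the TIRG paper uses small multi-layer blocks), untrained weights, and cosine similarity for ranking. All function names here are hypothetical.

```python
import math
import random


def linear(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def compose(image_feat, text_feat, w_gate, w_res):
    """Simplified gated residual composition of image and text features."""
    joint = image_feat + text_feat  # concatenate the two feature lists
    gate = [sigmoid(g) for g in linear(w_gate, joint)]
    residual = linear(w_res, joint)
    # The gate scales how much of the image feature is kept;
    # the residual adds the change requested by the text.
    return [g * x + r for g, x, r in zip(gate, image_feat, residual)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, gallery, top_k=3):
    """Return indices of the top_k gallery features most similar to the query."""
    order = sorted(range(len(gallery)),
                   key=lambda i: cosine(query, gallery[i]),
                   reverse=True)
    return order[:top_k]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    image_feat = [rng.uniform(-1, 1) for _ in range(dim)]
    text_feat = [rng.uniform(-1, 1) for _ in range(dim)]
    # Random weights stand in for trained parameters (assumption).
    w_gate = [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(dim)]
    w_res = [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(dim)]
    query = compose(image_feat, text_feat, w_gate, w_res)
    gallery = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
    print(retrieve(query, gallery))
```

In the actual model the image and text features would come from trained encoders, and the gallery would hold features of all candidate target images.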

Getting Started

The TIRG model is trained using the Python file "train_TIRG.py":

Download the dataset and set its location in config.py.
Adjust the following as needed: batch_size, epochs, dataset location.
Load the pre-trained model from the 'save' directory if needed.

bash download.sh
unzip CSS.zip
python train_TIRG.py --batch_size=64 --num_iters=210000
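For readers unfamiliar with how flags such as --batch_size and --num_iters are typically handled, here is a minimal sketch using argparse. It is an assumption about how such options could be parsed, not a copy of train_TIRG.py; the defaults simply mirror the command shown above, and the function names are hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser for the two flags used in the command above."""
    parser = argparse.ArgumentParser(description="TIRG training options (sketch)")
    parser.add_argument("--batch_size", type=int, default=64,
                        help="number of samples per training step")
    parser.add_argument("--num_iters", type=int, default=210000,
                        help="total number of training iterations")
    return parser


def parse_options(argv=None):
    """Parse and validate options; argv=None reads from the command line."""
    options = build_parser().parse_args(argv)
    if options.batch_size <= 0 or options.num_iters <= 0:
        raise ValueError("batch_size and num_iters must be positive")
    return options


if __name__ == "__main__":
    print(parse_options(["--batch_size=64", "--num_iters=210000"]))
```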

StackGAN is trained using the Python file "train_GAN.py":

Training consists of two stages: Stage I and Stage II.
Train the TIRG model or use the pretrained TIRG weights before training StackGAN.
To train Stage I:

python train_GAN.py --stage=1 --batch_size=64

To train Stage II:

python train_GAN.py --stage=2 --batch_size=8

Adjust the following as needed: batch_size, epochs, lr (learning rate).
Load the pre-trained model from the 'save' directory if needed.
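When resuming from the 'save' directory, it can help to locate the newest saved file first. The sketch below picks the most recently modified file in a directory. The checkpoint naming scheme used by this project is not documented here, so the function is hypothetical and makes no assumption about file extensions.

```python
from pathlib import Path


def latest_checkpoint(save_dir="save"):
    """Return the most recently modified file in save_dir, or None if none exists."""
    directory = Path(save_dir)
    if not directory.is_dir():
        return None
    files = [p for p in directory.iterdir() if p.is_file()]
    # max() with default=None handles an empty directory.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    print(latest_checkpoint())
```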

TensorFlow implementation of the TIRG model:

To train in TensorFlow:

bash download.sh
unzip CSS.zip
python main.py

Dataset

CSS 16K (1 GB):
Extract the dataset to a location outside the project folder and set that location in config.py.
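As an alternative to the unzip command, the extraction step can be done from Python. This is a minimal sketch: the archive name CSS.zip comes from the commands above, while the target path "../CSS" is only an assumed example of a location outside the project folder.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive="CSS.zip", target_dir="../CSS"):
    """Extract the archive into target_dir and return the top-level entry names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return sorted(p.name for p in target.iterdir())
```

Calling extract_dataset() from the project folder would place the files in a sibling directory; the same path would then be entered in config.py.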

Download the pretrained model

TIRG: the pretrained model was trained on one NVIDIA GeForce GTX 1080 Ti for 20 hours (450 epochs).
StackGAN Stage I: the pretrained model was trained on two NVIDIA Tesla T4 GPUs (120 epochs).
StackGAN Stage II: the pretrained model was trained on two NVIDIA Tesla T4 GPUs (90 epochs).

Reference

This project draws on the following papers:

Composing Text and Image for Image Retrieval - An Empirical Odyssey (arXiv 2018). Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays.

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks (arXiv 2016). Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas.

