- The paper rethinks text-based and image-based search by proposing a novel Region-Based Image Retrieval (RBIR) methodology.
- RBIR handles queries composed of semantic concepts (i.e., object categories) together with their spatial layout, addressing the relationship between what is queried and where it should appear.
- This matters because users often want to constrain their retrieval both semantically and spatially; the method leverages how the queried elements are arranged in space.
- Users can create, manipulate, and annotate bounding boxes on a canvas to specify their search intent, and a neural agent then automatically retrieves relevant images.
- Represent database images using pre-trained deep visual features
- Train a ConvNet model to synthesize a visual representation from the query
- Use the synthesized feature to retrieve the database images (a minimal retrieval sketch follows this list)
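A minimal sketch of the retrieval step under these assumptions: database features come from a pre-trained CNN, the query feature is produced by the trained synthesis network, and ranking uses cosine similarity. The similarity measure is an assumption, and `extract_features`, `synthesis_net`, and `canvas_to_grid` are hypothetical placeholders, not the paper's API.

```python
import numpy as np

def cosine_rank(query_feat, db_feats, top_k=10):
    """Rank database images by cosine similarity to the synthesized feature.

    query_feat : (D,)   feature synthesized from the canvas query.
    db_feats   : (N, D) pre-extracted deep features of the database images.
    """
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q                       # (N,) cosine similarities
    order = np.argsort(-sims)[:top_k]   # indices of the best matches
    return order, sims[order]

# Usage (names are illustrative):
# db_feats   = extract_features(database_images)     # offline, pre-trained CNN
# query_feat = synthesis_net(canvas_to_grid(query))  # trained ConvNet
# top_ids, scores = cosine_rank(query_feat, db_feats)
```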
Specifically, the method first transforms the user's canvas query into a spatial-semantic representation in which each spatial location is associated with the semantic word vector (word2vec) of the concept placed there (a sketch of this transform follows). A Convolutional Neural Network (CNN) then synthesizes the appropriate visual feature from this representation.
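A sketch of this canvas-to-grid transform, assuming boxes arrive with normalized coordinates, an 8×8 grid, and 300-d word2vec embeddings; the grid size, embedding dimension, and `word_vec` lookup are illustrative assumptions, not values from the paper.

```python
import numpy as np

def canvas_to_grid(boxes, word_vec, grid_size=8, emb_dim=300):
    """Turn a canvas query into a spatial-semantic representation.

    boxes    : list of (x0, y0, x1, y1, label), coordinates in [0, 1].
    word_vec : dict mapping a concept label to its word2vec embedding.
    Returns a (grid_size, grid_size, emb_dim) array in which every cell
    covered by a box holds that concept's word vector (zeros elsewhere).
    """
    grid = np.zeros((grid_size, grid_size, emb_dim), dtype=np.float32)
    for x0, y0, x1, y1, label in boxes:
        c0, c1 = int(x0 * grid_size), int(np.ceil(x1 * grid_size))
        r0, r1 = int(y0 * grid_size), int(np.ceil(y1 * grid_size))
        grid[r0:r1, c0:c1] = word_vec[label]  # broadcast over covered cells
    return grid

# e.g. "sky on top, a car below":
# grid = canvas_to_grid([(0.0, 0.0, 1.0, 0.4, "sky"),
#                        (0.3, 0.6, 0.7, 0.9, "car")], word_vec)
```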
The network is trained with three loss functions: a similarity loss, a discriminative loss, and a ranking loss.
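One plausible instantiation of the three losses, assuming the similarity loss is an L2 distance between the synthesized and target image features, the discriminative loss is a cross-entropy term from a concept-classifier head, and the ranking loss is a triplet margin over matching and non-matching images. These specific forms are assumptions; the paper's exact formulations may differ.

```python
import torch.nn.functional as F

def spatial_semantic_losses(synth, pos, neg, logits, labels, margin=0.5):
    """Assumed forms of the three training losses (see caveat above).

    synth  : (B, D) features synthesized by the CNN from canvas queries.
    pos    : (B, D) features of matching database images.
    neg    : (B, D) features of non-matching database images.
    logits : (B, C) concept predictions from a classifier head on synth.
    labels : (B,)   ground-truth concept indices.
    """
    # Similarity: pull the synthesized feature toward the target image feature.
    sim_loss = F.mse_loss(synth, pos)
    # Discriminative: the synthesized feature should still predict its concepts.
    disc_loss = F.cross_entropy(logits, labels)
    # Ranking: matching images must score higher than non-matching ones.
    rank_loss = F.triplet_margin_loss(synth, pos, neg, margin=margin)
    return sim_loss + disc_loss + rank_loss
```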
Quantitative evaluation on the Visual Genome and MS-COCO datasets shows that the method outperforms the compared baseline methods.