From 701c8eb4340d0605b74d3c9e02b8c651560974d0 Mon Sep 17 00:00:00 2001
From: yxgeee
Date: Tue, 25 Aug 2020 03:06:32 +0800
Subject: [PATCH] update README

---
 README.md            | 186 +++++++------------------------------------
 docs/INSTALL.md      |  63 +++++++++++++++
 docs/MODEL_ZOO.md    |  10 +++
 docs/REPRODUCTION.md |  89 +++++++++++++++++++++
 docs/SFRS.md         |   7 ++
 hubconf.py           |   3 +-
 6 files changed, 198 insertions(+), 160 deletions(-)
 create mode 100644 docs/INSTALL.md
 create mode 100644 docs/MODEL_ZOO.md
 create mode 100644 docs/REPRODUCTION.md
 create mode 100644 docs/SFRS.md

diff --git a/README.md b/README.md
index 295474f..40d6b69 100644
--- a/README.md
+++ b/README.md
@@ -12,185 +12,53 @@
 `OpenIBL` is an open-source PyTorch-based codebase for image-based localization, or in other words, place recognition. It supports multiple state-of-the-art methods and also covers the official implementation of our ECCV-2020 spotlight paper **SFRS**. We support **single/multi-node multi-gpu distributed** training and testing, launched by `slurm` or `pytorch`.

 #### Official implementation:
-+ SFRS: Self-supervising Fine-grained Region Similarities for Large-scale Image Localization (ECCV'20 **Spotlight**) [[paper]](https://arxiv.org/abs/2006.03926) [[Blog(Chinese)]](https://zhuanlan.zhihu.com/p/169596514)
++ [SFRS](docs/SFRS.md): Self-supervising Fine-grained Region Similarities for Large-scale Image Localization (ECCV'20 **Spotlight**) [[paper]](https://arxiv.org/abs/2006.03926) [[Blog(Chinese)]](https://zhuanlan.zhihu.com/p/169596514)

 #### Unofficial implementation:
 + NetVLAD: CNN architecture for weakly supervised place recognition (CVPR'16) [[paper]](https://arxiv.org/abs/1511.07247) [[official code (MatConvNet)]](https://github.com/Relja/netvlad)
 + SARE: Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization (ICCV'19) [[paper]](https://arxiv.org/abs/1808.08779) [[official code (MatConvNet)]](https://github.com/Liumouliu/deepIBL)

-## Self-supervising Fine-grained Region Similarities (ECCV'20 Spotlight)
+## Quick Start

-NetVLAD first proposed a VLAD layer trained with `triplet` loss, and then SARE introduced two softmax-based losses (`sare_ind` and `sare_joint`) to boost the training. Our SFRS is trained in generations with self-enhanced soft-label losses to achieve state-of-the-art performance.
-
- -## Installation - -This repo was tested with Python 3.6, PyTorch 1.1.0, and CUDA 9.0. But it should be runnable with recent PyTorch versions >=1.0.0. (0.4.x may be also ok) -```shell -python setup.py develop -``` - -## Preparation - -### Datasets - -Currently, we support [Pittsburgh](https://www.cv-foundation.org/openaccess/content_cvpr_2013/papers/Torii_Visual_Place_Recognition_2013_CVPR_paper.pdf), [Tokyo 24/7](https://www.di.ens.fr/~josef/publications/Torii15.pdf) and [Tokyo Time Machine](https://arxiv.org/abs/1511.07247) datasets. The access of the above datasets can be found [here](https://www.di.ens.fr/willow/research/netvlad/). - -```shell -cd examples && mkdir data -``` -Download the raw datasets and then unzip them under the directory like -```shell -examples/data -├── pitts -│   ├── raw -│   │   ├── pitts250k_test.mat -│   │   ├── pitts250k_train.mat -│   │   ├── pitts250k_val.mat -│   │   ├── pitts30k_test.mat -│   │   ├── pitts30k_train.mat -│   │   ├── pitts30k_val.mat -│   └── └── Pittsburgh/ -└── tokyo - ├── raw - │   ├── tokyo247/ - │   ├── tokyo247.mat - │   ├── tokyoTM/ - │   ├── tokyoTM_train.mat - └── └── tokyoTM_val.mat -``` - -### Pre-trained Weights - -```shell -mkdir logs && cd logs -``` -After preparing the pre-trained weights, the file tree should be -```shell -logs -├── vd16_offtheshelf_conv5_3_max.pth # refer to (1) -└── vgg16_pitts_64_desc_cen.hdf5 # refer to (2) -``` - -**(1) imageNet-pretrained weights for VGG16 backbone from MatConvNet** - -The official repos of NetVLAD and SARE are based on MatConvNet. To reproduce their results, we need to load the same pretrained weights. Directly download from [Google Drive](https://drive.google.com/file/d/1kYIbFjbb0RuNuD0cRIlKmOteFVI1jRzR/view?usp=sharing) and save it under the path of `logs/`. - -**(2) initial cluster centers for VLAD layer** - -**Note:** it is important as the VLAD layer cannot work with random initialization. - -The original cluster centers provided by NetVLAD are highly **recommended**. You could directly download from [Google Drive](https://drive.google.com/file/d/1G5I48fVGOrOk8hPaNGni6q7fRcD_37gI/view?usp=sharing) and save it under the path of `logs/`. - -Or you could compute the centers by running the script -```shell -./scripts/cluster.sh vgg16 -``` - - -## Train - -All the training details (hyper-parameters, trained layers, backbones, etc.) strictly follow the original MatConvNet version of NetVLAD and SARE. **Note:** the results of all three methods (SFRS, NetVLAD, SARE) can be reproduced by training on Pitts30k-train and directly testing on the other datasets. - -The default scripts adopt 4 GPUs (require ~11G per GPU) for training, where each GPU loads one tuple (anchor, positive(s), negatives). -+ In case you want to fasten training, enlarge `GPUS` for more GPUs, or enlarge the `--tuple-size` for more tuples on one GPU; -+ In case your GPU does not have enough memory (e.g. <11G), reduce `--pos-num` (only for SFRS) or `--neg-num` for fewer positives or negatives in one tuple. - -#### PyTorch launcher: single-node multi-gpu distributed training - -NetVLAD: -```shell -./scripts/train_baseline_dist.sh triplet -``` - -SARE: -```shell -./scripts/train_baseline_dist.sh sare_ind -# or -./scripts/train_baseline_dist.sh sare_joint -``` - -SFRS (state-of-the-art): -```shell -./scripts/train_sfrs_dist.sh -``` - -#### Slurm launcher: single/multi-node multi-gpu distributed training - -Change `GPUS` and `GPUS_PER_NODE` accordingly in the scripts for your need. 
-
-NetVLAD:
-```shell
-./scripts/train_baseline_slurm.sh triplet
-```
-
-SARE:
-```shell
-./scripts/train_baseline_slurm.sh sare_ind
-# or
-./scripts/train_baseline_slurm.sh sare_joint
-```
-
-SFRS (state-of-the-art):
-```shell
-./scripts/train_sfrs_slurm.sh
-```
+### Extract descriptor for a single image
+```python
+import torch
+from PIL import Image
+from ibl.utils.data import get_transformer_test
+
+# load the best model with PCA (trained by our SFRS)
+model = torch.hub.load('yxgeee/OpenIBL', 'vgg16_netvlad', pretrained=True).eval()
+
+# read the image
+img = Image.open('image.jpg').convert('RGB') # modify the image path as needed
+transformer = get_transformer_test(480, 640) # (height, width)
+img = transformer(img)
+
+# use GPU (optional)
+model = model.cuda()
+img = img.cuda()
+
+# extract the descriptor (4096-dim)
+with torch.no_grad():
+    des = model(img.unsqueeze(0))[0]
+des = des.cpu().numpy()
+```
+A sketch of matching the extracted descriptor against a database of reference images is given at the end of this README, just before the Citation section.

-## Test
-
-During testing, the python scripts will automatically compute the PCA weights from Pitts30k-train or directly load from local files. Generally, `model_best.pth.tar` which is selected by validation in the training performs the best.
-
-The default scripts adopt 8 GPUs (require ~11G per GPU) for testing.
-+ In case you want to fasten testing, enlarge `GPUS` for more GPUs, or enlarge the `--test-batch-size` for larger batch size on one GPU, or add `--sync-gather` for faster gathering from multiple threads;
-+ In case your GPU does not have enough memory (e.g. <11G), reduce `--test-batch-size` for smaller batch size on one GPU.
-
-#### PyTorch launcher: single-node multi-gpu distributed testing
-
-Pitts250k-test:
-```shell
-./scripts/test_dist.sh pitts 250k
-```
-
-Pitts30k-test:
-```shell
-./scripts/test_dist.sh pitts 30k
-```
-
-Tokyo 24/7:
-```shell
-./scripts/test_dist.sh tokyo
-```
-
-#### Slurm launcher: single/multi-node multi-gpu distributed testing
-
-Pitts250k-test:
-```shell
-./scripts/test_slurm.sh pitts 250k
-```
-
-Pitts30k-test:
-```shell
-./scripts/test_slurm.sh pitts 30k
-```
-
-Tokyo 24/7:
-```shell
-./scripts/test_slurm.sh tokyo
-```
+## Installation

-## Trained models
+Please refer to [INSTALL.md](docs/INSTALL.md) for installation and dataset preparation.

-**Note:** the models and results for NetVLAD and SARE here are trained by this repo, showing a slight difference from their original paper.
+## Train & Test

-| Model | Trained on | Tested on | Recall@1 | Recall@5 | Recall@10 | Download Link |
-| :--------: | :---------: | :-----------: | :----------: | :----------: | :----------: | :----------: |
-| SARE_ind | Pitts30k-train | Pitts250k-test | 88.4% | 95.0% | 96.5% | [Google Drive](https://drive.google.com/drive/folders/1ZNGdXVRwUJvGH0ZJdwy18A8e9H0wnFHc?usp=sharing) |
-| SARE_ind | Pitts30k-train | Tokyo 24/7 | 81.0% | 88.6% | 90.2% | same as above |
-| **SFRS** | Pitts30k-train | Pitts250k-test | 90.7% | 96.4% | 97.6% | [Google Drive](https://drive.google.com/drive/folders/1FLjxFhKRO-YJQ6FI-DcCMMHDL2K_Hsof?usp=sharing) |
-| **SFRS** | Pitts30k-train | Tokyo 24/7 | 85.4% | 91.1% | 93.3% | same as above |
+To reproduce the results in the papers, train and test the models following the instructions in [REPRODUCTION.md](docs/REPRODUCTION.md).
+
+## Model Zoo
+
+Please refer to [MODEL_ZOO.md](docs/MODEL_ZOO.md) for trained models.
+
+## License
+
+`OpenIBL` is released under the [MIT license](LICENSE).
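+
+## Descriptor Matching (sketch)
+
+A minimal sketch of how the descriptor from the Quick Start snippet might be used, not part of the repo's API: it assumes the database descriptors were extracted in the same way and stacked into a NumPy array, and that all descriptors are L2-normalized so that the inner product equals cosine similarity.
+
+```python
+import numpy as np
+
+des = np.random.randn(4096).astype(np.float32)       # stand-in for the query descriptor above
+des /= np.linalg.norm(des)                           # L2-normalize the query
+db = np.random.randn(1000, 4096).astype(np.float32)  # stand-in database of 1000 reference images
+db /= np.linalg.norm(db, axis=1, keepdims=True)      # L2-normalize each reference descriptor
+scores = db @ des                                    # cosine similarity to every reference image
+top5 = np.argsort(-scores)[:5]                       # indices of the 5 best-matching references
+```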
## Citation

diff --git a/docs/INSTALL.md b/docs/INSTALL.md
new file mode 100644
index 0000000..221eaed
--- /dev/null
+++ b/docs/INSTALL.md
@@ -0,0 +1,63 @@
+## Installation
+
+This repo was tested with Python 3.6, PyTorch 1.1.0, and CUDA 9.0, but it should run with any recent PyTorch version >=1.0.0 (0.4.x may also work).
+```shell
+python setup.py develop # OR python setup.py install
+```
+
+## Preparation
+
+### Datasets
+
+Currently, we support the [Pittsburgh](https://www.cv-foundation.org/openaccess/content_cvpr_2013/papers/Torii_Visual_Place_Recognition_2013_CVPR_paper.pdf), [Tokyo 24/7](https://www.di.ens.fr/~josef/publications/Torii15.pdf) and [Tokyo Time Machine](https://arxiv.org/abs/1511.07247) datasets. Instructions for obtaining them can be found [here](https://www.di.ens.fr/willow/research/netvlad/).
+
+```shell
+cd examples && mkdir data
+```
+Download the raw datasets and unzip them so that the directory tree looks like
+```shell
+examples/data
+├── pitts
+│   ├── raw
+│   │   ├── pitts250k_test.mat
+│   │   ├── pitts250k_train.mat
+│   │   ├── pitts250k_val.mat
+│   │   ├── pitts30k_test.mat
+│   │   ├── pitts30k_train.mat
+│   │   ├── pitts30k_val.mat
+│   └── └── Pittsburgh/
+└── tokyo
+    ├── raw
+    │   ├── tokyo247/
+    │   ├── tokyo247.mat
+    │   ├── tokyoTM/
+    │   ├── tokyoTM_train.mat
+    └── └── tokyoTM_val.mat
+```
+
+### Pre-trained Weights
+
+```shell
+mkdir logs && cd logs
+```
+After preparing the pre-trained weights, the file tree should be
+```shell
+logs
+├── vd16_offtheshelf_conv5_3_max.pth # refer to (1)
+└── vgg16_pitts_64_desc_cen.hdf5 # refer to (2)
+```
+
+**(1) ImageNet-pretrained weights for the VGG16 backbone from MatConvNet**
+
+The official repos of NetVLAD and SARE are based on MatConvNet. To reproduce their results, we need to load the same pretrained weights. Download them directly from [Google Drive](https://drive.google.com/file/d/1kYIbFjbb0RuNuD0cRIlKmOteFVI1jRzR/view?usp=sharing) and save the file under `logs/`.
+
+**(2) initial cluster centers for the VLAD layer**
+
+**Note:** this step is essential, as the VLAD layer cannot work with random initialization.
+
+The original cluster centers provided by NetVLAD are highly **recommended**. Download them directly from [Google Drive](https://drive.google.com/file/d/1G5I48fVGOrOk8hPaNGni6q7fRcD_37gI/view?usp=sharing) and save the file under `logs/`.
+
+Alternatively, compute the centers yourself by running
+```shell
+./scripts/cluster.sh vgg16
+```
diff --git a/docs/MODEL_ZOO.md b/docs/MODEL_ZOO.md
new file mode 100644
index 0000000..7d06f8f
--- /dev/null
+++ b/docs/MODEL_ZOO.md
@@ -0,0 +1,10 @@
+## Model Zoo
+
+**Note:** the NetVLAD and SARE models here were trained with this repo, so their results differ slightly from those reported in the original papers.
+
+| Model | Trained on | Tested on | Recall@1 | Recall@5 | Recall@10 | Download Link |
+| :--------: | :---------: | :-----------: | :----------: | :----------: | :----------: | :----------: |
+| SARE_ind | Pitts30k-train | Pitts250k-test | 88.4% | 95.0% | 96.5% | [Google Drive](https://drive.google.com/drive/folders/1ZNGdXVRwUJvGH0ZJdwy18A8e9H0wnFHc?usp=sharing) |
+| SARE_ind | Pitts30k-train | Tokyo 24/7 | 81.0% | 88.6% | 90.2% | same as above |
+| **SFRS** | Pitts30k-train | Pitts250k-test | 90.7% | 96.4% | 97.6% | [Google Drive](https://drive.google.com/drive/folders/1FLjxFhKRO-YJQ6FI-DcCMMHDL2K_Hsof?usp=sharing) |
+| **SFRS** | Pitts30k-train | Tokyo 24/7 | 85.4% | 91.1% | 93.3% | same as above |
diff --git a/docs/REPRODUCTION.md b/docs/REPRODUCTION.md
new file mode 100644
index 0000000..6afaed0
--- /dev/null
+++ b/docs/REPRODUCTION.md
@@ -0,0 +1,89 @@
+## Train
+
+All the training details (hyper-parameters, trained layers, backbones, etc.) strictly follow the original MatConvNet versions of NetVLAD and SARE. **Note:** the results of all three methods (SFRS, NetVLAD, SARE) can be reproduced by training on Pitts30k-train and directly testing on the other datasets.
+
+The default scripts use 4 GPUs (requiring ~11 GB per GPU) for training, where each GPU loads one tuple (anchor, positive(s), negatives).
++ To speed up training, increase `GPUS` to use more GPUs, or increase `--tuple-size` to load more tuples on each GPU;
++ If your GPUs do not have enough memory (e.g. <11 GB), reduce `--pos-num` (SFRS only) or `--neg-num` to put fewer positives or negatives in each tuple.
+
+#### PyTorch launcher: single-node multi-gpu distributed training
+
+NetVLAD:
+```shell
+./scripts/train_baseline_dist.sh triplet
+```
+
+SARE:
+```shell
+./scripts/train_baseline_dist.sh sare_ind
+# or
+./scripts/train_baseline_dist.sh sare_joint
+```
+
+SFRS (state-of-the-art):
+```shell
+./scripts/train_sfrs_dist.sh
+```
+
+#### Slurm launcher: single/multi-node multi-gpu distributed training
+
+Adjust `GPUS` and `GPUS_PER_NODE` in the scripts to fit your setup.
+
+NetVLAD:
+```shell
+./scripts/train_baseline_slurm.sh triplet
+```
+
+SARE:
+```shell
+./scripts/train_baseline_slurm.sh sare_ind
+# or
+./scripts/train_baseline_slurm.sh sare_joint
+```
+
+SFRS (state-of-the-art):
+```shell
+./scripts/train_sfrs_slurm.sh
+```
+
+## Test
+
+During testing, the Python scripts will automatically compute the PCA weights on Pitts30k-train or load them from local files. Generally, `model_best.pth.tar`, which is selected by validation during training, performs best.
+
+The default scripts use 8 GPUs (requiring ~11 GB per GPU) for testing.
++ To speed up testing, increase `GPUS` to use more GPUs, or increase `--test-batch-size` for a larger batch size on each GPU, or add `--sync-gather` for faster gathering across threads;
++ If your GPUs do not have enough memory (e.g. <11 GB), reduce `--test-batch-size` for a smaller batch size on each GPU.
+ +#### PyTorch launcher: single-node multi-gpu distributed testing + +Pitts250k-test: +```shell +./scripts/test_dist.sh pitts 250k +``` + +Pitts30k-test: +```shell +./scripts/test_dist.sh pitts 30k +``` + +Tokyo 24/7: +```shell +./scripts/test_dist.sh tokyo +``` + +#### Slurm launcher: single/multi-node multi-gpu distributed testing + +Pitts250k-test: +```shell +./scripts/test_slurm.sh pitts 250k +``` + +Pitts30k-test: +```shell +./scripts/test_slurm.sh pitts 30k +``` + +Tokyo 24/7: +```shell +./scripts/test_slurm.sh tokyo +``` diff --git a/docs/SFRS.md b/docs/SFRS.md new file mode 100644 index 0000000..1c28971 --- /dev/null +++ b/docs/SFRS.md @@ -0,0 +1,7 @@ +## Self-supervising Fine-grained Region Similarities (ECCV'20 Spotlight) + +NetVLAD first proposed a VLAD layer trained with `triplet` loss, and then SARE introduced two softmax-based losses (`sare_ind` and `sare_joint`) to boost the training. Our SFRS is trained in generations with self-enhanced soft-label losses to achieve state-of-the-art performance. + +
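+The generation-wise training can be illustrated with a toy sketch (illustrative code only; the encoder, `tau`, and the loss form here are simplified assumptions for exposition, not this repo's API; see the paper for the exact formulation):
+
+```python
+import copy
+import torch
+import torch.nn.functional as F
+
+# Toy stand-ins: a tiny linear encoder and random "images"; in SFRS the encoder
+# is the VGG16+NetVLAD network and the candidates are gallery images/regions.
+model = torch.nn.Linear(16, 8)
+optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
+tau = 0.07  # soft-label temperature (assumed value)
+
+def sim(net, anchor, candidates):
+    # cosine similarities between the anchor and each candidate
+    a = F.normalize(net(anchor), dim=-1)
+    c = F.normalize(net(candidates), dim=-1)
+    return a @ c.t()
+
+for gen in range(2):  # train in generations
+    teacher = copy.deepcopy(model).eval()  # freeze the previous generation
+    for _ in range(10):
+        anchor, cands = torch.randn(1, 16), torch.randn(4, 16)
+        with torch.no_grad():  # the previous generation provides refined soft labels
+            soft = F.softmax(sim(teacher, anchor, cands) / tau, dim=-1)
+        logp = F.log_softmax(sim(model, anchor, cands) / tau, dim=-1)
+        loss = F.kl_div(logp, soft, reduction='batchmean')  # soft-label loss
+        optimizer.zero_grad(); loss.backward(); optimizer.step()
+```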
diff --git a/hubconf.py b/hubconf.py
index 16aa1d1..567e469 100644
--- a/hubconf.py
+++ b/hubconf.py
@@ -6,5 +6,6 @@ def vgg16_netvlad(pretrained=False):
     base_model = models.create('vgg16', pretrained=False)
     pool_layer = models.create('netvlad', dim=base_model.feature_dim)
     model = models.create('embednetpca', base_model, pool_layer)
-    model.load_state_dict(torch.hub.load_state_dict_from_url('https://github.com/yxgeee/OpenIBL/releases/download/v0.1.0-beta/vgg16_netvlad.pth'))
+    if pretrained:
+        model.load_state_dict(torch.hub.load_state_dict_from_url('https://github.com/yxgeee/OpenIBL/releases/download/v0.1.0-beta/vgg16_netvlad.pth'))
     return model
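
With this change, `vgg16_netvlad(pretrained=False)` builds the architecture without fetching the released checkpoint. A minimal smoke test of the patched entrypoint (a sketch; it assumes `torch.hub` can reach GitHub):

```python
import torch

# pretrained=False: build the model with randomly initialized weights
# (no checkpoint download), which is the behaviour this patch fixes.
model = torch.hub.load('yxgeee/OpenIBL', 'vgg16_netvlad', pretrained=False)

# pretrained=True: additionally load the released SFRS checkpoint.
model = torch.hub.load('yxgeee/OpenIBL', 'vgg16_netvlad', pretrained=True).eval()
```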