From 701c8eb4340d0605b74d3c9e02b8c651560974d0 Mon Sep 17 00:00:00 2001
From: yxgeee
Date: Tue, 25 Aug 2020 03:06:32 +0800
Subject: [PATCH] update README

---
 README.md            | 186 +++++++------------------------------------
 docs/INSTALL.md      |  63 +++++++++++++++
 docs/MODEL_ZOO.md    |  10 +++
 docs/REPRODUCTION.md |  89 +++++++++++++++++++++
 docs/SFRS.md         |   7 ++
 hubconf.py           |   3 +-
 6 files changed, 198 insertions(+), 160 deletions(-)
 create mode 100644 docs/INSTALL.md
 create mode 100644 docs/MODEL_ZOO.md
 create mode 100644 docs/REPRODUCTION.md
 create mode 100644 docs/SFRS.md

diff --git a/README.md b/README.md
index 295474f..40d6b69 100644
--- a/README.md
+++ b/README.md
@@ -12,185 +12,53 @@
 `OpenIBL` is an open-source PyTorch-based codebase for image-based localization, or in other words, place recognition. It supports multiple state-of-the-art methods and also covers the official implementation of our ECCV-2020 spotlight paper **SFRS**. We support **single/multi-node multi-gpu distributed** training and testing, launched by `slurm` or `pytorch`.

 #### Official implementation:
-+ SFRS: Self-supervising Fine-grained Region Similarities for Large-scale Image Localization (ECCV'20 **Spotlight**) [[paper]](https://arxiv.org/abs/2006.03926) [[Blog(Chinese)]](https://zhuanlan.zhihu.com/p/169596514)
++ [SFRS](docs/SFRS.md): Self-supervising Fine-grained Region Similarities for Large-scale Image Localization (ECCV'20 **Spotlight**) [[paper]](https://arxiv.org/abs/2006.03926) [[Blog(Chinese)]](https://zhuanlan.zhihu.com/p/169596514)

 #### Unofficial implementation:
 + NetVLAD: CNN architecture for weakly supervised place recognition (CVPR'16) [[paper]](https://arxiv.org/abs/1511.07247) [[official code (MatConvNet)]](https://github.com/Relja/netvlad)
 + SARE: Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization (ICCV'19) [[paper]](https://arxiv.org/abs/1808.08779) [[official code (MatConvNet)]](https://github.com/Liumouliu/deepIBL)

-## Self-supervising Fine-grained Region Similarities (ECCV'20 Spotlight)
+## Quick Start

-NetVLAD first proposed a VLAD layer trained with `triplet` loss, and then SARE introduced two softmax-based losses (`sare_ind` and `sare_joint`) to boost the training. Our SFRS is trained in generations with self-enhanced soft-label losses to achieve state-of-the-art performance.
-
- -## Installation - -This repo was tested with Python 3.6, PyTorch 1.1.0, and CUDA 9.0. But it should be runnable with recent PyTorch versions >=1.0.0. (0.4.x may be also ok) -```shell -python setup.py develop -``` - -## Preparation - -### Datasets - -Currently, we support [Pittsburgh](https://www.cv-foundation.org/openaccess/content_cvpr_2013/papers/Torii_Visual_Place_Recognition_2013_CVPR_paper.pdf), [Tokyo 24/7](https://www.di.ens.fr/~josef/publications/Torii15.pdf) and [Tokyo Time Machine](https://arxiv.org/abs/1511.07247) datasets. The access of the above datasets can be found [here](https://www.di.ens.fr/willow/research/netvlad/). - -```shell -cd examples && mkdir data -``` -Download the raw datasets and then unzip them under the directory like -```shell -examples/data -├── pitts -│   ├── raw -│   │   ├── pitts250k_test.mat -│   │   ├── pitts250k_train.mat -│   │   ├── pitts250k_val.mat -│   │   ├── pitts30k_test.mat -│   │   ├── pitts30k_train.mat -│   │   ├── pitts30k_val.mat -│   └── └── Pittsburgh/ -└── tokyo - ├── raw - │   ├── tokyo247/ - │   ├── tokyo247.mat - │   ├── tokyoTM/ - │   ├── tokyoTM_train.mat - └── └── tokyoTM_val.mat -``` - -### Pre-trained Weights - -```shell -mkdir logs && cd logs -``` -After preparing the pre-trained weights, the file tree should be -```shell -logs -├── vd16_offtheshelf_conv5_3_max.pth # refer to (1) -└── vgg16_pitts_64_desc_cen.hdf5 # refer to (2) -``` - -**(1) imageNet-pretrained weights for VGG16 backbone from MatConvNet** - -The official repos of NetVLAD and SARE are based on MatConvNet. To reproduce their results, we need to load the same pretrained weights. Directly download from [Google Drive](https://drive.google.com/file/d/1kYIbFjbb0RuNuD0cRIlKmOteFVI1jRzR/view?usp=sharing) and save it under the path of `logs/`. - -**(2) initial cluster centers for VLAD layer** - -**Note:** it is important as the VLAD layer cannot work with random initialization. - -The original cluster centers provided by NetVLAD are highly **recommended**. You could directly download from [Google Drive](https://drive.google.com/file/d/1G5I48fVGOrOk8hPaNGni6q7fRcD_37gI/view?usp=sharing) and save it under the path of `logs/`. - -Or you could compute the centers by running the script -```shell -./scripts/cluster.sh vgg16 -``` - - -## Train - -All the training details (hyper-parameters, trained layers, backbones, etc.) strictly follow the original MatConvNet version of NetVLAD and SARE. **Note:** the results of all three methods (SFRS, NetVLAD, SARE) can be reproduced by training on Pitts30k-train and directly testing on the other datasets. - -The default scripts adopt 4 GPUs (require ~11G per GPU) for training, where each GPU loads one tuple (anchor, positive(s), negatives). -+ In case you want to fasten training, enlarge `GPUS` for more GPUs, or enlarge the `--tuple-size` for more tuples on one GPU; -+ In case your GPU does not have enough memory (e.g. <11G), reduce `--pos-num` (only for SFRS) or `--neg-num` for fewer positives or negatives in one tuple. - -#### PyTorch launcher: single-node multi-gpu distributed training - -NetVLAD: -```shell -./scripts/train_baseline_dist.sh triplet -``` - -SARE: -```shell -./scripts/train_baseline_dist.sh sare_ind -# or -./scripts/train_baseline_dist.sh sare_joint -``` - -SFRS (state-of-the-art): -```shell -./scripts/train_sfrs_dist.sh -``` - -#### Slurm launcher: single/multi-node multi-gpu distributed training - -Change `GPUS` and `GPUS_PER_NODE` accordingly in the scripts for your need. 
-
-NetVLAD:
-```shell
-./scripts/train_baseline_slurm.sh triplet
-```
-
-SARE:
-```shell
-./scripts/train_baseline_slurm.sh sare_ind
-# or
-./scripts/train_baseline_slurm.sh sare_joint
-```
-
-SFRS (state-of-the-art):
-```shell
-./scripts/train_sfrs_slurm.sh
-```
+### Extract descriptor for a single image
+```python
+import torch
+from PIL import Image
+from ibl.utils.data import get_transformer_test
+
+# load the best model with PCA (trained by our SFRS)
+model = torch.hub.load('yxgeee/OpenIBL', 'vgg16_netvlad', pretrained=True).eval()
+
+# read the image
+img = Image.open('image.jpg').convert('RGB') # modify the image path as needed
+transformer = get_transformer_test(480, 640) # (height, width)
+img = transformer(img)
+
+# use GPU (optional)
+model = model.cuda()
+img = img.cuda()
+
+# extract the descriptor (4096-dim)
+with torch.no_grad():
+    des = model(img.unsqueeze(0))[0]
+des = des.cpu().numpy()
+```
+A sketch of matching the extracted descriptor against a database of reference images is given at the end of this README, just before the Citation section.

-## Test
-
-During testing, the python scripts will automatically compute the PCA weights from Pitts30k-train or directly load from local files. Generally, `model_best.pth.tar` which is selected by validation in the training performs the best.
-
-The default scripts adopt 8 GPUs (require ~11G per GPU) for testing.
-+ In case you want to fasten testing, enlarge `GPUS` for more GPUs, or enlarge the `--test-batch-size` for larger batch size on one GPU, or add `--sync-gather` for faster gathering from multiple threads;
-+ In case your GPU does not have enough memory (e.g. <11G), reduce `--test-batch-size` for smaller batch size on one GPU.
-
-#### PyTorch launcher: single-node multi-gpu distributed testing
-
-Pitts250k-test:
-```shell
-./scripts/test_dist.sh pitts 250k
-```
-
-Pitts30k-test:
-```shell
-./scripts/test_dist.sh pitts 30k
-```
-
-Tokyo 24/7:
-```shell
-./scripts/test_dist.sh tokyo
-```
-
-#### Slurm launcher: single/multi-node multi-gpu distributed testing
-
-Pitts250k-test:
-```shell
-./scripts/test_slurm.sh pitts 250k
-```
-
-Pitts30k-test:
-```shell
-./scripts/test_slurm.sh pitts 30k
-```
-
-Tokyo 24/7:
-```shell
-./scripts/test_slurm.sh tokyo
-```
+## Installation

-## Trained models
+Please refer to [INSTALL.md](docs/INSTALL.md) for installation and dataset preparation.

-**Note:** the models and results for NetVLAD and SARE here are trained by this repo, showing a slight difference from their original paper.
+## Train & Test

-| Model | Trained on | Tested on | Recall@1 | Recall@5 | Recall@10 | Download Link |
-| :--------: | :---------: | :-----------: | :----------: | :----------: | :----------: | :----------: |
-| SARE_ind | Pitts30k-train | Pitts250k-test | 88.4% | 95.0% | 96.5% | [Google Drive](https://drive.google.com/drive/folders/1ZNGdXVRwUJvGH0ZJdwy18A8e9H0wnFHc?usp=sharing) |
-| SARE_ind | Pitts30k-train | Tokyo 24/7 | 81.0% | 88.6% | 90.2% | same as above |
-| **SFRS** | Pitts30k-train | Pitts250k-test | 90.7% | 96.4% | 97.6% | [Google Drive](https://drive.google.com/drive/folders/1FLjxFhKRO-YJQ6FI-DcCMMHDL2K_Hsof?usp=sharing) |
-| **SFRS** | Pitts30k-train | Tokyo 24/7 | 85.4% | 91.1% | 93.3% | same as above |
+To reproduce the results in the papers, train and test the models following the instructions in [REPRODUCTION.md](docs/REPRODUCTION.md).
+
+## Model Zoo
+
+Please refer to [MODEL_ZOO.md](docs/MODEL_ZOO.md) for trained models.
+
+## License
+
+`OpenIBL` is released under the [MIT license](LICENSE).
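+
+## Descriptor Matching (sketch)
+
+A minimal sketch of how the descriptor from the Quick Start snippet might be used, not part of the repo's API: it assumes the database descriptors were extracted in the same way and stacked into a NumPy array, and that all descriptors are L2-normalized so that the inner product equals cosine similarity.
+
+```python
+import numpy as np
+
+des = np.random.randn(4096).astype(np.float32)       # stand-in for the query descriptor above
+des /= np.linalg.norm(des)                           # L2-normalize the query
+db = np.random.randn(1000, 4096).astype(np.float32)  # stand-in database of 1000 reference images
+db /= np.linalg.norm(db, axis=1, keepdims=True)      # L2-normalize each reference descriptor
+scores = db @ des                                    # cosine similarity to every reference image
+top5 = np.argsort(-scores)[:5]                       # indices of the 5 best-matching references
+```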
## Citation

diff --git a/docs/INSTALL.md b/docs/INSTALL.md
new file mode 100644
index 0000000..221eaed
--- /dev/null
+++ b/docs/INSTALL.md
@@ -0,0 +1,63 @@
+## Installation
+
+This repo was tested with Python 3.6, PyTorch 1.1.0, and CUDA 9.0, but it should run with any recent PyTorch version >=1.0.0 (0.4.x may also work).
+```shell
+python setup.py develop # OR python setup.py install
+```
+
+## Preparation
+
+### Datasets
+
+Currently, we support the [Pittsburgh](https://www.cv-foundation.org/openaccess/content_cvpr_2013/papers/Torii_Visual_Place_Recognition_2013_CVPR_paper.pdf), [Tokyo 24/7](https://www.di.ens.fr/~josef/publications/Torii15.pdf) and [Tokyo Time Machine](https://arxiv.org/abs/1511.07247) datasets. Instructions for obtaining them can be found [here](https://www.di.ens.fr/willow/research/netvlad/).
+
+```shell
+cd examples && mkdir data
+```
+Download the raw datasets and unzip them so that the directory tree looks like
+```shell
+examples/data
+├── pitts
+│   ├── raw
+│   │   ├── pitts250k_test.mat
+│   │   ├── pitts250k_train.mat
+│   │   ├── pitts250k_val.mat
+│   │   ├── pitts30k_test.mat
+│   │   ├── pitts30k_train.mat
+│   │   ├── pitts30k_val.mat
+│   └── └── Pittsburgh/
+└── tokyo
+    ├── raw
+    │   ├── tokyo247/
+    │   ├── tokyo247.mat
+    │   ├── tokyoTM/
+    │   ├── tokyoTM_train.mat
+    └── └── tokyoTM_val.mat
+```
+
+### Pre-trained Weights
+
+```shell
+mkdir logs && cd logs
+```
+After preparing the pre-trained weights, the file tree should be
+```shell
+logs
+├── vd16_offtheshelf_conv5_3_max.pth # refer to (1)
+└── vgg16_pitts_64_desc_cen.hdf5 # refer to (2)
+```
+
+**(1) ImageNet-pretrained weights for the VGG16 backbone from MatConvNet**
+
+The official repos of NetVLAD and SARE are based on MatConvNet. To reproduce their results, we need to load the same pretrained weights. Download them directly from [Google Drive](https://drive.google.com/file/d/1kYIbFjbb0RuNuD0cRIlKmOteFVI1jRzR/view?usp=sharing) and save the file under `logs/`.
+
+**(2) initial cluster centers for the VLAD layer**
+
+**Note:** this step is essential, as the VLAD layer cannot work with random initialization.
+
+The original cluster centers provided by NetVLAD are highly **recommended**. Download them directly from [Google Drive](https://drive.google.com/file/d/1G5I48fVGOrOk8hPaNGni6q7fRcD_37gI/view?usp=sharing) and save the file under `logs/`.
+
+Alternatively, compute the centers yourself by running
+```shell
+./scripts/cluster.sh vgg16
+```
diff --git a/docs/MODEL_ZOO.md b/docs/MODEL_ZOO.md
new file mode 100644
index 0000000..7d06f8f
--- /dev/null
+++ b/docs/MODEL_ZOO.md
@@ -0,0 +1,10 @@
+## Model Zoo
+
+**Note:** the NetVLAD and SARE models here were trained with this repo, so their results differ slightly from those reported in the original papers.
+
+| Model | Trained on | Tested on | Recall@1 | Recall@5 | Recall@10 | Download Link |
+| :--------: | :---------: | :-----------: | :----------: | :----------: | :----------: | :----------: |
+| SARE_ind | Pitts30k-train | Pitts250k-test | 88.4% | 95.0% | 96.5% | [Google Drive](https://drive.google.com/drive/folders/1ZNGdXVRwUJvGH0ZJdwy18A8e9H0wnFHc?usp=sharing) |
+| SARE_ind | Pitts30k-train | Tokyo 24/7 | 81.0% | 88.6% | 90.2% | same as above |
+| **SFRS** | Pitts30k-train | Pitts250k-test | 90.7% | 96.4% | 97.6% | [Google Drive](https://drive.google.com/drive/folders/1FLjxFhKRO-YJQ6FI-DcCMMHDL2K_Hsof?usp=sharing) |
+| **SFRS** | Pitts30k-train | Tokyo 24/7 | 85.4% | 91.1% | 93.3% | same as above |
diff --git a/docs/REPRODUCTION.md b/docs/REPRODUCTION.md
new file mode 100644
index 0000000..6afaed0
--- /dev/null
+++ b/docs/REPRODUCTION.md
@@ -0,0 +1,89 @@
+## Train
+
+All the training details (hyper-parameters, trained layers, backbones, etc.) strictly follow the original MatConvNet versions of NetVLAD and SARE. **Note:** the results of all three methods (SFRS, NetVLAD, SARE) can be reproduced by training on Pitts30k-train and directly testing on the other datasets.
+
+The default scripts use 4 GPUs (requiring ~11 GB per GPU) for training, where each GPU loads one tuple (anchor, positive(s), negatives).
++ To speed up training, increase `GPUS` to use more GPUs, or increase `--tuple-size` to load more tuples on each GPU;
++ If your GPUs do not have enough memory (e.g. <11 GB), reduce `--pos-num` (SFRS only) or `--neg-num` to put fewer positives or negatives in each tuple.
+
+#### PyTorch launcher: single-node multi-gpu distributed training
+
+NetVLAD:
+```shell
+./scripts/train_baseline_dist.sh triplet
+```
+
+SARE:
+```shell
+./scripts/train_baseline_dist.sh sare_ind
+# or
+./scripts/train_baseline_dist.sh sare_joint
+```
+
+SFRS (state-of-the-art):
+```shell
+./scripts/train_sfrs_dist.sh
+```
+
+#### Slurm launcher: single/multi-node multi-gpu distributed training
+
+Adjust `GPUS` and `GPUS_PER_NODE` in the scripts to fit your setup.
+
+NetVLAD:
+```shell
+./scripts/train_baseline_slurm.sh triplet
+```
+
+SARE:
+```shell
+./scripts/train_baseline_slurm.sh sare_ind
+# or
+./scripts/train_baseline_slurm.sh sare_joint
+```
+
+SFRS (state-of-the-art):
+```shell
+./scripts/train_sfrs_slurm.sh
+```
+
+## Test
+
+During testing, the Python scripts will automatically compute the PCA weights on Pitts30k-train or load them from local files. Generally, `model_best.pth.tar`, which is selected by validation during training, performs best.
+
+The default scripts use 8 GPUs (requiring ~11 GB per GPU) for testing.
++ To speed up testing, increase `GPUS` to use more GPUs, or increase `--test-batch-size` for a larger batch size on each GPU, or add `--sync-gather` for faster gathering across threads;
++ If your GPUs do not have enough memory (e.g. <11 GB), reduce `--test-batch-size` for a smaller batch size on each GPU.
+ +#### PyTorch launcher: single-node multi-gpu distributed testing + +Pitts250k-test: +```shell +./scripts/test_dist.sh pitts 250k +``` + +Pitts30k-test: +```shell +./scripts/test_dist.sh pitts 30k +``` + +Tokyo 24/7: +```shell +./scripts/test_dist.sh tokyo +``` + +#### Slurm launcher: single/multi-node multi-gpu distributed testing + +Pitts250k-test: +```shell +./scripts/test_slurm.sh pitts 250k +``` + +Pitts30k-test: +```shell +./scripts/test_slurm.sh pitts 30k +``` + +Tokyo 24/7: +```shell +./scripts/test_slurm.sh tokyo +``` diff --git a/docs/SFRS.md b/docs/SFRS.md new file mode 100644 index 0000000..1c28971 --- /dev/null +++ b/docs/SFRS.md @@ -0,0 +1,7 @@ +## Self-supervising Fine-grained Region Similarities (ECCV'20 Spotlight) + +NetVLAD first proposed a VLAD layer trained with `triplet` loss, and then SARE introduced two softmax-based losses (`sare_ind` and `sare_joint`) to boost the training. Our SFRS is trained in generations with self-enhanced soft-label losses to achieve state-of-the-art performance. + +
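+The generation-wise training can be illustrated with a toy sketch (illustrative code only; the encoder, `tau`, and the loss form here are simplified assumptions for exposition, not this repo's API; see the paper for the exact formulation):
+
+```python
+import copy
+import torch
+import torch.nn.functional as F
+
+# Toy stand-ins: a tiny linear encoder and random "images"; in SFRS the encoder
+# is the VGG16+NetVLAD network and the candidates are gallery images/regions.
+model = torch.nn.Linear(16, 8)
+optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
+tau = 0.07  # soft-label temperature (assumed value)
+
+def sim(net, anchor, candidates):
+    # cosine similarities between the anchor and each candidate
+    a = F.normalize(net(anchor), dim=-1)
+    c = F.normalize(net(candidates), dim=-1)
+    return a @ c.t()
+
+for gen in range(2):  # train in generations
+    teacher = copy.deepcopy(model).eval()  # freeze the previous generation
+    for _ in range(10):
+        anchor, cands = torch.randn(1, 16), torch.randn(4, 16)
+        with torch.no_grad():  # the previous generation provides refined soft labels
+            soft = F.softmax(sim(teacher, anchor, cands) / tau, dim=-1)
+        logp = F.log_softmax(sim(model, anchor, cands) / tau, dim=-1)
+        loss = F.kl_div(logp, soft, reduction='batchmean')  # soft-label loss
+        optimizer.zero_grad(); loss.backward(); optimizer.step()
+```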
diff --git a/hubconf.py b/hubconf.py
index 16aa1d1..567e469 100644
--- a/hubconf.py
+++ b/hubconf.py
@@ -6,5 +6,6 @@ def vgg16_netvlad(pretrained=False):
     base_model = models.create('vgg16', pretrained=False)
     pool_layer = models.create('netvlad', dim=base_model.feature_dim)
     model = models.create('embednetpca', base_model, pool_layer)
-    model.load_state_dict(torch.hub.load_state_dict_from_url('https://github.com/yxgeee/OpenIBL/releases/download/v0.1.0-beta/vgg16_netvlad.pth'))
+    if pretrained:
+        model.load_state_dict(torch.hub.load_state_dict_from_url('https://github.com/yxgeee/OpenIBL/releases/download/v0.1.0-beta/vgg16_netvlad.pth'))
     return model
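
With this change, `vgg16_netvlad(pretrained=False)` builds the architecture without fetching the released checkpoint. A minimal smoke test of the patched entrypoint (a sketch; it assumes `torch.hub` can reach GitHub):

```python
import torch

# pretrained=False: build the model with randomly initialized weights
# (no checkpoint download), which is the behaviour this patch fixes.
model = torch.hub.load('yxgeee/OpenIBL', 'vgg16_netvlad', pretrained=False)

# pretrained=True: additionally load the released SFRS checkpoint.
model = torch.hub.load('yxgeee/OpenIBL', 'vgg16_netvlad', pretrained=True).eval()
```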