This project builds upon CosPlace, originally created by Gabriele Berton. For installation and usage details, please refer to the original repository.
Our research aims to enhance the CosPlace framework for Visual Geo-localization (VG) by exploring various architectural and feature choices. We focus on constructing, training, and evaluating different models to gain insights into the effectiveness of specific design decisions in the VG pipeline. Our objective is to establish a systematic evaluation protocol for method comparison. Leveraging our framework, we conduct extensive experiments to benchmark and optimize model parameters while assessing the impact of engineering techniques on model performance.
We've implemented several data augmentation techniques, including:
- Horizontal flipping
- Blurring
- Color jittering
Each technique offers different parameter configurations. Refer to the accompanying table for results.
SF_XS R@1 | SF_XS R@5 | SF_XS R@10 | SF_XS R@20 | Tokyo_XS R@1 | Tokyo_XS R@5 | Tokyo_XS R@10 | Tokyo_XS R@20 | |
---|---|---|---|---|---|---|---|---|
Baseline | 16.3 | 28.1 | 34.0 | 40.1 | 28.9 | 46.0 | 59.0 | 71.1 |
Random Horizontal Flip | 15.1 | 27.1 | 32.6 | 37.9 | 27.6 | 51.7 | 61.9 | 72.1 |
Gaussian Blur (kernel_size=5, sigma=(0.5,1)) | 14.5 | 25.3 | 32.1 | 38.3 | 26.1 | 49.8 | 60.0 | 70.1 |
Color-Jitter with contrast [1.0, 1.5] | 19.7 | 33.0 | 37.9 | 43.6 | 37.8 | 53.7 | 59.0 | 70.2 |
Color-Jitter with contrast [1.5, 2.0] | 19.5 | 32.1 | 38.3 | 43.8 | 36.5 | 52.7 | 62.2 | 70.8 |
Color-Jitter with contrast [3.0, 4.0] | 16.9 | 30.1 | 35.8 | 41.9 | 30.8 | 49.2 | 54.0 | 66.0 |
Table: Augmentation results
We explore three final aggregation layers:
- GeM Pooling Layer: The default layer in the original 'CosPlace.'
- MixVPR Pooling Layer: Inspired by the paper "Brahim Chaib-draa Amar Ali-bey and Philippe Giguere, Mixvpr: Feature Mixing for Visual Place Recognition (2023)." We made customizations to suit our specific requirements.
- NetVLAD Layer: This layer is initialized using an unsupervised learning approach to determine cluster centroids. The optimal number of clusters (15) is established through an elbow search using the k-means algorithm.
Table results demonstrate that MixVPR consistently improves model performance across various recall metrics.
Aggregator | SF_XS R@1 | SF_XS R@5 | SF_XS R@10 | SF_XS R@20 | Tokyo_XS R@1 | Tokyo_XS R@5 | Tokyo_XS R@10 | Tokyo_XS R@20 |
---|---|---|---|---|---|---|---|---|
GeM | 19.1 | 30.4 | 36.2 | 43.1 | 38.1 | 55.9 | 63.5 | 71.4 |
NetVLAD | 22.8 | 39.0 | 44.7 | 51.0 | 36.2 | 55.2 | 64.4 | 72.7 |
MixVPR | 30.9 | 44.4 | 51.8 | 57.4 | 48.9 | 68.9 | 75.9 | 81.6 |
Table: Aggregation results
To address the domain shift problem, we incorporated Adversarial Domain Adaptation (ADDA) using Tokyo_XS and St_Lucia as target domains while testing on SF_XS. In this setup, we employed the NetVLAD aggregation layer and utilized a domain discriminator network during training. The goal was to minimize the domain distribution discrepancy between the source and target domains.
Table results show improved model generalization when Tokyo is chosen as the target domain.
Source Domain | Target Domain | SF_XS Test |
---|---|---|
R@1: 24.7 | ||
SF_XS | Tokyo_XS | R@5: 39.0 |
train | database | R@10: 46.4 |
R@20: 54.0 | ||
R@1: 24.8 | ||
SF_XS | St_Lucia | R@5: 38.3 |
train | database | R@10: 44.2 |
R@20: 51.3 |
Table: Domain Adaptation results
In our final set of experiments, we evaluated the performance of the Adam and AdamW optimizers using both the NetVLAD and MixVPR aggregation layers. However, the results did not demonstrate a significant improvement in performance. For detailed optimizer results, please refer to the ReportML_DL.pdf file.
This research extends the CosPlace framework, providing valuable insights into architectural and feature choices for visual geo-localization. Our experiments and findings contribute to a better understanding of how different design decisions impact model performance in this domain.