
# SCAN: A Spatial Context Attentive Network for Joint Multi-Agent Intent Prediction

Code for the paper

SCAN: A Spatial Context Attentive Network for Joint Multi-Agent Intent Prediction
Jasmine Sekhon, Cody Fleming
Accepted at AAAI-2021

SCAN is a Spatial Context Attentive Network that can jointly predict trajectories for all pedestrians in a scene over a future time window by attending to spatial contexts experienced by them individually over an observed time window.

## Model Architecture

Our model is an LSTM-based encoder-decoder framework that takes as input the observed trajectories of all pedestrians in the frame and jointly predicts their future trajectories. To account for the spatial influence that nearby pedestrians exert on each other, the model uses a spatial attention mechanism that infers each pedestrian's perceived spatial context and incorporates it into that pedestrian's LSTM state. In the decoder, the model additionally uses a temporal attention mechanism that attends to the spatial contexts observed by each pedestrian, allowing it to navigate by learning from previously encountered spatial situations.
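
The snippet below is a minimal, self-contained PyTorch sketch of this encoder-decoder flow. The module names, hidden sizes, and the exact attention formulations are illustrative assumptions, not the repository's actual implementation.

```python
# Illustrative sketch of an LSTM encoder-decoder with per-step spatial
# attention over pedestrians and temporal attention over stored contexts.
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """Attend over the hidden states of all pedestrians to build a
    per-pedestrian spatial context vector (illustrative formulation)."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(2 * hidden_dim, 1)

    def forward(self, hidden):                      # hidden: (N, H)
        n = hidden.size(0)
        pairs = torch.cat(
            [hidden.unsqueeze(1).expand(n, n, -1),  # pedestrian i ...
             hidden.unsqueeze(0).expand(n, n, -1)], # ... paired with neighbor j
            dim=-1)
        attn = torch.softmax(self.score(pairs).squeeze(-1), dim=-1)  # (N, N)
        return attn @ hidden                        # (N, H) spatial contexts


class SCANSketch(nn.Module):
    def __init__(self, hidden_dim=64):
        super().__init__()
        self.encoder = nn.LSTMCell(2, hidden_dim)
        self.decoder = nn.LSTMCell(2, hidden_dim)
        self.spatial = SpatialAttention(hidden_dim)
        self.temporal = nn.MultiheadAttention(hidden_dim, num_heads=1, batch_first=True)
        self.out = nn.Linear(hidden_dim, 2)

    def forward(self, obs, pred_len=12):            # obs: (T_obs, N, 2)
        n = obs.size(1)
        h = torch.zeros(n, self.encoder.hidden_size)
        c = torch.zeros_like(h)
        contexts = []
        for t in range(obs.size(0)):                # encode observed steps
            h, c = self.encoder(obs[t], (h, c))
            ctx = self.spatial(h)                   # per-pedestrian spatial context
            h = h + ctx                             # fold context into the LSTM state
            contexts.append(ctx)
        contexts = torch.stack(contexts, dim=1)     # (N, T_obs, H)

        preds, pos = [], obs[-1]
        for _ in range(pred_len):                   # decode future steps
            h, c = self.decoder(pos, (h, c))
            # temporal attention over the observed spatial contexts
            attended, _ = self.temporal(h.unsqueeze(1), contexts, contexts)
            pos = pos + self.out(h + attended.squeeze(1))
            preds.append(pos)
        return torch.stack(preds)                   # (T_pred, N, 2)
```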

## Example Predictions

### Multiple Socially Plausible Predictions

Human motion is multimodal, and often there is no single correct future trajectory. Given an observed trajectory and spatial context, a pedestrian may follow several different trajectories in the future. Taking this uncertain nature of pedestrian motion into account, we also propose GenerativeSCAN, a GAN-based SCAN framework capable of generating multiple socially feasible future trajectories for all pedestrians in the frame. Below we show examples of socially acceptable trajectories generated by our model for pedestrians in the ZARA1 dataset. The sample is from the ZARA1 test dataset, plotted for k=10 and \lambda=0. Ten trajectories are generated and visualized per pedestrian in the scene. Generated trajectories are shown in blue; observed trajectories are shown in red.

### Diverse Trajectories

In GenerativeSCAN we attempt to encourage the model to generate diverse trajectories for a lower value of k by incorporating a diversity loss. Below we show examples of multiple socially acceptable trajectories being generated by our model for pedestrians in the ZARA1 dataset plotted for k=5 and \lambda=0.1.

As can be seen by comparing the two sets of predictions, incorporating diversity loss yields much more diverse predictions for a smaller k value.
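
The precise loss is described in the paper; the sketch below only illustrates the general idea of a best-of-k reconstruction term combined with a sample-spread term weighted by \lambda, and may differ from GenerativeSCAN's actual formulation.

```python
# Hedged sketch: best-of-k ("variety") loss plus a diversity term that rewards
# spread among the k generated samples. Not the repository's exact loss.
import torch


def variety_with_diversity(pred_samples, gt, lam=0.1):
    """pred_samples: (k, T, N, 2) generated trajectories, gt: (T, N, 2)."""
    # Best-of-k reconstruction term: only the closest sample is penalized.
    errors = ((pred_samples - gt.unsqueeze(0)) ** 2).sum(-1).sqrt()  # (k, T, N)
    best_of_k = errors.mean(dim=(1, 2)).min()

    # Diversity term: mean pairwise distance between samples, subtracted so
    # that larger spread lowers the loss.
    k = pred_samples.size(0)
    flat = pred_samples.reshape(k, -1)
    pairwise = torch.cdist(flat, flat).sum() / (k * (k - 1))
    return best_of_k - lam * pairwise
```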

### Socially Acceptable Future Trajectories

It can also be observed from the predictions above that our spatial attention mechanism learns and accounts for the influence of neighboring pedestrians' observed and future trajectories on a pedestrian. As a result, our model's predictions respect social navigation norms such as avoiding collisions and yielding right-of-way, and also exhibit more complex social behavior such as walking in groups or pairs and groups avoiding collisions with other groups.

## Training Details

We train and evaluate our models on five publicly available datasets: ETH, HOTEL, UNIV, ZARA1, ZARA2. We follow a leave-one-out process where we train on four of the five datasets and test on the fifth. The exact training, validation, and test splits we use are in the `data/` directory. For each pedestrian in a given frame, our model observes the trajectory for 8 time steps (3.2 seconds) and predicts intent over the next 12 time steps (4.8 seconds) jointly for all pedestrians in the scene.
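
As a rough illustration of this windowing (assuming trajectories are stored as `(num_frames, num_peds, 2)` arrays, which may differ from the repository's actual data loader):

```python
# Illustrative sketch of slicing trajectories into observed/future pairs.
import numpy as np

OBS_LEN, PRED_LEN = 8, 12      # 3.2 s observed, 4.8 s predicted at 0.4 s per step


def make_windows(traj):
    """Slice a (num_frames, num_peds, 2) trajectory array into
    (observed, future) training pairs."""
    windows = []
    total = OBS_LEN + PRED_LEN
    for start in range(traj.shape[0] - total + 1):
        obs = traj[start:start + OBS_LEN]
        fut = traj[start + OBS_LEN:start + total]
        windows.append((obs, fut))
    return windows
```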

There are several available models to choose from, the main variants being:

  1. `vanilla`, a vanilla LSTM-based autoencoder,
  2. `temporal`, an LSTM-based autoencoder with temporal attention in the decoder,
  3. `spatial`, referred to in the paper as vanillaSCAN, an LSTM-based autoencoder with a spatial attention mechanism,
  4. `spatial_temporal`, our proposed model SCAN,
  5. `generative_spatial_temporal`, which is GenerativeSCAN, a GAN-based SCAN capable of predicting multiple socially plausible trajectories.

In progress: SCAN with scene context, trained to extract scene-relevant features from the static scene image associated with each dataset using a pretrained ResNet-18 model.

To train SCAN (i.e., the deterministic model) with our chosen hyperparameters, simply edit the `--dset_name` and `model_type` arguments in the `scripts/script_train_deterministic.sh` script.

All other arguments are specified and explained in `arguments.py`.

To train GenerativeSCAN with our chosen hyperparameters, similarly edit the `--dset_name` and `model_type` arguments in the `scripts/script_train_generative.sh` script. All other arguments are specified and explained in `arguments.py`.
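
For reference, the relevant options in `arguments.py` might resemble the argparse sketch below; the actual argument names, choices, and defaults should be checked against `arguments.py` itself.

```python
# Hypothetical sketch of the dataset/model options; not the repository's code.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--dset_name", default="zara1",
                    help="one of eth, hotel, univ, zara1, zara2")
parser.add_argument("--model_type", default="spatial_temporal",
                    help="vanilla | temporal | spatial | spatial_temporal | "
                         "generative_spatial_temporal")
args = parser.parse_args()
```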

To evaluate trained SCAN (deterministic) models with our chosen hyperparameters on all of the datasets, run

```
sh scripts/script_evaluate_deterministic.sh
```

and to evaluate trained GenerativeSCAN models, run

```
sh scripts/script_evaluate_generative.sh
```

## Dataset Splits

The directory `data_sgan` contains the data splits originally used by Social GAN and adopted by many more recent works. However, the original ETH video is accelerated compared to the others, and the original ETH annotations are already sampled at 2.5 fps to account for this. Social GAN erroneously treats every 10 frames as 0.4 s, whereas in the original ETH annotations every 6 frames corresponds to 0.4 s.

Instead of using the Social GAN splits, we split the original ETH data into training and validation sets 80:20, obtaining training and validation splits at nearly the same timestamps as the SGAN data. The corrected data is in the `data` directory. We observe significantly better performance on the ETH and UNIV datasets when using the corrected data. To evaluate the model on the Social GAN splits, simply change `data/` to `data_sgan/` in the `train_deterministic.py` and `train_generative.py` files.
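
Purely as an illustration of the frame-rate discrepancy (the frame indices below are made up; the real annotations live under `data/` and `data_sgan/`):

```python
# At 0.4 s per step, the corrected ETH annotations advance 6 frames per step,
# whereas the Social GAN splits assume 10 frames per step.
CORRECT_SKIP, SGAN_SKIP = 6, 10

frames_correct = list(range(0, 60, CORRECT_SKIP))  # [0, 6, 12, ...]  -> 0.4 s apart
frames_sgan = list(range(0, 60, SGAN_SKIP))        # [0, 10, 20, ...] -> assumed 0.4 s apart
```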

## Results

We evaluate our proposed methods on two metrics: average displacement error (ADE) and final displacement error (FDE). The ADE / FDE values (in meters; lower is better) for our methods across the five datasets are reported below.

| Method | ETH | HOTEL | ZARA1 | ZARA2 | UNIV | AVG |
| --- | --- | --- | --- | --- | --- | --- |
| vanillaSCAN | 0.79 / 1.36 | 0.46 / 0.95 | 0.39 / 0.86 | 0.33 / 0.71 | 0.64 / 1.34 | 0.52 / 1.04 |
| SCAN | 0.78 / 1.29 | 0.40 / 0.76 | 0.38 / 0.80 | 0.33 / 0.72 | 0.62 / 1.28 | 0.50 / 0.97 |
| GenerativeSCAN | 0.79 / 1.49 | 0.37 / 0.74 | 0.37 / 0.78 | 0.31 / 0.66 | 0.58 / 1.23 | 0.48 / 0.98 |
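
For reference, ADE and FDE are typically computed as in the sketch below (a sketch, not the repository's evaluation code): ADE averages the L2 error over all predicted time steps and pedestrians, while FDE uses only the final predicted step.

```python
# Standard trajectory-prediction metrics, sketched with NumPy.
import numpy as np


def ade_fde(pred, gt):
    """pred, gt: (T_pred, N, 2) arrays of predicted and ground-truth positions."""
    err = np.linalg.norm(pred - gt, axis=-1)   # (T_pred, N) per-step L2 error
    ade = err.mean()                           # average over all steps and pedestrians
    fde = err[-1].mean()                       # error at the final predicted step
    return ade, fde
```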