In a well-functioning democratic constitutional state, it is crucial that professionals such as journalists and politicians can work unimpeded, yet threats against them have increased, necessitating surveillance. Manual surveillance requires extensive manpower and is limited in effectiveness, which has led to the adoption of computer vision systems. Our research aims to enhance surveillance by accurately predicting head and body rotation, as well as gaze direction, in surveillance footage.
To achieve this, we developed a model named 6DDirect H+B that accurately determines the 6D poses of the head and body for multiple individuals in surveillance images. The model addresses challenges such as occlusion, varying subject distances from the camera, and diverse lighting conditions. By integrating localization, classification, and rotation learning within a unified framework built on a fine-tuned YOLOv5 backbone, our approach improves the accuracy of rotation estimation. To demonstrate its effectiveness, we then apply 6DDirect H+B to gaze direction estimation, using an LSTM that leverages changes in head and body rotations over time to predict where a person is looking.
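For reference, a widely used way to obtain such rotations is the continuous 6D rotation representation of Zhou et al., which maps a predicted 6-vector to a valid rotation matrix via Gram-Schmidt orthogonalization. The sketch below shows this standard conversion; it assumes the 6D outputs follow that representation (as in the 6DRepNet baseline) and is not the repository's exact code:

```python
import torch
import torch.nn.functional as F

def rotation_6d_to_matrix(d6: torch.Tensor) -> torch.Tensor:
    """Map a 6D rotation representation (Zhou et al., 2019) to a 3x3 rotation
    matrix via Gram-Schmidt orthogonalization. d6: (..., 6) -> (..., 3, 3)."""
    a1, a2 = d6[..., :3], d6[..., 3:]
    b1 = F.normalize(a1, dim=-1)
    # Remove the component of a2 along b1, then normalize.
    b2 = F.normalize(a2 - (b1 * a2).sum(-1, keepdim=True) * b1, dim=-1)
    b3 = torch.cross(b1, b2, dim=-1)
    return torch.stack((b1, b2, b3), dim=-2)
```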
We would like to extend our gratitude to the following repositories, as much of our code was inspired by and adapted from their work:
We highly encourage you to explore these repositories for more in-depth insights and advancements in the field.
- Datasets - Head Pose and Body Orientation Estimation
- Datasets - Gaze Direction Estimation
- Method: 6DDirect H+B
- Method: Gaze Direction Estimation
- Baselines for Head Pose Estimation
- Project link: [https://agora.is.tue.mpg.de/]. Github link: [https://github.com/pixelite1201/agora_evaluation]. Using and downloading this dataset requires personal registration. You can construct AGORA-H+B by following the steps below.
- Download the raw images for the train and validation sets from the AGORA website.
- Create necessary directories and extract the downloaded images:
mkdir -p AGORA/demo/images
cd AGORA
unzip ./path/to/download/validation_images_1280x720.zip -d demo/images
unzip ./path/to/download/train_images_1280x720_<id>.zip -d demo/images
# Move the 10 folders of train-set raw images into one folder. The id is from 0 to 9.
mv demo/images/train_<id>/* demo/images/train
- Download the following raw data from the AGORA website:
- Camera (approx. 392KB) -> train_Cam.zip & validation_Cam.zip
- SMPL-X fits gendered (1.3GB) -> smplx_gt.zip
- Scan/Fit Info (43KB) -> gt_scan_info.zip
- SMIL (SMPL-X format), the kid model -> smplx_kid_template.npy
- Create necessary directories and extract the downloaded files:
mkdir -p demo/Cam/validation_Cam demo/Cam/train_Cam demo/GT_fits demo/model/smplx
# Extract Camera data
unzip -j ./path/to/download/validation_Cam.zip validation_Cam/Cam/* -d demo/Cam/validation_Cam/
unzip -j ./path/to/download/train_Cam.zip -d demo/Cam/train_Cam/
# Extract SMPL-X fits and Scan/Fit Info
unzip ./path/to/download/gt_scan_info.zip -d demo/GT_fits
unzip ./path/to/download/smplx_gt.zip -d demo/GT_fits
- Download and extract the SMPL-X models (npz version) from the SMPL-X website:
unzip ./path/to/download/models_smplx_v1_1.zip SMPLX_FEMALE.npz SMPLX_MALE.npz SMPLX_NEUTRAL.npz -d demo/model/smplx/
- Clone the AGORA evaluation repository:
git clone https://github.com/pixelite1201/agora_evaluation
cd agora_evaluation
# If this fails, change sklearn to scikit-learn in `setup.py`
pip install .
- Move the downloaded SMIL kid model template:
mv ./path/to/download/smplx_kid_template.npy ./utils/
- Create symbolic links and replace the originals with the modified files:
# Using the full path is necessary!
ln -s ~/full/path/to/AGORA/demo ~/full/path/to/AGORA/agora_evaluation/demo
# Replace with the modified files
cp ../../exps/AGORA/agora_evaluation/projection.py agora_evaluation/
cp ../../exps/AGORA/agora_evaluation/get_joints_verts_from_dataframe.py agora_evaluation/
cp ../../exps/AGORA/agora_evaluation/project_points.py agora_evaluation/
- Install the SMPL-X model:
git clone https://github.com/vchoutas/smplx ../smplx
cd ../smplx
pip install .
- Run the script to generate the .pkl files with joint and vertex data:
cd ../agora_evaluation
# Validation
python agora_evaluation/project_joints.py --imgFolder demo/images/validation --loadPrecomputed demo/Cam/validation_Cam \
    --modeltype SMPLX --kid_template_path utils/smplx_kid_template.npy --modelFolder demo/model \
    --gt_model_path demo/GT_fits/ --imgWidth 1280 --imgHeight 720
# Train
python agora_evaluation/project_joints.py --imgFolder demo/images/train --loadPrecomputed demo/Cam/train_Cam \
    --modeltype SMPLX --kid_template_path utils/smplx_kid_template.npy --modelFolder demo/model \
    --gt_model_path demo/GT_fits/ --imgWidth 1280 --imgHeight 720
NOTE: This takes a long time and a lot of memory.
- Create directories for the final dataset and annotations:
cd ..
mkdir -p H_B/images/validation H_B/images/train H_B/annotations
- Copy necessary scripts and process the data:
cp ../exps/AGORA/H_B_data_process.py ./
cp ../exps/AGORA/H_B_utils.py ./
# Generate head and body bounding boxes + 6D rotations
python H_B_data_process.py --load_pkl_flag  # for head and body
- Ensure labels are within the [0, 1] range and write them to the correct files for YOLOv5 integration:
cd ../sixDDirect_H_B/
python utils/labels.py --data data/agora_coco_H_B.yaml  # data/agora_coco.yaml for heads only
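For context, YOLOv5 label files contain one line per object with the class id and the box center and size normalized by the image dimensions. The snippet below is an illustrative sketch of that normalization and clamping, not the repository's `utils/labels.py`:

```python
def to_yolo_label(cls_id: int, x_min: float, y_min: float,
                  box_w: float, box_h: float, img_w: int, img_h: int) -> str:
    """Convert an absolute-pixel box to a YOLO-style label line with all
    coordinates normalized and clamped to the [0, 1] range."""
    clamp = lambda v: min(max(v, 0.0), 1.0)
    x_c = clamp((x_min + box_w / 2.0) / img_w)   # normalized box center x
    y_c = clamp((y_min + box_h / 2.0) / img_h)   # normalized box center y
    w = clamp(box_w / img_w)
    h = clamp(box_h / img_h)
    return f"{cls_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"
```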
This should give approximately the following folder setup for AGORA:
└── AGORA
├── agora_evaluation/
│ ├── agora_evaluation/
│ └── ...
├── demo/
│ ├── Cam/
│ │ ├── train_Cam/
│ │ └── validation_Cam/
│ ├── GT_fits/
│ │ ├── gt_scan_info/
│ │ └── smplx_gt/
│ │ └── ...
│ ├── images/
│ │ ├── train/
│ │ └── validation/
│ └── model/
│ └── smplx/
├── H_B/
│ ├── 6D_body_head_yolov5_labels_coco/
│ │ ├── img_txt/
│ │ ├── train/
│ │ └── validation/
│ ├── annotations/
│ │ ├── full_body_head_coco_style_train.json
│ │ └── full_body_head_coco_style_validation.json
│ └── images/
│   ├── train/
│   └── validation/
└── smplx/
└── ...
- Project link: [http://domedb.perception.cs.cmu.edu/]. Github link: [https://github.com/CMU-Perceptual-Computing-Lab/panoptic-toolbox]. Using and downloading this dataset requires personal registration. We are not permitted to redistribute its data. You can construct CMU-H+B by following the steps below.
- Create and navigate to the CMU directory:
mkdir CMU
cd CMU
- Copy the necessary scripts and Python files from the exps directory:
cp -r ../exps/scripts ./
cp ../exps/H_B_data_process.py ./
cp ../exps/H_B_utils.py ./
cp ../exps/data_split.py ./
- Understanding the Provided Files:
  - scripts/: Contains scripts from the CMU-Perceptual-Computing-Lab/panoptic-toolbox, with some modifications made by DirectMHP and this author.
  - H_B_data_process.py: Tweaked from the DirectMHP GitHub to obtain head and body rotations. Downloads data in a loop and removes each folder directly afterwards to manage the large size of the CMU Panoptic dataset.
  - data_split.py: Modified from the DirectMHP GitHub to accommodate different COCO dictionary templates.
  - H_B_utils.py: Adjusted from the DirectMHP GitHub to, e.g., include a reference body.
The dataset is very large and downloading it will take a long time. To manage this (a sketch of the loop follows below):
- Data is processed and sampled per sequence.
- Each sequence folder is deleted after processing to save disk space.
- Annotations are saved after each sequence to prevent data loss if something crashes.
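A rough, hypothetical outline of this per-sequence loop; the sequence names, helper functions, and output paths below are placeholders, not the actual API of H_B_data_process.py:

```python
import json
import shutil
from pathlib import Path

def download_sequence(seq: str) -> Path:
    """Placeholder: in practice the panoptic-toolbox scripts/ download the data."""
    return Path(seq)

def process_sequence(seq_dir: Path) -> list:
    """Placeholder: sample frames and compute head/body rotation annotations."""
    return []

# Process one sequence at a time so disk usage stays bounded.
for seq in ["171204_pose1", "171204_pose2"]:  # example sequence names
    seq_dir = download_sequence(seq)
    annotations = process_sequence(seq_dir)
    out_file = Path("H_B/annotations") / f"{seq}.json"
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(json.dumps(annotations))  # save before moving on, in case of a crash
    shutil.rmtree(seq_dir, ignore_errors=True)    # reclaim disk space before the next sequence
```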
- Create a directory for head and body data processing:
mkdir H_B
- Run the data processing script:
python H_B_data_process.py  # processes head and body data
After processing and saving annotations for each sequence:
- Combine the separate annotations into a single file and split it into training and validation datasets:
python data_split.py
- Ensure labels are within the [0, 1] range and write them to the correct files for YOLOv5 integration:
cd ../sixDDirect_H_B/
python utils/labels.py --data data/cmu_panoptic_coco_H_B.yaml  # data/cmu_panoptic_coco.yaml for heads only
This should approximately give the following folder structure for CMU:
└── CMU/
├── H_B/
│ ├── 6D_body_head_yolov5_labels_coco/
│ │ ├── img_txt/
│ │ ├── train/
│ │ └── val/
│ ├── annotations/
│ │ ├── coco_style_sample.json
│ │ ├── coco_style_sampled_train.json
│ │ └── coco_style_sampled_validation.json
│ ├── images/
│ │ ├── train/
│ │ └── validation/
│ └── images_sampled/
└── scripts/
Download the data from the GAFA GitHub. Note that this will take a very long time due to the large file sizes. Place the downloaded files in the GazeNet/data/raw_data folder.
Option 1: Process All Data at Once (Requires Ample Disk Space). Extract and preprocess each tar.gz file:
cd GazeNet/
names=("living_room" "courtyard" "library" "kitchen" "lab")
for NAME in "${names[@]}"; do
tar -zxvf "data/raw_data/$NAME.tar.gz"
done
python data/preprocess.py
python data/reprocess.py
Option 2: Process Data One Folder at a Time (For Limited Disk Space). Unzip, preprocess, and clean up each folder one by one:
cd GazeNet/
names=("living_room" "courtyard" "library" "kitchen" "lab")
for NAME in "${names[@]}"; do
tar -zxvf "data/raw_data/$NAME.tar.gz"
python data/preprocess.py
rm -rf "$NAME"
done
python data/reprocess.py
- preprocess.py: Processes the GAFA dataset to create annotations and pickle files (see the sketch after this list).
  - Steps per video:
    - For all directions (gaze, body, head), get the corresponding rotation matrices.
    - Resize the frames so the width is 720 pixels.
    - Extract head and body bounding boxes from the OpenPose 2D annotations.
    - Save all frames to an image.pkl file.
    - Save all annotations to an annotations.pkl file.
  - Note: Long videos are split into three parts to save RAM.
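As a rough illustration of the resizing and bounding-box steps above; the (x, y, confidence) keypoint layout and the padding margin are assumptions, not the exact values used in preprocess.py:

```python
import cv2
import numpy as np

def resize_to_width(frame: np.ndarray, target_w: int = 720) -> np.ndarray:
    """Resize a frame so its width becomes target_w, keeping the aspect ratio."""
    h, w = frame.shape[:2]
    scale = target_w / w
    return cv2.resize(frame, (target_w, int(round(h * scale))))

def bbox_from_keypoints(keypoints: np.ndarray, margin: float = 0.1) -> tuple:
    """Bounding box around detected 2D keypoints, padded by a relative margin.
    keypoints: (N, 3) array of (x, y, confidence) rows from OpenPose."""
    valid = keypoints[keypoints[:, 2] > 0]          # keep only detected joints
    x_min, y_min = valid[:, :2].min(axis=0)
    x_max, y_max = valid[:, :2].max(axis=0)
    pad_x, pad_y = margin * (x_max - x_min), margin * (y_max - y_min)
    return x_min - pad_x, y_min - pad_y, x_max + pad_x, y_max + pad_y
```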
- reprocess.py: Further processes the preprocessed data (see the sketch after this list).
  - Steps:
    - Load annotations.
    - Get consecutive frames (skipping frames where subjects walk out of frame).
    - Clamp outlier widths and heights to within ±0.75 standard deviations.
    - Smooth the x, y locations over 7 frames using Gaussian kernel convolution.
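A minimal sketch of the clamping and smoothing steps, assuming a (num_frames, 2) array of x, y locations; the Gaussian sigma is an illustrative choice, not necessarily the one used in reprocess.py:

```python
import numpy as np

def clamp_outliers(values: np.ndarray, k: float = 0.75) -> np.ndarray:
    """Clip values to within k standard deviations of their mean."""
    mu, sigma = values.mean(), values.std()
    return np.clip(values, mu - k * sigma, mu + k * sigma)

def smooth_xy(xy: np.ndarray, window: int = 7, sigma: float = 1.5) -> np.ndarray:
    """Smooth per-frame (x, y) locations by convolving each coordinate with a
    window-tap Gaussian kernel. xy: (num_frames, 2)."""
    t = np.arange(window) - window // 2
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    return np.stack([np.convolve(xy[:, i], kernel, mode="same")
                     for i in range(xy.shape[1])], axis=1)
```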
- Run the coco_GAFA.py script:
python data/coco_GAFA.py
- coco_GAFA.py: Converts the GAFA dataset to COCO-style annotations (see the sketch after this list).
  - Steps:
    - Samples every 7th valid frame.
    - Collects all sampled frames and their annotations (bounding boxes, gaze/head/body direction, 6D poses, Euler angles).
    - Saves the image information and annotations to a COCO-style dictionary.
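For illustration, a COCO-style dictionary built this way might be assembled as follows; the field names for the gaze/head/body extensions are assumptions, not necessarily the repository's exact schema:

```python
def build_coco(valid_frames: list, step: int = 7) -> dict:
    """Sample every `step`-th valid frame and collect image entries and
    annotations into a COCO-style dictionary."""
    coco = {"images": [], "annotations": [],
            "categories": [{"id": 1, "name": "person"}]}
    for i, frame in enumerate(valid_frames[::step]):
        coco["images"].append({"id": i,
                               "file_name": frame["file_name"],
                               "width": frame["width"],
                               "height": frame["height"]})
        coco["annotations"].append({"id": i, "image_id": i, "category_id": 1,
                                    "bbox": frame["body_bbox"],     # [x, y, w, h]
                                    "gaze_dir": frame["gaze_dir"],  # unit 3-vector
                                    "head_6d": frame["head_6d"],
                                    "body_6d": frame["body_6d"]})
    return coco
```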
- Run the split_GAFA.py script:
python data/split_GAFA.py
- split_GAFA.py: Splits the COCO-style annotations into training and validation sets (75% train, 25% validation).
- Ensure labels are within the [0, 1] range and write them to the correct files for YOLOv5 integration:
cd ../sixDDirect_H_B/
python utils/labels.py --data data/gafa_coco_H_B.yaml  # data/gafa_coco.yaml for heads only
This should give this approximate folder structure:
└── GazeNet/data/preprocessed/
├── courtyard/
├── H_B_G/
│ ├── 6D_body_head_yolov5_labels_coco/
│ │ ├── img_txt/
│ │ ├── train/
│ │ └── val/
│ ├── annotations/
│ │ ├── GAFA_coco_style.json
│ │ ├── GAFA_coco_style_train.json
│ │ └── GAFA_coco_style_validation.json
│ └── images/
│   ├── train/
│   └── validation/
├── kitchen/
├── lab/
├── library/
└── living_room/
This repository contains methods for validating, training, and inferring models that predict heads and bodies directly, specifically for the AGORA, CMU Panoptic, and GAFA datasets.
The first step is setting up the environment by doing the following:
git clone https://github.com/noanonkes/6DDirect_H_B
cd 6DDirect_H_B/sixDDirect_H_B/
conda env create -f conda.yaml
conda activate hb
pip3 install torch==1.10.0+cu111 torchvision==0.11.1+cu111 torchaudio==0.10.0+cu111 \
-f https://download.pytorch.org/whl/cu111/torch_stable.html
Install the renderer, which is used to visualize predictions.
cd Sim3DR
sh build_sim3dr.sh
NOTE: All files in sixDDirect_H_B need to be called within the hb environment.
First, download the necessary weights from here and place them in the weights folder.
To validate the model that predicts both heads and bodies directly on the AGORA dataset, execute the following command:
python val.py --rect --data data/agora_coco_H_B.yaml --img 1280 \
--weights weights/agora_m_1280_e300_t40_bs32_c04_b04_h06 \
--batch-size 16 --device 0
For testing the model that only predicts heads on AGORA, use this command:
python val.py --rect --data data/agora_coco.yaml --img 1280 \
--weights weights/agora_m_1280_e300_t40_bs32_H \
--batch-size 16 --device 0
To test on the CMU Panoptic or GAFA dataset, adjust the --data and --weights flags accordingly.
You can train the models from scratch. The instructions below are based on using two GPUs; adjust the --device flag according to your setup.
For training the head and body model on AGORA:
python train.py --workers 9 --device 0,1 \
--img 1280 --batch 32 --epochs 300 --data data/agora_coco_H_B.yaml --hyp data/hyp-p6.yaml \
--weights weights/yolov5m6.pt --project runs/6DDirect_H_B \
--conf_thres 0.4 --conf_thres_body 0.4 --conf_thres_head 0.4 \
--l2_loss_w 0.1 --name agora_m_1280_e300_t40_bs32_c04_b04_h06 --bbox_interval 50
For training a model that only predicts heads:
python train.py --workers 9 --device 0,1 \
--img 1280 --batch 32 --epochs 300 --data data/agora_coco.yaml --hyp data/hyp-p6.yaml \
--weights weights/yolov5m6.pt --project runs/6DDirectMHP --conf_thres 0.4 \
--l2_loss_w 0.1 --name agora_m_1280_e300_t40_bs32_c04 --bbox_interval 50
To train on the CMU Panoptic or GAFA datasets, modify the --name, --data, and --weights flags as needed.
For testing trained or pre-trained models on your own images, use the image_vis3d.py script. Place all images in the test_imgs folder and adjust the weights accordingly.
python demos/image_vis3d.py \
--imgsz 1280 \
--weights weights/agora_m_1280_e300_t40_bs32_H \
--img-path test_imgs \
--data data/agora_coco_H_B.yaml
GazeNet involves a two-step process: first, obtaining head and body predictions, and then predicting gaze directions. This separation allows for independent experimentation with gaze predictions.
The first step is setting up the environment by doing the following:
cd GazeNet/
conda env create -f conda.yaml
NOTE: All files in GazeNet need to be called within the gafa environment.
First, get the head and body orientation predictions on the GAFA dataset:
# needs to be done with other environment
conda activate hb
cd sixDDirect_H_B/
python GAFA/predictions.py --data data/gafa_coco_H_B.yaml --img 720 \
--weights weights/gafa_ptAGORA_720_e50_t40_b128_b04_h06.pt \
--batch-size 128 --device 0 --iou-thres 0.5 --conf-thres 0.0001
Since the GazeNet LSTM trains on 7-frame sequences, extract the valid samples for each 7-frame sequence:
python GAFA/valid_frames.py
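Conceptually, this amounts to sliding a 7-frame window over each sequence and keeping only windows whose frames are all valid and consecutive; a small sketch of that idea (not the actual valid_frames.py logic):

```python
import numpy as np

def seven_frame_windows(frame_ids: np.ndarray, valid: np.ndarray, length: int = 7):
    """Yield (start, end) index pairs for windows of `length` consecutive,
    valid frames (no gaps where the subject walked out of frame)."""
    for start in range(len(frame_ids) - length + 1):
        ids = frame_ids[start:start + length]
        if valid[start:start + length].all() and (np.diff(ids) == 1).all():
            yield start, start + length
```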
To train GazeNet, run:
cd GazeNet/
conda activate gafa
python train.py
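For orientation, the core idea is an LSTM that consumes per-frame head and body rotation features for a 7-frame window and regresses a gaze direction. The sketch below is illustrative only; the layer sizes and input feature layout are assumptions, not the actual GazeNet configuration:

```python
import torch
import torch.nn as nn

class GazeLSTM(nn.Module):
    """Illustrative sketch: an LSTM over per-frame head + body rotation
    features (e.g. two 6D rotations = 12 values) that regresses a unit 3D
    gaze direction for the sequence."""
    def __init__(self, in_dim: int = 12, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 7, in_dim) -> gaze direction (batch, 3)
        out, _ = self.lstm(x)
        gaze = self.head(out[:, -1])  # use the last time step
        return nn.functional.normalize(gaze, dim=-1)
```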
To evaluate using pre-trained weights, download the weights from here and place them in the GazeNet/output folder. Then, run:
python eval.py
For complete steps on downloading and installing 6DRepNet, we refer to their GitHub.
- Clone the 6DRepNet repository from their GitHub.
- Create the output directory and download the pre-trained weights:
mkdir output
wget https://huggingface.co/HoyerChou/DirectMHP/resolve/main/SixDRepNet_AGORA_bs256_e100_epoch_last.pth -P output/
wget https://huggingface.co/HoyerChou/DirectMHP/resolve/main/SixDRepNet_CMU_bs256_e100_epoch_last.pth -P output/
- Copy the necessary files from the exps/sixdrepnet directory:
cp ../../exps/sixdrepnet/gen_dataset_full_AGORA_CMU.py ./
cp ../../exps/sixdrepnet/test.py ./
cp ../../exps/sixdrepnet/datasets.py ./
cp ../../exps/sixdrepnet/model.py ./
cp ../../exps/sixdrepnet/utils.py ./
Most of these files originally come from DirectMHP and were tweaked by us:
- gen_dataset_full_AGORA_CMU.py: Tweaked to fix file paths and to skip annotations that are not heads, since the dataset now includes both heads and bodies.
- test.py: Adjusted to add the geodesic distance metric (see the sketch after this list).
- datasets.py, model.py, utils.py: Sourced from the DirectMHP GitHub.
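For reference, the geodesic distance between two rotation matrices mentioned above is d(R1, R2) = arccos((trace(R1^T R2) - 1) / 2); a minimal PyTorch sketch of this standard metric (illustrative, not the exact code added to test.py):

```python
import torch

def geodesic_distance(R1: torch.Tensor, R2: torch.Tensor) -> torch.Tensor:
    """Geodesic distance (in radians) between batches of rotation matrices.
    R1, R2: (B, 3, 3) -> (B,)."""
    m = torch.bmm(R1.transpose(1, 2), R2)
    cos = (m.diagonal(dim1=-2, dim2=-1).sum(-1) - 1.0) / 2.0
    # Clamp to avoid NaNs from numerical error at the boundaries of acos.
    return torch.acos(cos.clamp(-1.0 + 1e-7, 1.0 - 1e-7))
```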
- Generate head crops for AGORA:
python gen_dataset_full_AGORA_CMU.py --db ../../AGORA/agora_evaluation/HPE/ --img_size 256 --root_dir ./datasets/ --data_type val --filename files_val.txt
- Test on AGORA:
python test.py --dataset AGORA --data_dir ./datasets/AGORA/val --filename_list ./datasets/AGORA/files_val.txt --snapshot output/SixDRepNet_AGORA_bs256_e100_epoch_last.pth --gpu 0 --batch_size 1
- Generate head crops for CMU:
python gen_dataset_full_AGORA_CMU.py --db ../../CMU/HPE/ --img_size 256 --root_dir ./datasets/ --data_type val --filename files_val.txt
- Test on CMU:
python test.py --dataset CMU --data_dir ./datasets/CMU/val --filename_list ./datasets/CMU/files_val.txt --snapshot output/SixDRepNet_CMU_bs256_e100_epoch_last.pth --gpu 0 --batch_size 1
For complete steps on downloading and installing DirectMHP, we refer to their GitHub.
- Clone the DirectMHP repository from their GitHub.
- Download the pre-trained weights:
wget https://huggingface.co/HoyerChou/DirectMHP/resolve/main/agora_m_1280_e300_t40_lw010_best.pt -P weights/
wget https://huggingface.co/HoyerChou/DirectMHP/resolve/main/cmu_m_1280_e200_t40_lw010_best.pt -P weights/
- Copy the necessary files:
cp ../exps/DirectMHP/val.py ./
cp ../exps/DirectMHP/data/{agora_coco,cmu_panoptic_coco}.yaml ./data/
cp ../exps/DirectMHP/utils/mae.py ./utils/
cp -r ../exps/DirectMHP/visualize/ ./
- val.py: Added the calculation of the geodesic distance.
- utils/mae.py: Added a geodesic loss class, returns the geodesic error, and skips classifications that are not heads.
- data/agora_coco.yaml: Updated the paths to the data.
- visualize/*: New code for comparisons between our method and DirectMHP.
- Evaluate on the AGORA dataset:
python val.py --rect --data data/agora_coco.yaml --img 1280 --weights weights/agora_m_1280_e300_t40_lw010_best.pt --batch-size 8 --device 0
- Evaluate on the CMU dataset:
python val.py --rect --data data/cmu_panoptic_coco.yaml --img 1280 --weights weights/cmu_m_1280_e200_t40_lw010_best.pt --batch-size 8 --device 0