Learning to Estimate 3D Hand Pose from Single RGB Images

Clova AI Intern Interview Assignment

A TensorFlow implementation of "Learning to Estimate 3D Hand Pose from Single RGB Images".

Recommended Python Package Versions

python : 3.5.6
tensorflow : 1.10.0
numpy : 1.18.5
matplotlib : 3.0.0
opencv : 3.4.1

Data Set

Rendered Handpose Dataset

This dataset provides 41258 training and 2728 testing samples. Each sample provides:

RGB image (320x320 pixels)
Depth map (320x320 pixels); see the dataset's examples for how to convert the provided RGB-coded image into depth values
Segmentation masks (320x320 pixels) for the following classes: background, person, three classes for each finger and one for each palm
21 Keypoints for each hand with their uv coordinates in the image frame, xyz coordinates in the world frame and an indicator if they are visible
Intrinsic camera matrix K

The dataset was created with freely available characters from www.mixamo.com and rendered with www.blender.org. For more details, see the aforementioned paper.
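
The depth maps are stored as RGB-coded images, so they need to be decoded before use. The following is a minimal sketch of such a decoder. It assumes the encoding used by the dataset's own example script: the red channel holds the upper 8 bits and the green channel the lower 8 bits of a 16-bit value that is scaled linearly to a maximum depth of 5 meters. The function name `depth_rgb_to_float` and the `max_depth_m` parameter are hypothetical and not part of this repository.

```python
import numpy as np


def depth_rgb_to_float(depth_rgb, max_depth_m=5.0):
    """Decode an RGB-coded depth image into a float32 depth map.

    Assumption: red = upper 8 bits, green = lower 8 bits of a 16-bit value
    that is scaled linearly to the range [0, max_depth_m].
    """
    # Widen to 16 bits first so the shift cannot overflow uint8.
    top = depth_rgb[:, :, 0].astype(np.uint16)
    bottom = depth_rgb[:, :, 1].astype(np.uint16)
    raw = (top << 8) | bottom  # 16-bit value in [0, 65535]
    return raw.astype(np.float32) / 65535.0 * max_depth_m
```

The result has the same height and width as the input and can be used directly alongside the RGB image and the segmentation mask.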

Keypoints available:

0: left wrist
1-4: left thumb (tip to palm)
5-8: left index, ...17-20: left pinky
21: right wrist
22-25: right thumb, ..., 38-41: right pinky
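
The index layout above can be turned into a small lookup, which is handy when plotting or debugging individual joints. This is an illustrative sketch only: the helper `keypoint_name` is hypothetical, the middle and ring finger ranges (9-12 and 13-16) are inferred from the elided part of the list, and the tip-to-palm ordering is assumed to hold for every finger on both hands.

```python
FINGERS = ['thumb', 'index', 'middle', 'ring', 'pinky']
KEYPOINTS_PER_HAND = 21  # 1 wrist + 5 fingers * 4 joints


def keypoint_name(idx):
    """Map a keypoint index (0..41) to a (hand, part, joint) triple.

    joint is 0 at the fingertip and 3 at the joint nearest the palm;
    it is None for the wrist.
    """
    if not 0 <= idx < 2 * KEYPOINTS_PER_HAND:
        raise ValueError('keypoint index out of range: {}'.format(idx))
    hand = 'left' if idx < KEYPOINTS_PER_HAND else 'right'
    local = idx % KEYPOINTS_PER_HAND
    if local == 0:
        return hand, 'wrist', None
    finger, joint = divmod(local - 1, 4)
    return hand, FINGERS[finger], joint
```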

Segmentation masks available:

0: background
1: person
2-4: left thumb (tip to palm)
5-7: left index, ..., 14-16: left pinky, 17: left palm
18-20: right thumb, ..., 33: right palm
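
For hand segmentation, the per-part classes usually need to be collapsed into per-hand or binary masks. The sketch below does this with the class ranges listed above (2-17 for the left hand, 18-33 for the right hand). The helper name `hand_masks` is hypothetical and not part of `loader.py`.

```python
import numpy as np


def hand_masks(mask):
    """Split a class-index mask into boolean left-hand, right-hand and any-hand masks."""
    mask = np.asarray(mask)
    left = (mask >= 2) & (mask <= 17)    # left fingers and left palm
    right = (mask >= 18) & (mask <= 33)  # right fingers and right palm
    return left, right, left | right
```

Background (0) and person (1) pixels are False in all three outputs.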

Load data

Download the dataset and set 'path_to_db' in loader.py to the dataset's location on your machine.
Optionally, set the data_set variable to 'training' or 'evaluation'.

# loader.py
path_to_db = './RHD_published_v2/'
tloader = loader.Data('training')
eloader = loader.Data('evaluation')

You can load a single sample by its sample_id:

# loader.py
sample_id = 3
image, mask, depth = tloader.load_id(sample_id)
anno = tloader.anno_all[sample_id]
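
Each annotation carries 2D keypoints, 3D keypoints and the intrinsic camera matrix K, and the three are related by a pinhole projection. The sketch below shows that relation on synthetic values. It assumes the xyz coordinates are expressed in the camera frame, and the annotation key names in the trailing comment (`'xyz'`, `'K'`) are assumptions about the annotation layout rather than something this README confirms. The helper `project_to_image` is hypothetical.

```python
import numpy as np


def project_to_image(xyz, K):
    """Project (N, 3) camera-frame points to (N, 2) pixel coordinates with a 3x3 K."""
    xyz = np.asarray(xyz, dtype=np.float64)
    K = np.asarray(K, dtype=np.float64)
    uvw = xyz @ K.T  # homogeneous image coordinates, shape (N, 3)
    return uvw[:, :2] / uvw[:, 2:3]  # divide by depth


# Synthetic example values (not taken from the dataset).
K_demo = np.array([[280.0, 0.0, 160.0],
                   [0.0, 280.0, 160.0],
                   [0.0, 0.0, 1.0]])
xyz_demo = np.array([[0.0, 0.0, 0.5],
                     [0.1, -0.05, 0.5]])
uv_demo = project_to_image(xyz_demo, K_demo)

# With a real sample this might look like (key names are assumptions):
# uv = project_to_image(anno['xyz'], anno['K'])
```

Comparing the projected points against the stored uv coordinates is a quick sanity check that a sample was loaded correctly.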

Training Network

The model is saved automatically when it is initialized. If a model with the same ID (i.e. name) already exists in its directory, the existing one may be removed. If you want to load a saved model, use a different ID for the new model instance.

The model needs configuration info to initialize.

<Configuration info>
ID : Model ID
n_iter : Total number of iterations
n_prt : Loss print interval
input_h : Image height
input_w : Image width
input_ch : Number of image channels (e.g. 3 for RGB)
n_output : Dimension of output
n_batch : Batch size
n_save : Model save interval
n_history : Train/test loss save interval
LR : Learning rate
random_crop : True or False; randomly crop to 256x256 when training HandSegNet
training : True or False; determines whether dropout is applied
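
To illustrate what the random_crop option implies, here is a minimal sketch of cropping an image and its mask with the same random 256x256 window. It is not taken from `model.py`; the function name `random_crop_pair` and its parameters are hypothetical. The crop offset is returned as well, since keypoint coordinates would need to be shifted by it.

```python
import numpy as np


def random_crop_pair(image, mask, crop_h=256, crop_w=256, rng=None):
    """Cut the same random window out of an image and its segmentation mask."""
    rng = np.random.RandomState() if rng is None else rng
    h, w = image.shape[:2]
    if crop_h > h or crop_w > w:
        raise ValueError('crop size exceeds image size')
    # randint's upper bound is exclusive, so add 1 to allow the last valid offset.
    top = rng.randint(0, h - crop_h + 1)
    left = rng.randint(0, w - crop_w + 1)
    window = (slice(top, top + crop_h), slice(left, left + crop_w))
    return image[window], mask[window], (top, left)
```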

<Configuration example>
config = {
    'ID' : 'test_handseg',
    'n_iter' : 20000,
    'n_prt' : 100,
    'input_h' : 320,
    'input_w' : 320,
    'input_ch' : 3,
    'n_output' : 10,
    'n_batch' : 8,
    'n_save' : 1000,
    'n_history' : 50,
    'LR' : 1e-5,
    'random_crop' : True,
    'training' : True,
}
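
Because the model is configured through a plain dictionary, a typo in a key only shows up at run time. The following hypothetical helper, `check_config`, sketches a simple check against the keys documented above. The expected types are inferred from the example values and are assumptions, not constraints enforced by `model.py`.

```python
# Expected type per key, inferred from the configuration example.
REQUIRED_KEYS = {
    'ID': str, 'n_iter': int, 'n_prt': int, 'input_h': int, 'input_w': int,
    'input_ch': int, 'n_output': int, 'n_batch': int, 'n_save': int,
    'n_history': int, 'LR': float, 'random_crop': bool, 'training': bool,
}


def check_config(config):
    """Return a list of problems found in a config dict (empty if none)."""
    problems = []
    for key in sorted(REQUIRED_KEYS):
        expected = REQUIRED_KEYS[key]
        if key not in config:
            problems.append('missing key: {}'.format(key))
        elif type(config[key]) is not expected:
            problems.append('wrong type for {}: expected {}, got {}'.format(
                key, expected.__name__, type(config[key]).__name__))
    for key in sorted(config):
        if key not in REQUIRED_KEYS:
            problems.append('unknown key: {}'.format(key))
    return problems
```

Calling `check_config(config)` before constructing the model returns an empty list for the example configuration.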

Example of training network

import loader
import model

training_loader = loader.Data('training')

config = {
    'ID' : 'handseg_01',
    'n_iter' : 20000,
    'n_prt' : 100,
    'input_h' : 320,
    'input_w' : 320,
    'input_ch' : 3,
    'n_output' : 10,
    'n_batch' : 8,
    'n_save' : 1000,
    'n_history' : 50,
    'LR' : 1e-5,
    'random_crop' : True,
    'training' : True,
}

SegNet = model.Hand3DPoseNet(config=config)
SegNet.train_HadSegNet(training_loader)

Loading Network

You can load a trained model from a checkpoint.

Example of loading network

config_test = {
    'ID' : 'test_model',
    'n_iter' : 20000,
    'n_prt' : 100,
    'input_h' : 320,
    'input_w' : 320,
    'input_ch' : 3,
    'n_output' : 10,
    'n_batch' : 8,
    'n_save' : 1000,
    'n_history' : 50,
    'LR' : 1e-5,
    'random_crop' : False,
    'training' : False,
}
# Path to a saved checkpoint (example path)
checkpoint = './handseg_01/checkpoint/handseg_01_20000'

SegNet_test = model.Hand3DPoseNet(config=config_test)
SegNet_test.load(checkpoint)

Example output of HandSegNet

Issues

When cropping and resizing the image based on HandSegNet's hand segmentation (the intermediate processing step), the same offset and scale must be applied to the hand keypoint coordinates. Because the keypoints have 3D (x, y, z) as well as 2D (u, v) coordinates, the 3D coordinates must also be handled consistently with the 2D crop and resize; this is not yet solved.
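
One common way to approach the issue above, offered here as a sketch and not as this repository's solution, is to leave the 3D coordinates untouched and fold the crop offset and resize factor into the intrinsic matrix instead. The 2D keypoints are shifted and scaled directly, and projecting the unchanged 3D points with the updated matrix then yields the same 2D locations. The function `crop_resize_transform` and its parameters are hypothetical, the 3D points are assumed to be in the camera frame, and half-pixel center conventions are ignored.

```python
import numpy as np


def crop_resize_transform(uv, K, crop_left, crop_top, scale):
    """Update 2D keypoints and intrinsics for a crop followed by a resize.

    uv:        (N, 2) pixel coordinates (u = column, v = row) in the original image.
    K:         3x3 intrinsic matrix of the original image.
    crop_left: column of the crop window's top-left corner.
    crop_top:  row of the crop window's top-left corner.
    scale:     resize factor applied after cropping (output size / crop size).
    """
    uv = np.asarray(uv, dtype=np.float64)
    offset = np.array([crop_left, crop_top], dtype=np.float64)
    uv_new = (uv - offset) * scale
    # The same shift-then-scale expressed as a 3x3 matrix acting on homogeneous pixels.
    T = np.array([[scale, 0.0, -scale * crop_left],
                  [0.0, scale, -scale * crop_top],
                  [0.0, 0.0, 1.0]])
    K_new = T @ np.asarray(K, dtype=np.float64)
    return uv_new, K_new
```

With this formulation the xyz annotations stay valid after cropping, as long as the updated matrix is used wherever K was used before.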
When training PoseNet, the ground-truth score maps are generated as Gaussian distributions with a standard deviation (sigma) of 5 pixels, centered on the given keypoint location. For invisible keypoints, the map is zero everywhere.
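
The score map generation described above can be sketched as follows. This is a minimal NumPy version, not the code in `model.py`; the function name `gaussian_score_maps` is hypothetical, and the choice of an unnormalized Gaussian with a peak value of 1.0 is an assumption.

```python
import numpy as np


def gaussian_score_maps(uv, visible, height, width, sigma=5.0):
    """Build one (height, width) score map per keypoint.

    Each map is an unnormalized Gaussian (peak 1.0) centered on the keypoint.
    Maps of invisible keypoints stay zero everywhere.
    """
    uv = np.asarray(uv, dtype=np.float32)
    cols = np.arange(width, dtype=np.float32)[None, :]   # u axis
    rows = np.arange(height, dtype=np.float32)[:, None]  # v axis
    maps = np.zeros((len(uv), height, width), dtype=np.float32)
    for k, ((u, v), vis) in enumerate(zip(uv, visible)):
        if not vis:
            continue
        dist_sq = (cols - u) ** 2 + (rows - v) ** 2
        maps[k] = np.exp(-dist_sq / (2.0 * sigma ** 2))
    return maps
```

For 21 keypoints on a 256x256 crop, the result has shape (21, 256, 256).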

khlee369/3DhandPose
