Goal
Track hand pose from unconstrained monocular RGB video streams at real-time framerates
Background
Multi-view methods
- Hard to set up (require calibration)
- Hard to apply to general hand motions in unconstrained scenes
- Expensive
Monocular methods
- No setup overhead
- Depth-sensor variants do not work in all scenes (e.g., outdoors in sunlight)
- Depth-sensor variants have higher power consumption
- Not robust to occlusions by objects
- Cannot distinguish 3D poses that share the same 2D joint projection
Learning-based methods
- Difficult to obtain annotated data with sufficient real-world variations
- Suffer from occlusions due to objects being manipulated by the hand
- Synthetic data has a domain gap when models trained on this data are applied to real input
- Hard to obtain real-synthetic image pairs
Dataset
Since annotating 3D joint positions in hundreds of thousands of real hand images is infeasible, synthetically generated images are commonly used.
Real hand images
28,903 real hand images captured with a desktop webcam
Synthetic hand images
SynthHands dataset, from state-of-the-art prior work:
- Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor
- Learning to Estimate 3D Hand Pose from Single RGB Images
GeoConGAN
Translates synthetic images into real-style images, based on CycleGAN
- Uses adversarial discriminators to learn cycle-consistent forward and backward mappings
- Has two trainable translators, synth2real and real2synth
- Does not require paired images
- Preserves poses during translation via a geometric consistency loss (see the sketch after this list)
Extracts the silhouettes of the images by training a binary segmentation network, SilNet, based on a simple UNet.
- Has three 2-strided convolutions and three deconvolutions
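As a rough illustration, the sketch below combines a CycleGAN-style cycle-consistency loss with the silhouette-based geometric consistency term; the translator and SilNet modules, the loss weights, and the exact formulation are assumptions, not the paper's actual code.

```python
import torch
import torch.nn.functional as F

def geocon_losses(synth2real, real2synth, silnet, x_synth, x_real,
                  sil_synth, sil_real, w_cyc=10.0, w_geo=1.0):
    """Cycle-consistency + geometric (silhouette) consistency, as a sketch.

    synth2real, real2synth: trainable image-to-image translators (nn.Module)
    silnet: frozen silhouette network returning per-pixel probabilities
    sil_synth, sil_real: known binary hand silhouettes in [0, 1]
    w_cyc, w_geo: assumed loss weights (not from the paper)
    """
    fake_real = synth2real(x_synth)    # synthetic -> "real" domain
    fake_synth = real2synth(x_real)    # real -> "synthetic" domain

    # Cycle consistency: translating forth and back should reproduce the input.
    cyc = (F.l1_loss(real2synth(fake_real), x_synth)
           + F.l1_loss(synth2real(fake_synth), x_real))

    # Geometric consistency: the hand silhouette must survive translation,
    # enforced via cross-entropy between SilNet output and the known mask.
    geo = (F.binary_cross_entropy(silnet(fake_real), sil_synth)
           + F.binary_cross_entropy(silnet(fake_synth), sil_real))

    return w_cyc * cyc + w_geo * geo   # adversarial terms omitted for brevity
```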
Data Augmentation
- Composite GANerated images with random background images (see the compositing sketch below)
- Add randomly textured foreground objects by leveraging the object masks
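A minimal numpy sketch of the compositing step, assuming the hand mask is available as an alpha matte; function and variable names are illustrative only.

```python
import numpy as np

def composite(hand_rgb, hand_mask, background):
    """Paste a GANerated hand onto a random background (sketch).

    hand_rgb:   HxWx3 float image of the hand
    hand_mask:  HxW float mask in [0, 1] (1 = hand / foreground object)
    background: HxWx3 float image, already resized to match
    """
    alpha = hand_mask[..., None]              # broadcast mask over channels
    return alpha * hand_rgb + (1.0 - alpha) * background

# Illustrative usage with random arrays standing in for real images:
out = composite(np.random.rand(256, 256, 3),
                (np.random.rand(256, 256) > 0.5).astype(float),
                np.random.rand(256, 256, 3))
```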
Hand Joints Regression
Train a CNN, RegNet, derived from ResNet50, that predicts the 2D and 3D positions of 21 hand joints; trained on about 440,000 samples (60% GANerated)
- 2D joint positions are represented as heatmaps in image space, which capture prediction uncertainty
- 3D positions are represented as 3D coordinates relative to the root joint, which helps resolve depth ambiguities
- An additional refinement module based on a projection layer (ProjLayer) better coalesces the 2D and 3D predictions (see the sketch below)
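The sketch below shows one plausible reading of such a projection layer: the root-relative 3D predictions are rendered into 2D Gaussian heatmaps so they can be compared against the predicted 2D heatmaps. The orthographic projection, heatmap size, and sigma are assumptions.

```python
import numpy as np

def proj_layer(joints3d, size=32, sigma=1.5):
    """Render root-relative 3D joints into 2D Gaussian heatmaps (sketch).

    joints3d: (21, 3) root-relative positions, assumed normalized so that
              x and y roughly fall in [-1, 1]
    Returns:  (21, size, size) heatmaps; z is simply dropped, i.e. an
              orthographic projection is assumed here.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    heatmaps = np.empty((joints3d.shape[0], size, size))
    for j, (x, y, _z) in enumerate(joints3d):
        # Map normalized coordinates into pixel space.
        px = (x + 1) * 0.5 * (size - 1)
        py = (y + 1) * 0.5 * (size - 1)
        heatmaps[j] = np.exp(-((xs - px) ** 2 + (ys - py) ** 2)
                             / (2 * sigma ** 2))
    return heatmaps

hm = proj_layer(np.random.uniform(-1, 1, (21, 3)))
```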
Kinematic Skeleton Fitting
Kinematic Hand Model
Comprises one root joint and 20 finger joints.
- Per-user skeleton adaptation: obtained by averaging the relative bone lengths of the 2D predictions over 30 frames while the user holds their hand parallel to the camera image plane (see the sketch below)
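A small sketch of how such per-user adaptation could be computed; the bone edge list and the choice of reference bone are hypothetical.

```python
import numpy as np

# Hypothetical (parent, child) joint-index pairs; the full model has 20 bones.
BONES = [(0, 1), (1, 2), (2, 3), (3, 4)]   # one finger chain shown only

def adapt_bone_lengths(joints2d_frames, ref_bone=0):
    """Average relative bone lengths over calibration frames (sketch).

    joints2d_frames: (30, 21, 2) 2D joint predictions captured while the
                     user holds the hand parallel to the image plane.
    Returns per-bone lengths normalized by a reference bone, averaged
    over all frames (scale-invariant ratios).
    """
    ratios = []
    for joints in joints2d_frames:
        lengths = np.array([np.linalg.norm(joints[c] - joints[p])
                            for p, c in BONES])
        ratios.append(lengths / lengths[ref_bone])
    return np.mean(ratios, axis=0)
```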
2D Fitting Term
Minimizes the distance between the model's joint positions projected onto the image plane and the 2D heatmap maxima.
3D Fitting Term
- Obtains good hand articulation by using the predicted root-relative 3D joint positions
- Resolves depth ambiguities that remain when using 2D joint positions only
Joint Angle Constraints
Penalize anatomically implausible hand articulations by enforcing that joints do not bend too far
Temporal Smoothness
Penalize deviations from constant velocity
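Putting the four terms together, here is a numpy sketch of what such a fitting energy could look like; the weights, the forward-kinematics and projection callables, and the exact penalty forms are assumptions rather than the paper's formulation.

```python
import numpy as np

def fitting_energy(theta, t, fk, project, hm_maxima, p3d_pred,
                   theta_lo, theta_hi, theta_prev, theta_prev2,
                   w2d=1.0, w3d=1.0, wlim=1.0, wtemp=1.0):
    """Sum of the four fitting terms (sketch; weights are assumptions).

    theta: joint angles, t: global translation (3,)
    fk: forward kinematics, theta -> (21, 3) joint positions
    project: camera projection, (21, 3) -> (21, 2) image positions
    hm_maxima: (21, 2) 2D heatmap maxima from RegNet
    p3d_pred: (21, 3) root-relative 3D predictions from RegNet
    """
    p3d = fk(theta) + t
    e2d = np.sum((project(p3d) - hm_maxima) ** 2)          # 2D fitting term
    e3d = np.sum(((p3d - p3d[0]) - p3d_pred) ** 2)         # 3D fitting term
    elim = np.sum(np.maximum(theta - theta_hi, 0.0) ** 2   # joint-angle limits
                  + np.maximum(theta_lo - theta, 0.0) ** 2)
    etemp = np.sum((theta - 2 * theta_prev + theta_prev2) ** 2)  # accel. = 0
    return w2d * e2d + w3d * e3d + wlim * elim + wtemp * etemp
```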
Optimization
- Minimizes the energy with a gradient-descent strategy (see the sketch below)
- Exploits the fact that the root joint and its four direct child joints (the non-thumb MCP joints) form a rigid structure
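For completeness, a toy gradient-descent loop with finite-difference gradients; the step size, iteration count, and quadratic test energy are all illustrative, and the paper's actual optimizer is not reproduced here.

```python
import numpy as np

def minimize_energy(energy, x0, lr=0.1, iters=100, eps=1e-5):
    """Plain gradient descent with numeric gradients (sketch).

    energy: callable mapping a parameter vector to a scalar
    x0: initial parameters, e.g. the previous frame's pose for warm-starting
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        grad = np.zeros_like(x)
        for i in range(x.size):            # central finite differences
            d = np.zeros_like(x)
            d[i] = eps
            grad[i] = (energy(x + d) - energy(x - d)) / (2 * eps)
        x -= lr * grad
    return x

# Toy usage: minimize a quadratic standing in for the fitting energy.
x_fit = minimize_energy(lambda x: float(np.sum((x - 1.0) ** 2)), np.zeros(26))
```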
Experiments
Uses the Percentage of Correct Keypoints (PCK) score, a popular criterion for evaluating pose estimation accuracy (see the sketch below)
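A short sketch of how a PCK curve can be computed: the fraction of predicted keypoints whose distance to ground truth falls below each threshold. Shapes and units here are assumptions.

```python
import numpy as np

def pck(pred, gt, thresholds):
    """Percentage of Correct Keypoints over distance thresholds (sketch).

    pred, gt: (N, 21, D) predicted / ground-truth joints (D = 2 or 3)
    thresholds: iterable of distances in the same units as the joints
    """
    dists = np.linalg.norm(pred - gt, axis=-1)        # (N, 21) joint errors
    return [float(np.mean(dists <= t)) for t in thresholds]

# Illustrative usage with random data:
curve = pck(np.random.rand(10, 21, 3), np.random.rand(10, 21, 3),
            thresholds=np.linspace(0.0, 0.5, 6))
```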
Conclusion
Real-time hand tracking system with:
- Monocular RGB-only input, without multi-view setups or depth images
- Synthetic generation of training data via a geometrically consistent image-to-image translation network (GeoConGAN)
- A convolutional neural network (RegNet) for 2D/3D joint regression
- Kinematic skeleton fitting
Advantages:
- No setup overhead
- Lower power consumption than depth-sensor systems
- Works on unconstrained images (and training does not require paired real-synthetic images)
- Preserves poses during image-to-image translation
- Robust to occlusions and varying camera viewpoints
- More precise than state-of-the-art methods
Reference
Mueller et al., "GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB," CVPR 2018: https://handtracker.mpi-inf.mpg.de/projects/GANeratedHands/content/GANeratedHands_CVPR2018.pdf