skelemoa/ntu-x
NTU-X


This repository contains details and pretrained models for the newly introduced NTU-X dataset, an extended version of the popular NTU dataset. For additional details and experimental results on NTU-X, see the paper "NTU-X: An Enhanced Large-scale Dataset for Improving Pose-based Recognition of Subtle Human Actions".


Video overview (click the thumbnail image in the repository README to watch)

What is NTU-X

The original NTU dataset contains human action skeletons captured using the Kinect. These skeletons have 25 joints. However, all the current top-performing models seem to be bottlenecked on certain classes that involve finer finger-level movements, such as reading, writing, and eating a meal.

Hence the new NTU-X dataset introduces a more detailed 118-joint skeleton for the action sequences of the NTU dataset. Along with the 25 body joints, this new dataset contains 42 finger joints and 51 face joints.
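The joint counts above can be checked with a short sketch. It only uses the three group sizes stated in this README; the storage order (body, then fingers, then face) and the names `JOINT_GROUPS` and `group_slices` are assumptions made for illustration, not the dataset's documented layout.

```python
# Joint counts per group, as stated in the README.
JOINT_GROUPS = {"body": 25, "fingers": 42, "face": 51}


def group_slices(groups):
    """Map each group name to a slice, assuming groups are stored back to back.

    The ordering is hypothetical; the README does not specify it.
    """
    slices, start = {}, 0
    for name, count in groups.items():
        slices[name] = slice(start, start + count)
        start += count
    return slices


SLICES = group_slices(JOINT_GROUPS)
TOTAL_JOINTS = sum(JOINT_GROUPS.values())

# A dummy frame holding one (x, y, z) triple per joint.
frame = [(0.0, 0.0, 0.0)] * TOTAL_JOINTS
fingers = frame[SLICES["fingers"]]

print(TOTAL_JOINTS, len(fingers))  # 118 42
```

Keeping the group boundaries in one place makes it easy to feed only a subset of joints (for example, body plus fingers) to a model.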

| Model | NTU60 | NTU60-X | NTU120 | NTU120-X |
|---|---|---|---|---|
| DSTA-Net | 91.50 | 93.56 | 86.60 | 87.80 |
| CTR-GCN | 92.40 | 93.90 | 88.90 | 88.36 |
| 4s-ShiftGCN | 90.70 | 91.78 | 85.90 | 86.18 |
| MsG3d | 91.50 | 91.76 | 86.90 | 87.10 |
| PA-ResGCN | 90.90 | 91.64 | 87.30 | 86.42 |
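To make the comparison easier to read, the following sketch computes the per-model difference between each NTU-X score and its NTU counterpart. The numbers are copied from the results listed above; the names `RESULTS` and `gains` are illustrative only.

```python
# (NTU60, NTU60-X, NTU120, NTU120-X) scores copied from the results above.
RESULTS = {
    "DSTA-Net": (91.50, 93.56, 86.60, 87.80),
    "CTR-GCN": (92.40, 93.90, 88.90, 88.36),
    "4s-ShiftGCN": (90.70, 91.78, 85.90, 86.18),
    "MsG3d": (91.50, 91.76, 86.90, 87.10),
    "PA-ResGCN": (90.90, 91.64, 87.30, 86.42),
}


def gains(results):
    """Return {model: (NTU60-X minus NTU60, NTU120-X minus NTU120)}."""
    return {
        model: (round(x60 - n60, 2), round(x120 - n120, 2))
        for model, (n60, x60, n120, x120) in results.items()
    }


GAINS = gains(RESULTS)
for model, (g60, g120) in GAINS.items():
    print(f"{model:12s} {g60:+.2f} {g120:+.2f}")
```

On these numbers, all five models score higher on NTU60-X than on NTU60, while on the 120-class variant three models improve and two (CTR-GCN and PA-ResGCN) are slightly lower.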

Overlaid Examples

Skeleton overlays comparing NTU and NTU-X (ours) for the actions Write, Read, and Eat Meal.

Comparing NTU-X to other popular action datasets

| Dataset | Body | Fingers | Face | # Joints | # Sequences | # Classes |
|---|---|---|---|---|---|---|
| MSR-Action 3D | ✔️ | | | 20 | 567 | 20 |
| Northwestern-UCLA | ✔️ | | | 24 | 1475 | 10 |
| NTU-RGB+D | ✔️ | | | 25 | 56880 | 60 |
| NTU-RGB+D 120 | ✔️ | | | 25 | 114035 | 120 |
| NTU60-X (Ours) | ✔️ | ✔️ | ✔️ | 118 | 56148 | 60 |
| NTU120-X (Ours) | ✔️ | ✔️ | ✔️ | 118 | 113821 | 120 |

Download NTU-X dataset

ROSE Lab, the creators of the NTU RGB+D dataset, have refused to give us permission to release the dataset. NTU-X addresses a fundamental raw-data-level shortcoming of existing Kinect-based datasets, a shortcoming that no amount of novel deep models can undo. We had hoped to release the dataset for the benefit of the community once we received consent. Unfortunately, for reasons best known to them, they decided to keep the dataset out of the community's reach. Apologies to those who were awaiting the release. We hope you understand.

Pretrained Models

A few experiments were performed to benchmark this new dataset using the top-performing models on the original NTU RGB+D dataset. Details about these models can be found at Models.

The classes included in NTU-X

NTU-X contains the same classes as the NTU RGB+D dataset. The action labels are listed below:

A1 drink water. A2 eat meal/snack. A3 brushing teeth.
A4 brushing hair. A5 drop. A6 pickup.
A7 throw. A8 sitting down. A9 standing up (from sitting position).
A10 clapping. A11 reading. A12 writing.
A13 tear up paper. A14 wear jacket. A15 take off jacket.
A16 wear a shoe. A17 take off a shoe. A18 wear on glasses.
A19 take off glasses. A20 put on a hat/cap. A21 take off a hat/cap.
A22 cheer up. A23 hand waving. A24 kicking something.
A25 reach into pocket. A26 hopping (one foot jumping). A27 jump up.
A28 make a phone call/answer phone. A29 playing with phone/tablet. A30 typing on a keyboard.
A31 pointing to something with finger. A32 taking a selfie. A33 check time (from watch).
A34 rub two hands together. A35 nod head/bow. A36 shake head.
A37 wipe face. A38 salute. A39 put the palms together.
A40 cross hands in front (say stop). A41 sneeze/cough. A42 staggering.
A43 falling. A44 touch head (headache). A45 touch chest (stomachache/heart pain).
A46 touch back (backache). A47 touch neck (neckache). A48 nausea or vomiting condition.
A49 use a fan (with hand or paper)/feeling warm. A50 punching/slapping other person. A51 kicking other person.
A52 pushing other person. A53 pat on back of other person. A54 point finger at the other person.
A55 hugging other person. A56 giving something to other person. A57 touch other person's pocket.
A58 handshaking. A59 walking towards each other. A60 walking apart from each other.
A61 put on headphone. A62 take off headphone. A63 shoot at the basket.
A64 bounce ball. A65 tennis bat swing. A66 juggling table tennis balls.
A67 hush (quiet). A68 flick hair. A69 thumb up.
A70 thumb down. A71 make ok sign. A72 make victory sign.
A73 staple book. A74 counting money. A75 cutting nails.
A76 cutting paper (using scissors). A77 snapping fingers. A78 open bottle.
A79 sniff (smell). A80 squat down. A81 toss a coin.
A82 fold paper. A83 ball up paper. A84 play magic cube.
A85 apply cream on face. A86 apply cream on hand back. A87 put on bag.
A88 take off bag. A89 put something into a bag. A90 take something out of a bag.
A91 open a box. A92 move heavy objects. A93 shake fist.
A94 throw up cap/hat. A95 hands up (both hands). A96 cross arms.
A97 arm circles. A98 arm swings. A99 running on the spot.
A100 butt kicks (kick backward). A101 cross toe touch. A102 side kick.
A103 yawn. A104 stretch oneself. A105 blow nose.
A106 hit other person with something. A107 wield knife towards other person. A108 knock over other person (hit with body).
A109 grab other person’s stuff. A110 shoot at other person with a gun. A111 step on foot.
A112 high-five. A113 cheers and drink. A114 carry something with other person.
A115 take a photo of other person. A116 follow other person. A117 whisper in other person’s ear.
A118 exchange things with other person. A119 support somebody with hand. A120 finger-guessing game (playing rock-paper-scissors).

FAQs

1. How is the NTU-X dataset created?

It is created by estimating 3D SMPL-X pose outputs from the RGB frames of the NTU-60 RGB videos. We use both SMPL-X and ExPose to perform these estimations.

2. How is the pose extractor (SMPL-X/ExPose) decided for each class?

We use a semi-automatic approach to estimate the 3D pose for the videos of each class. Keeping the intra-view and intra-subject variance of the NTU dataset in mind, we sample random videos covering each view per class of NTU and estimate the SMPL-X and ExPose outputs. The estimated skeleton is then back-projected onto its corresponding RGB frame, and the accuracy of the alignment is used to select between SMPL-X and ExPose.
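The back-projection check described above could be sketched roughly as follows. This README does not say how alignment accuracy was scored (the approach is described as semi-automatic, so it may well involve visual inspection), so everything here is an assumption: the pinhole camera model, the reference 2D keypoints, the mean pixel distance used as the score, and the function names.

```python
import math


def project(point3d, focal, center):
    """Pinhole projection of a camera-space (x, y, z) point to pixel (u, v)."""
    x, y, z = point3d
    return (focal * x / z + center[0], focal * y / z + center[1])


def mean_alignment_error(joints3d, reference2d, focal, center):
    """Mean pixel distance between projected joints and reference 2D points."""
    total = 0.0
    for point3d, (u_ref, v_ref) in zip(joints3d, reference2d):
        u, v = project(point3d, focal, center)
        total += math.hypot(u - u_ref, v - v_ref)
    return total / len(joints3d)


def pick_extractor(candidates, reference2d, focal, center):
    """Return the candidate name with the lowest error, plus all errors."""
    errors = {
        name: mean_alignment_error(joints, reference2d, focal, center)
        for name, joints in candidates.items()
    }
    return min(errors, key=errors.get), errors


# Toy data (hypothetical): two joints seen by a camera with focal length 100 px.
reference2d = [(100.0, 100.0), (120.0, 100.0)]
candidates = {
    "SMPL-X": [(0.0, 0.0, 1.0), (0.2, 0.0, 1.0)],
    "ExPose": [(0.05, 0.0, 1.0), (0.25, 0.0, 1.0)],
}
best, errors = pick_extractor(candidates, reference2d, focal=100.0, center=(100.0, 100.0))
print(best, errors)
```

In this toy case the first candidate projects exactly onto the reference points and the second is offset by 5 pixels, so the first is selected. A per-class decision would aggregate such scores over the sampled videos.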

3. Which class IDs use ExPose as the pose extractor, and which use SMPL-X?

Empirically, we observe that ExPose and SMPL-X perform equally well for single-person actions, but SMPL-X, though slow, provides better pose estimates for multi-person action class sequences. The classes selected for SMPL-X and ExPose are as follows:

ExPose

  • A1. drink water
  • A2. eat meal/snack
  • A3. brushing teeth.
  • A4. brushing hair.
  • A8. sitting down.
  • A9. standing up (from sitting position).
  • A12. writing.
  • A14. wear jacket.
  • A15. take off jacket.
  • A16. wear a shoe.
  • A17. take off a shoe.
  • A18. wear on glasses.
  • A19. take off glasses.
  • A20. put on a hat/cap.
  • A21. take off a hat/cap.
  • A24. kicking something.
  • A25. reach into pocket.
  • A26. hopping (one foot jumping).
  • A29. playing with phone/tablet.
  • A30. typing on a keyboard.
  • A33. check time (from watch).
  • A34. rub two hands together.
  • A36. shake head.
  • A37. wipe face.
  • A38. salute.
  • A41. sneeze/cough.
  • A42. staggering.
  • A43. falling.
  • A44. touch head (headache).
  • A46. touch back (backache).
  • A47. touch neck (neckache).
  • A48. nausea or vomiting condition.
  • A49. use a fan (with hand or paper)/feeling warm.
  • A61. put on headphone.
  • A62. take off headphone.
  • A63. shoot at the basket.
  • A64. bounce ball.
  • A65. tennis bat swing.
  • A68. flick hair.
  • A71. make ok sign.
  • A72. make victory sign.
  • A73. staple book.
  • A75. cutting nails.
  • A76. cutting paper (using scissors).
  • A78. open bottle.
  • A79. sniff (smell).
  • A80. squat down.
  • A82. fold paper.
  • A83. ball up paper.
  • A84. play magic cube.
  • A85. apply cream on face.
  • A86. apply cream on hand back.
  • A87. put on bag.
  • A88. take off bag.
  • A89. put something into a bag.
  • A90. take something out of a bag.
  • A91. open a box.
  • A92. move heavy objects.
  • A93. shake fist.
  • A94. throw up cap/hat.
  • A95. hands up (both hands).
  • A96. cross arms.
  • A97. arm circles.
  • A98. arm swings.
  • A99. running on the spot.
  • A100. butt kicks (kick backward).
  • A101. cross toe touch.
  • A103. yawn.
  • A104. stretch oneself.
  • A105. blow nose.
  • A106. hit other person with something.
  • A107. wield knife towards other person.
  • A108. knock over other person (hit with body).
  • A111. step on foot.
  • A112. high-five.
  • A114. carry something with other person.
  • A119. support somebody with hand.

SMPL-X

  • A5. drop.
  • A6. pickup.
  • A7. throw.
  • A10. clapping.
  • A11. reading.
  • A13. tear up paper.
  • A22. cheer up.
  • A23. hand waving.
  • A27. jump up.
  • A28. make a phone call/answer phone.
  • A31. pointing to something with finger.
  • A32. taking a selfie.
  • A35. nod head/bow.
  • A39. put the palms together.
  • A40. cross hands in front (say stop).
  • A45. touch chest (stomachache/heart pain).
  • A50. punching/slapping other person.
  • A51. kicking other person.
  • A52. pushing other person.
  • A53. pat on back of other person.
  • A54. point finger at the other person.
  • A55. hugging other person.
  • A56. giving something to other person.
  • A57. touch other person's pocket.
  • A58. handshaking.
  • A59. walking towards each other.
  • A60. walking apart from each other.
  • A66. juggling table tennis balls.
  • A67. hush (quiet).
  • A69. thumb up.
  • A70. thumb down.
  • A74. counting money.
  • A77. snapping fingers.
  • A81. toss a coin.
  • A102. side kick.
  • A109. grab other person’s stuff.
  • A110. shoot at other person with a gun.
  • A113. cheers and drink.
  • A115. take a photo of other person.
  • A116. follow other person.
  • A117. whisper in other person’s ear.
  • A118. exchange things with other person.
  • A120. finger-guessing game (playing rock-paper-scissors).
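For scripts that need this assignment, the two lists above can be encoded as a small lookup. This is a minimal sketch: the class IDs are transcribed from the lists in this README, while the names `SMPLX_IDS`, `EXPOSE_IDS`, and `extractor_for` are illustrative and not part of any released code.

```python
# Class IDs listed under SMPL-X above.
SMPLX_IDS = frozenset({
    5, 6, 7, 10, 11, 13, 22, 23, 27, 28, 31, 32, 35, 39, 40, 45,
    *range(50, 61),  # A50..A60
    66, 67, 69, 70, 74, 77, 81, 102, 109, 110, 113,
    *range(115, 119),  # A115..A118
    120,
})

# Every remaining class in A1..A120 is listed under ExPose above.
EXPOSE_IDS = frozenset(range(1, 121)) - SMPLX_IDS


def extractor_for(class_id):
    """Return the pose extractor listed for an action class ID (1..120)."""
    if class_id in SMPLX_IDS:
        return "SMPL-X"
    if class_id in EXPOSE_IDS:
        return "ExPose"
    raise ValueError(f"unknown class ID: {class_id}")


print(len(EXPOSE_IDS), len(SMPLX_IDS))       # 77 43
print(extractor_for(11), extractor_for(12))  # SMPL-X ExPose
```

Deriving the ExPose set as the complement keeps the two sets disjoint and guarantees that together they cover all 120 classes.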
