THU-READ is an RGB-D dataset collected at Tsinghua University for action recognition in egocentric videos. The dataset contains 40 action classes, all of which are “all-about-hand” actions. To balance the data distribution, we asked 8 subjects (6 males and 2 females, with heights ranging from 162 cm to 185 cm) to perform the action of each class the same number of times, N (here N = 3). In total, we obtained 1920 video clips: 1920 = 8 (subjects) × 2 (modalities) × 40 (classes) × 3 (repetitions).
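The clip count can be checked by enumerating every combination of subject, modality, class, and repetition. The following is only a minimal sketch of that arithmetic: the modality labels `"RGB"` and `"depth"`, the 1-based integer IDs, and the function name `enumerate_clips` are assumptions made for illustration and do not reflect the dataset's actual file naming or directory layout.

```python
from itertools import product

# Counts taken from the dataset description above.
NUM_SUBJECTS = 8
NUM_CLASSES = 40
NUM_REPETITIONS = 3  # N = 3

# Assumed labels for the two modalities of an RGB-D sensor (hypothetical names).
MODALITIES = ("RGB", "depth")


def enumerate_clips():
    """Return one (subject, modality, class, repetition) tuple per clip."""
    subjects = range(1, NUM_SUBJECTS + 1)
    classes = range(1, NUM_CLASSES + 1)
    repetitions = range(1, NUM_REPETITIONS + 1)
    return list(product(subjects, MODALITIES, classes, repetitions))


clips = enumerate_clips()
total_clips = len(clips)

# Because every subject repeats every class the same number of times,
# each class ends up with the same number of clips.
clips_per_class = total_clips // NUM_CLASSES
clips_per_modality = total_clips // len(MODALITIES)

print(total_clips, clips_per_class, clips_per_modality)
```

Under these assumptions each class has the same number of clips, which is the balance that the fixed repetition count N is meant to guarantee.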
We mounted an RGB-D sensor on a helmet worn on the subject's head. To acquire egocentric action videos, we kept the camera aligned with the subject's line of sight so as to simulate real conditions.
We would like to thank Honghui Liu, Guangyi Chen, Shan Gu, Ziyan Li, Yaoyao Wu, Weixiang Chen and Jianwei Feng for their help with dataset collection, and Prof. Song-Chun Zhu, Dr. Tianfu Wu and Yang Liu from VCLA, UCLA for valuable discussions.