4.5 Article

Temporally guided articulated hand pose tracking in surgical videos

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s11548-022-02761-6

Keywords

Articulated pose; Surgical videos; Computer vision; Hand pose; Video tracking

This study proposes a novel hand pose estimation model, CondPose, which improves detection and tracking accuracy by incorporating pose prior information. The researchers also collect the Surgical Hands dataset, which provides multi-instance articulated hand pose annotations for publicly available surgical videos, including bounding boxes, pose annotations, and tracking IDs for multi-instance tracking research.
Purpose: Articulated hand pose tracking is an under-explored problem that carries the potential for use in an extensive number of applications, especially in the medical domain. With a robust and accurate tracking system on surgical videos, the motion dynamics and movement patterns of the hands can be captured and analyzed for many rich tasks.

Methods: In this work, we propose a novel hand pose estimation model, CondPose, which improves detection and tracking accuracy by incorporating a pose prior into its prediction. We show improvements over state-of-the-art methods which provide frame-wise independent predictions, by following a temporally guided approach that effectively leverages past predictions.

Results: We collect Surgical Hands, the first dataset that provides multi-instance articulated hand pose annotations for videos. Our dataset provides over 8.1k annotated hand poses from publicly available surgical videos and bounding boxes, pose annotations, and tracking IDs to enable multi-instance tracking. When evaluated on Surgical Hands, we show our method outperforms the state-of-the-art approach using mean Average Precision, to measure pose estimation accuracy, and Multiple Object Tracking Accuracy, to assess pose tracking performance.

Conclusion: In comparison to a frame-wise independent strategy, we show greater performance in detecting and tracking hand poses and more substantial impact on localization accuracy. This has positive implications in generating more accurate representations of hands in the scene to be used for targeted downstream tasks.
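
For readers unfamiliar with the tracking metric named in the abstract, Multiple Object Tracking Accuracy (MOTA) is conventionally defined as one minus the ratio of total tracking errors (misses, false positives, and identity switches) to the total number of ground-truth objects across all frames. The sketch below illustrates only that standard definition. The `FrameCounts` container and the `mota` function are hypothetical names introduced here, and the abstract does not say how the paper matches predictions to ground truth or whether counts are taken per hand or per keypoint, so this should not be read as the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FrameCounts:
    """Per-frame error counts (hypothetical container for illustration)."""
    num_gt: int           # ground-truth objects present in the frame
    misses: int           # ground-truth objects with no matched prediction
    false_positives: int  # predictions with no matched ground truth
    id_switches: int      # matched objects whose track ID changed


def mota(frames: Iterable[FrameCounts]) -> float:
    """Standard MOTA: 1 - (misses + false positives + ID switches) / ground truth."""
    frames = list(frames)
    total_gt = sum(f.num_gt for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined when there are no ground-truth objects")
    total_errors = sum(
        f.misses + f.false_positives + f.id_switches for f in frames
    )
    return 1.0 - total_errors / total_gt


# Two frames with four ground-truth hands each and three errors in total.
example = [
    FrameCounts(num_gt=4, misses=1, false_positives=0, id_switches=0),
    FrameCounts(num_gt=4, misses=0, false_positives=1, id_switches=1),
]
score = mota(example)  # 1 - 3/8
```

Because errors are summed before dividing, MOTA can be negative when a tracker makes more errors than there are ground-truth objects, which is why it is usually reported alongside a localization metric such as mean Average Precision.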
