Article

Interactions with 3D virtual objects in augmented reality using natural gestures

Journal

VISUAL COMPUTER
Volume -, Issue -, Pages -

Publisher

SPRINGER
DOI: 10.1007/s00371-023-03175-4

Keywords

Augmented reality; Deep learning; Interaction with virtual objects

This paper focuses on using the human palm as a natural target for rendering and interacting with 3D virtual objects in augmented reality (AR) applications. A two-stage palm detection model is proposed to track multiple palms and calculate the camera pose. Intuitive one-handed natural gestures are used for interaction, and a finite state machine is employed to detect gesture changes. The proposed method outperforms state-of-the-art methods in terms of precision and frame rate.
Markers are the backbone of various cross-domain augmented reality (AR) applications available to the research community. However, reliance on markers may limit anywhere augmentation. As smart sensors are deployed across a broad spectrum of consumer electronics (CE) products, it is becoming inevitable to rely on natural gestures to render and interact with such products, which opens up virtually limitless options for AR applications. This paper focuses on using the human palm as the natural target for rendering 3D virtual objects and interacting with them in a typical AR set-up. While printed markers are comparatively easy to detect for camera pose estimation, the palm is a challenging replacement for a physical marker. To address this, we use a two-stage palm detection model that tracks multiple palms and their key-points in real time. The detected key-points are used to calculate the camera pose before the 3D objects are rendered. Once the virtual objects are rendered, intuitive one-handed (uni-manual) natural gestures are used to interact with them. A finite state machine (FSM) is proposed to detect changes in gesture during interaction. We validate the proposed interaction framework on a set of well-known 3D virtual objects that are often used to demonstrate scientific concepts to students across grade levels. The framework performs better than state-of-the-art (SOTA) methods, achieving an average precision of 96.5% (versus 82.9% for SSD+MobileNet) and a frame rate of 58.27 FPS (versus 37.93 FPS for SSD+MobileNet). To widen the scope of the work, we also test neural network-based gesture detection models on a versatile gesture dataset; this approach fits into the proposed AR pipeline and runs in real time at 46.83 FPS. The results indicate that the proposed method has good potential to mitigate some of the challenges faced by the research community in the interactive AR space.
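
The abstract describes two computational steps that lend themselves to a brief illustration: estimating the camera pose from detected palm key-points and running a finite state machine over per-frame gesture labels. The Python sketch below is a minimal, hypothetical rendering of those ideas; the palm model points, gesture labels, and stabilisation window are illustrative assumptions, and the paper's actual two-stage detector and gesture vocabulary are not reproduced here. OpenCV's solvePnP stands in for whichever pose solver the authors used.

```python
# Hypothetical sketch: camera pose from palm key-points + a small gesture FSM.
# PALM_MODEL_POINTS and the gesture labels are illustrative assumptions.
import numpy as np
import cv2

# Assumed 3D model of a few palm key-points (metres, palm-centred frame).
PALM_MODEL_POINTS = np.array([
    [0.00, 0.00, 0.0],   # wrist
    [0.03, 0.08, 0.0],   # index MCP
    [0.00, 0.09, 0.0],   # middle MCP
    [-0.03, 0.08, 0.0],  # ring MCP
], dtype=np.float32)

def estimate_camera_pose(image_points, camera_matrix):
    """Solve PnP from 2D palm key-points to get rotation/translation for rendering."""
    ok, rvec, tvec = cv2.solvePnP(
        PALM_MODEL_POINTS,
        np.asarray(image_points, dtype=np.float32),
        camera_matrix,
        None,                      # assume an undistorted camera for this sketch
    )
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)     # 3x3 rotation matrix for the graphics pipeline
    return R, tvec

class GestureFSM:
    """Minimal FSM that accepts a gesture change only after it is stable for a few frames."""
    def __init__(self, states=("idle", "grab", "rotate", "scale"), stable_frames=5):
        self.states = set(states)
        self.current = "idle"
        self._candidate = None
        self._count = 0
        self._stable_frames = stable_frames

    def update(self, detected):
        """Feed the per-frame gesture label; return the new state when a transition fires."""
        if detected not in self.states or detected == self.current:
            self._candidate, self._count = None, 0
            return None
        if detected == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = detected, 1
        if self._count >= self._stable_frames:
            self.current, self._candidate, self._count = detected, None, 0
            return self.current
        return None
```

Debouncing a candidate gesture over several frames is one simple way an FSM can suppress spurious transitions caused by isolated misclassifications from the per-frame gesture model.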
