3 KPIs you need to look at to improve hand tracking and gesture controls user experience in AR and VR

Thomas Amilien
Inborn Experience (UX in AR/VR)
4 min read · Oct 7, 2020

How to measure the user’s perceived precision, speed and performance of hand tracking and gesture recognition in augmented and virtual reality.

To test the perceived precision and speed of our software, I decided to play the piano and record my session with the cameras on the lightweight AR glasses Nreal.

Hand tracking and gesture recognition technology is usually integrated into a complex stack: a hardware and software ecosystem.

This ecosystem includes cameras, displays, operating systems, computing boxes, and SDKs. Many key performance indicators (KPIs) for gestures can be measured across it, and they are interdependent: each one affects the overall user experience (UX).

To demystify the KPI definitions, we will discuss three gesture performance indicators that allow us to monitor and improve the UX of hand tracking and gesture recognition in augmented reality and virtual reality: perceived speed, perceived accuracy, and performance.

Perceived speed

End-to-end latency definition

“My virtual hand moves as quickly as my real hand” — a user

This dynamic is defined by end-to-end latency, or the delay between the user’s action and the system’s response.

The user experiences it as the delay between the moment the real hand is captured by the onboard sensors and interpreted by Clay AIR’s software, and the moment the 3D rendering appears on the display of the glasses.

End-to-end latency calculation method

End-to-end latency is measured in milliseconds (ms), or expressed as a number of frames at a given frame rate (frames per second, FPS).

It depends on every element of the stack described above, including the latency of Clay AIR’s SDK.
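As a rough illustration of how the two units relate, the sketch below sums per-stage latencies and converts the total into frames. The stage names and millisecond values are hypothetical placeholders chosen for the example, not measurements of Clay AIR’s SDK or of any device.

```python
# Hypothetical per-stage latencies in milliseconds (illustrative values only,
# not measurements of any real device or SDK).
STAGE_LATENCY_MS = {
    "camera_capture": 16.0,
    "hand_tracking_sdk": 10.0,
    "application_logic": 4.0,
    "rendering_and_display": 20.0,
}


def end_to_end_latency_ms(stages: dict[str, float]) -> float:
    """Sum the latency of every stage between the real hand and the display."""
    return sum(stages.values())


def ms_to_frames(latency_ms: float, fps: float) -> float:
    """Express a latency as a number of frames at the given frame rate."""
    frame_period_ms = 1000.0 / fps
    return latency_ms / frame_period_ms


total_ms = end_to_end_latency_ms(STAGE_LATENCY_MS)
total_frames = ms_to_frames(total_ms, fps=30)
print(f"{total_ms:.1f} ms = {total_frames:.2f} frames at 30 fps")
```

Breaking the total down by stage makes it easier to see which part of the stack contributes most to the delay the user perceives.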

In summary, perceived speed is key to the UX because it reflects the user’s perception that the system reacts to their movement in real time and relays their actions in the virtual reality or augmented reality experience. This holds whether the experience is a game, an engineering visualization, or a training program that requires the highest standard of interactivity.

Perceived accuracy

“The gesture I perform is accurately recognized” or “I can interact with virtual objects” — a user

When a user performs a gesture or interacts with content, this must be successfully interpreted and carried out in the Augmented Reality or Virtual Reality experience.

Clay AIR’s SDK Gesture Accuracy

Formula: (# gestures recognized successfully)/(# gestures performed in the FoV)

In the case of gesture recognition technology, this KPI is measured as the percentage of hands or gestures recognized out of the number of hands or gestures that are in the field of view of the device’s sensors.
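The formula above can be sketched in a few lines. The function name and the sample session are hypothetical and only illustrate the ratio of recognized gestures to gestures performed in the field of view.

```python
def gesture_accuracy(recognized: int, performed_in_fov: int) -> float:
    """Return (# gestures recognized) / (# gestures performed in the FoV)."""
    if performed_in_fov <= 0:
        raise ValueError("at least one gesture must be performed in the FoV")
    if not 0 <= recognized <= performed_in_fov:
        raise ValueError("recognized must be between 0 and performed_in_fov")
    return recognized / performed_in_fov


# Hypothetical test session: (gesture performed in the FoV, was it recognized?)
session = [("pinch", True), ("grab", True), ("swipe", False), ("victory", True)]

recognized_count = sum(1 for _, was_recognized in session if was_recognized)
accuracy = gesture_accuracy(recognized_count, len(session))
print(f"Gesture accuracy: {accuracy:.0%}")
```

Gestures performed outside the field of view are left out of the denominator, since the sensors never had a chance to see them.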

Clay AIR’s SDK Hand Tracking Accuracy, or Tracking Success Rate

In the case of hand tracking technology, the KPI of perceived precision is hand tracking accuracy. It measures how closely, pixel by pixel, the 22 key points of the rendered model skeleton match the position of the hand in the raw image.

It’s the probability that the model correctly estimates the position of these points of the hand in space.
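One possible way to turn this into a number is to count how many predicted key points land within a pixel tolerance of their annotated position. This is an assumption made for illustration: the tolerance, the function name, and the sample coordinates are hypothetical, and the article does not say how Clay AIR computes its own metric.

```python
import math

NUM_KEY_POINTS = 22  # number of hand key points mentioned in the article

Point = tuple[float, float]


def tracking_success_rate(
    predicted: list[Point], ground_truth: list[Point], tolerance_px: float
) -> float:
    """Fraction of key points predicted within tolerance_px of the annotation.

    Assumption: a key point counts as correct when its Euclidean distance to
    the annotated pixel position is at most tolerance_px.
    """
    if not ground_truth or len(predicted) != len(ground_truth):
        raise ValueError("both lists must be non-empty and of equal length")
    hits = sum(
        1 for p, g in zip(predicted, ground_truth) if math.dist(p, g) <= tolerance_px
    )
    return hits / len(ground_truth)


# Hypothetical annotated key points and predictions for one image.
truth = [(10.0 * i, 5.0 * i) for i in range(NUM_KEY_POINTS)]
# 20 points are off by about 1.4 px, the last 2 are off by 8 px.
pred = [(x + 1.0, y + 1.0) for x, y in truth[:20]] + [(x + 8.0, y) for x, y in truth[20:]]

rate = tracking_success_rate(pred, truth, tolerance_px=5.0)
print(f"Key points within tolerance: {rate:.1%}")
```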

Perceived precision: The impact of the accuracy on the overall UX

Perceived precision is key to a user’s sense of immersion. If tracking accuracy is low, the software will place the hand somewhere other than where it actually is in space.

The user may then be unable to interact with virtual objects through virtual touch, leading to an uncomfortable user experience.

Clay AIR’s machine learning model gesture accuracy ranges between 94% and 99% on gestures such as victory, call, swipe, pinch and grab.

What does 99% gesture recognition accuracy mean from a user experience standpoint?

Often, 99% gesture accuracy is interpreted as ‘out of 100 gestures, 99 will be recognized’. It’s not that simple.

A 99% gesture recognition accuracy means that out of 100 input images, one will not be interpreted successfully. The assumption here is that the user performs the gestures clearly and in optimal conditions: in a well-lit room and in the success zone of the camera’s field of view.

For instance, if a user holds their hand in the success zone of a 30 fps camera for 3.3 seconds, the hand might go unrecognized for 1/30 of a second (one image), which is almost imperceptible to the human eye.

Explanation

The reason for this is that a 30 fps camera delivers a new image roughly every 33 ms. Because the Clay AIR SDK processes every frame, it has processed about 100 input images within 3.3 seconds.
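The arithmetic behind this example can be checked with a short sketch. The function names are hypothetical, and the numbers are the ones used in the article: a 30 fps camera, 100 input images, and 99% accuracy.

```python
def frame_period_ms(fps: float) -> float:
    """Time between two consecutive camera frames, in milliseconds."""
    return 1000.0 / fps


def seconds_to_process(frames: int, fps: float) -> float:
    """Time needed to receive the given number of frames at this frame rate."""
    return frames / fps


def expected_missed_frames(frames: int, accuracy: float) -> float:
    """Expected number of frames that are not interpreted successfully."""
    return frames * (1.0 - accuracy)


FPS = 30
FRAMES = 100
ACCURACY = 0.99

period = frame_period_ms(FPS)
duration = seconds_to_process(FRAMES, FPS)
missed = expected_missed_frames(FRAMES, ACCURACY)

print(f"Frame period: {period:.1f} ms")
print(f"Time to process {FRAMES} frames: {duration:.2f} s")
print(f"Expected missed frames: {missed:.1f} (about {missed * period:.0f} ms)")
```

Seen this way, a 1% error rate corresponds to roughly one dropped image every 3.3 seconds rather than one failed gesture out of every hundred.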

Performance

“My battery runs low” / “My battery heats up” — a user

CPU/DSP load

Finally, the third KPI we will discuss for UX in augmented reality and virtual reality is performance, which refers to the CPU (central processing unit) and DSP (digital signal processor) load.

Hand tracking and gesture recognition are based on computer vision and powered by machine learning, so the SDK necessarily draws on the device’s processing power.

Optimizing the CPU/DSP load for a more comfortable UX

Optimizing the CPU/DSP load keeps the battery from overheating or draining too quickly, which means a cooler, more comfortable device and longer battery life.

Clay AIR’s software minimizes its load, enabling it to coexist with other SDKs on the device that run on the same processing unit.

Additional Specs Affecting KPIs

Each of these KPIs depends on the hardware specifications and software of the augmented reality or virtual reality device being used.

A separate article covers this interdependence and the additional factors that affect the end-to-end UX.

Clay AIR specializes in hand tracking and gesture recognition solutions. Feel free to reach out anytime!

Originally published at https://clayair.io/three-key-hand-tracking-and-gesture-recognition-indicators-you-need-to-track-to-improve-ux-in-augmented-reality-and-virtual-reality/ on October 7, 2020.

Thomas has a background in Science, Arts & Leadership. His passion for immersive experiences led him to co-found Clay AIR in 2015, acquired by Qualcomm in 2021.