3 KPIs you need to look at to improve hand tracking and gesture controls user experience in AR and VR
How to measure the user’s perceived precision, speed and performance of hand tracking and gesture recognition in augmented and virtual reality.
Hand tracking and gesture recognition technology is usually integrated into a complex stack: a hardware and software ecosystem.
This ecosystem includes cameras, displays, operating systems, computing boxes, and SDKs. Many key performance indicators (KPIs) for gestures can be measured across it, and they are interdependent, each affecting the overall user experience (UX).
To demystify the KPI definitions, we will discuss three gesture performance indicators that allow us to monitor and improve the UX of hand tracking and gesture recognition in Augmented Reality and Virtual Reality: perceived speed, perceived accuracy and performance.
Perceived speed
End-to-end latency definition
“My virtual hand moves as quickly as my real hand” — a user
This dynamic is defined by end-to-end latency, or the delay between the user’s action and the system’s response.
It is experienced as the time between the real hand being captured by the onboard sensors and interpreted by Clay AIR’s software, and the 3D rendering appearing on the display of the glasses.
End-to-end latency calculation method
End-to-end latency is expressed in milliseconds (ms), or as a number of frames at the system’s frame rate in frames per second (FPS).
It depends on the other elements of the stack detailed above, including Clay AIR’s SDK latency.
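As a rough illustration of the two units, the sketch below converts a latency between frames and milliseconds and sums per-stage delays into an end-to-end figure. The stage names and values are hypothetical placeholders chosen for the example, not measurements of any real device or of Clay AIR’s SDK.

```python
from typing import Mapping


def frames_to_ms(frames: float, fps: float) -> float:
    """Convert a latency expressed in frames to milliseconds at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames * 1000.0 / fps


def ms_to_frames(latency_ms: float, fps: float) -> float:
    """Convert a latency in milliseconds to a number of frames at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return latency_ms * fps / 1000.0


def end_to_end_latency_ms(stage_latencies_ms: Mapping[str, float]) -> float:
    """Sum the delay contributed by each stage of the stack."""
    return sum(stage_latencies_ms.values())


# Hypothetical per-stage budget in milliseconds (illustrative values only).
example_stages_ms = {
    "camera_capture": 16.0,
    "hand_tracking_sdk": 10.0,
    "rendering": 11.0,
    "display": 8.0,
}

total_ms = end_to_end_latency_ms(example_stages_ms)
total_frames_at_60fps = ms_to_frames(total_ms, fps=60.0)
```

Splitting the total into stages like this makes it easier to see which part of the stack dominates the delay the user perceives.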
In summary, perceived speed is key to the UX because it reflects the user’s perception that the system reacts in real time to their movement and relays their actions in the virtual reality or augmented reality experience. This holds whether the experience is a game, an engineering visualization or a training program that requires the highest standard of interactivity.
Perceived Accuracy
“The gesture I perform is accurately recognized” or “I can interact with virtual objects” — a user
When a user performs a gesture or interacts with content, this must be successfully interpreted and carried out in the Augmented Reality or Virtual Reality experience.
Clay AIR’s SDK Gesture Accuracy
Formula: (# gestures recognized successfully)/(# gestures performed in the FoV)
In the case of gesture recognition technology, this KPI is measured as the percentage of hands or gestures recognized out of the number of hands or gestures that are in the field of view of the device’s sensors.
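The formula above can be sketched as a small function. The function and parameter names are illustrative, and the counts in the example are made up.

```python
def gesture_accuracy(recognized: int, performed_in_fov: int) -> float:
    """Return (# gestures recognized successfully) / (# gestures performed in the FoV)."""
    if performed_in_fov <= 0:
        raise ValueError("at least one gesture must be performed in the field of view")
    if not 0 <= recognized <= performed_in_fov:
        raise ValueError("recognized must be between 0 and performed_in_fov")
    return recognized / performed_in_fov


# Hypothetical test session: 97 of 100 gestures in the field of view were recognized.
accuracy = gesture_accuracy(recognized=97, performed_in_fov=100)
accuracy_percent = accuracy * 100.0
```

Gestures performed outside the field of view are excluded from the denominator, since the sensors never had a chance to see them.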
Clay AIR’s SDK Hand Tracking Accuracy, or Tracking Success Rate
In the case of hand tracking technology, the KPI of perceived precision is hand tracking accuracy, which refers to the position, on a pixel-by-pixel basis, of 22 key points of the rendered model skeleton on the hand in the raw image.
It is the probability that the model correctly estimates the position of these points of the hand in space.
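One common way to turn key-point positions into a single score is to count a point as correct when its predicted pixel position falls within a tolerance of the annotated position. The sketch below assumes that approach. The article does not state which metric Clay AIR actually uses, and the tolerance and coordinates here are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position in image pixels


def keypoint_success_rate(
    predicted: Sequence[Point],
    annotated: Sequence[Point],
    tolerance_px: float,
) -> float:
    """Fraction of key points predicted within tolerance_px of their annotated position."""
    if not predicted or len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must be non-empty and the same length")
    hits = sum(
        1
        for pred, truth in zip(predicted, annotated)
        if math.dist(pred, truth) <= tolerance_px
    )
    return hits / len(predicted)


# Four made-up key points for brevity; a full hand model would supply 22.
predicted_points = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
annotated_points = [(0.0, 1.0), (10.0, 10.0), (20.0, 29.0), (33.0, 34.0)]
rate = keypoint_success_rate(predicted_points, annotated_points, tolerance_px=5.0)
```

Averaging this rate over many annotated images gives an estimate of how often the model places the key points where they really are.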
Perceived precision: The impact of the accuracy on the overall UX
Perceived precision is key to a user’s sense of immersion. If the accuracy is low, the software will place the hand somewhere other than where it actually is in space.
The user may not be able to interact with virtual objects with virtual touch, leading to an uncomfortable user experience.
Clay AIR’s machine learning model gesture accuracy ranges between 94% and 99% on gestures such as victory, call, swipe, pinch and grab.
What does 99% gesture recognition accuracy mean from a user experience standpoint?
Often, 99% gesture accuracy is interpreted as ‘out of 100 gestures, 99 will be recognized’. It’s not as simple as that.
A 99% gesture recognition accuracy means that out of 100 input images, one will not be interpreted successfully. The assumption here is that the user performs the gestures clearly and in optimal conditions: in a well-lit room and in the success zone of the camera’s field of view.
For instance, if a user holds their hand in the success zone of a 30fps camera for 3.3 seconds, the hand might go unrecognized for 1/30 of a second (one image), which is almost imperceptible to the human eye.
Explanation
The reason is that a 30fps camera system refreshes roughly every 33ms. Because the Clay AIR SDK processes every frame, it has processed about 100 input images within 3.3 seconds.
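The arithmetic behind this example can be checked with a few lines of code. This is only a sketch of the calculation described above, with illustrative function names; the 30fps and 99% figures come from the text.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between two consecutive frames, in milliseconds."""
    return 1000.0 / fps


def seconds_to_process(frames: int, fps: float) -> float:
    """Time needed to capture and process a given number of frames."""
    return frames / fps


def expected_missed_frames(frames: int, accuracy: float) -> float:
    """Expected number of frames not interpreted successfully at a given accuracy."""
    return frames * (1.0 - accuracy)


interval_ms = frame_interval_ms(30.0)           # about 33.3 ms per frame
duration_s = seconds_to_process(100, 30.0)      # about 3.33 s for 100 frames
missed = expected_missed_frames(100, 0.99)      # about 1 frame
missed_duration_ms = missed * interval_ms       # about 33 ms of missed recognition
```

In other words, at 99% accuracy the expected gap in recognition over those 100 frames is a single frame lasting about 33ms.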
Performance
“My battery runs low” / “My battery heats up” — a user
CPU/DSP load
Finally, the third KPI we will discuss for UX in Augmented Reality and Virtual Reality is performance, which refers to the CPU (central processing unit) and DSP (digital signal processor) load.
Hand tracking and gesture recognition are based on computer vision and powered by machine learning, so the SDK necessarily draws on the device’s processing power.
Optimizing the CPU/DSP load for a more comfortable UX
Optimizing the CPU/DSP load keeps the battery from overheating or draining too quickly, which means a cooler, more comfortable device and longer battery life.
Clay AIR’s software minimizes its load, enabling it to coexist with other SDKs on the device that run on the same processing unit.
Additional Specs Affecting KPIs
Each of these KPIs depends on the hardware specifications and software of the augmented reality or virtual reality device being used.
In this article, we talk about this interdependence and additional factors that affect the end-to-end UX.
Clay AIR specializes in hand tracking and gesture recognition solutions. Feel free to reach out anytime!
Originally published at https://clayair.io/three-key-hand-tracking-and-gesture-recognition-indicators-you-need-to-track-to-improve-ux-in-augmented-reality-and-virtual-reality/ on October 7, 2020.