How to extend HoloLens Gestures with Deep Learning AI

A practical guide


Since the HoloLens launched, only a few hand gestures have been available to the developer, primarily the Air Tap. The Air Tap is exposed through a high-level API and is analogous to a click on a PC or a tap on mobile. These interaction limitations have led to many contextual interfaces that pop up during a HoloLens experience, sometimes moving to stay in the user's field of view. These interfaces can occlude the scene, suffer visibility problems of their own, and typically house only a limited number of options.

To escape this interactive limitation, it's possible to build a gesture interaction system for the HoloLens using Microsoft's Custom Vision Cognitive Service, a machine learning framework. Once the user starts an Air Tap, the gesture system begins coloring in pixels of an image at positions corresponding to the tracked hand position. When the user lifts their finger and concludes the Air Tap, the generated image is sent to the Custom Vision endpoint, which classifies it as one of five gestures and returns the result as a JSON (JavaScript Object Notation) response.
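To make the pixel-painting step concrete, here is a minimal sketch of the idea (in Python rather than Unity C#, and not the project's actual code): accumulate normalized 2D hand positions during an Air Tap and color in a small square of pixels around each sample on a low-resolution canvas. The canvas size and brush radius here are assumptions.

```python
# Sketch: "color in" pixels around each tracked hand sample on a small
# grayscale canvas. All sizes are illustrative assumptions.

WIDTH, HEIGHT = 64, 64  # assumed low-resolution canvas
BRUSH = 1               # paint a (2*BRUSH+1)^2 square around each sample

def paint_path(samples, width=WIDTH, height=HEIGHT, brush=BRUSH):
    """samples: iterable of (x, y) hand positions normalized to [0, 1]."""
    canvas = [[0] * width for _ in range(height)]
    for nx, ny in samples:
        # Map the normalized hand position onto the canvas, clamped to bounds.
        px = min(width - 1, max(0, int(nx * (width - 1))))
        py = min(height - 1, max(0, int(ny * (height - 1))))
        # Paint a small square of white pixels around the sample.
        for dy in range(-brush, brush + 1):
            for dx in range(-brush, brush + 1):
                x, y = px + dx, py + dy
                if 0 <= x < width and 0 <= y < height:
                    canvas[y][x] = 255
    return canvas
```

Calling `paint_path` once per frame's hand position over the course of an Air Tap leaves a white line-drawing of the gesture on a black canvas, ready to encode as a PNG.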


The first concern was that Azure Custom Vision might not be able to reliably classify the gestures. To test this, it's possible to create a fake data set of gesture images simply by using Microsoft Paint. With a handful of drawings of circles, triangles, and squares, the Custom Vision model can be trained and tested. In practice it classified new images correctly over 95% of the time, demonstrating that the service can recognize basic line shapes even on very low-resolution images.
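Such a synthetic data set could also be generated programmatically rather than drawn by hand. A hedged sketch (all coordinates, sample counts, and jitter amounts below are illustrative assumptions, not values from the project): express each shape as a list of points along its outline, jitter them slightly for variety, then rasterize the point lists into training images.

```python
import math
import random

def polyline_points(vertices, samples_per_edge=40):
    """Sample evenly spaced points along each edge of a polyline."""
    pts = []
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        for i in range(samples_per_edge):
            t = i / samples_per_edge
            pts.append((x0 + (x1 - x0) * t, y0 + (y1 - y0) * t))
    return pts

def jitter(pts, amount=0.02):
    """Perturb each point a little so generated examples aren't identical."""
    return [(x + random.uniform(-amount, amount),
             y + random.uniform(-amount, amount)) for x, y in pts]

# Shape outlines in normalized [0, 1] coordinates (closing vertex repeated).
square = [(0.2, 0.2), (0.8, 0.2), (0.8, 0.8), (0.2, 0.8), (0.2, 0.2)]
triangle = [(0.5, 0.15), (0.85, 0.8), (0.15, 0.8), (0.5, 0.15)]
circle = [(0.5 + 0.3 * math.cos(2 * math.pi * i / 48),
           0.5 + 0.3 * math.sin(2 * math.pi * i / 48)) for i in range(49)]
```

Each jittered point list can then be painted onto a small canvas and uploaded to a Custom Vision project as a labeled training image.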

The next challenge was uploading the generated gesture images with appropriately formatted metadata. It's difficult to find basic examples of uploading an image file to the endpoint with Unity's networking libraries; in the end it's necessary to encode the texture to a PNG, save that PNG to disk, load it back into a byte array, and then convert that into a string.
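Stripped of the Unity plumbing, the upload itself boils down to a single POST of raw PNG bytes with a prediction key header. A hedged Python sketch of that request shape (the URL, project ID, iteration name, and key below are placeholders, not real values; the actual project would go through Unity's networking libraries instead):

```python
import urllib.request

# Placeholders -- substitute your own Custom Vision project's values.
PREDICTION_URL = ("https://example.cognitiveservices.azure.com/"
                  "customvision/v3.0/Prediction/PROJECT_ID/"
                  "classify/iterations/ITERATION/image")
PREDICTION_KEY = "YOUR_PREDICTION_KEY"

def build_prediction_request(png_bytes):
    """Build (but don't send) the classification request for a gesture image."""
    return urllib.request.Request(
        PREDICTION_URL,
        data=png_bytes,
        method="POST",
        headers={
            "Prediction-Key": PREDICTION_KEY,
            "Content-Type": "application/octet-stream",
        },
    )

# Usage (network call, not executed here):
# req = build_prediction_request(open("gesture.png", "rb").read())
# with urllib.request.urlopen(req) as resp:
#     result_json = resp.read()
```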


Once the system is running on the HoloLens, it's time to have users test it. The first thing to keep in mind is teaching people how to perform the gestures. One of the most common problems users run into is that their hands drift outside the hand-tracking boundaries. Because of the delay between making a gesture and receiving a prediction for it from Azure, users can have a hard time knowing when or how they made a mistake. To make this clearer, it helps to play an audible popping noise and make the gesture visual disappear instantly as soon as the user lifts their finger.

This lets them know the gesture is complete even before its classification is returned from the Azure deep learning service, making the experience vastly more tactile and usable. If the gesture response comes back successfully, a slightly deeper popping sound is played and a picture of the recognized gesture type is shown in the bottom-left corner. While the system doesn't always predict the user's intended gesture correctly, performing the gesture in a certain way can greatly improve the chance of recognition. By adding the noises and quickly removing the in-view gesture trail and predicted gesture classifications, the user receives clear, instant feedback when recognition fails and can adjust their technique accordingly.
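Once the JSON response arrives, choosing which gesture picture and sound to play is a matter of taking the highest-probability prediction and gating it behind a confidence threshold. A small sketch, assuming an abbreviated version of the Custom Vision response shape and a made-up threshold value:

```python
import json

# Abbreviated example of a Custom Vision classification response
# (tagName/probability fields; values here are invented for illustration).
sample_response = json.loads("""
{
  "predictions": [
    {"tagName": "circle",   "probability": 0.91},
    {"tagName": "square",   "probability": 0.06},
    {"tagName": "triangle", "probability": 0.03}
  ]
}
""")

CONFIDENCE_THRESHOLD = 0.5  # assumed cutoff; tune per project

def top_gesture(response, threshold=CONFIDENCE_THRESHOLD):
    """Return the winning gesture's tag, or None if nothing is confident enough."""
    best = max(response["predictions"], key=lambda p: p["probability"])
    return best["tagName"] if best["probability"] >= threshold else None
```

Returning `None` below the threshold is what drives the failure feedback: no gesture icon is shown, signaling the user to adjust and try again.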

While there is still plenty of room for improvement, the resulting ability to expand the gesture vocabulary speaks loudly to 3D developers who have grown accustomed to limitations. Another exciting path to explore is exporting the trained model to ONNX and running it with the WinML on-device inference introduced in the Windows 10 RS4 update. This would improve latency and also help overcome potential issues with backend connectivity and network limitations.