Humane AI Pin, a voice-controlled personal assistant, was one of Silicon Valley’s most hyped artificial intelligence products. However, the media buzz died down quickly. Outside of Amazon’s Alexa and Apple’s Siri, voice interfaces have been one of technology’s biggest failures. Mint asks why:
Why the hype around the Humane AI Pin?
AI Pin, launched by Silicon Valley startup Humane, promised to overhaul consumer technology. To that end, the company’s first gadget ran an operating system built around OpenAI’s ChatGPT, intended to automate most of the operations users perform on smartphones. These include playing music, booking taxis, ordering food and more through voice commands, without having to open multiple apps. It is this promise of seamless interoperability that drove the hype around AI Pin. The Pin also had a camera that could capture images as input, promising a future where gadgets would become more interactive.
How was the product received?
In mid-April, Humane began shipping review units to technology reviewers and testers. So far, the overwhelming bulk of feedback has been highly critical of the device, with reviewers saying most voice interactions do not work as promised. This problem may stem largely from the fact that most applications run in silos and require multiple permissions to keep them all running smoothly and in sync. Moreover, generative AI is not yet very accurate, which compounds the complications of using a voice interface as the primary way to operate a gadget.
Why do we need specialized AI equipment?
One of the reasons the AI Pin has been questioned is that, as a gadget, it does nothing a smartphone does not already do. Both Android and iOS have voice interfaces for most operations. The only difference dedicated AI hardware can make is to move beyond the smartphone’s form factor and present devices that are not centred on touchscreens. That shift may take a while.
Is voice well suited for AI operations?
Most generative AI tools available to consumers are text-based, but the number of voice interfaces is growing. Microsoft’s Vall-E can take a three-second audio clip and use it to generate speech in the source voice. Generative AI can hold voice conversations, which is why companies like Rabbit and Humane are building gadgets with voice as a native interface. However, multimodal generative AI models are still cloud-based and costly, making them difficult to run on devices offline.
So why has the voice failed so far?
Reliability is a substantial issue. Even with Amazon’s Alexa, interfaces rely heavily on basic commands or pre-built integrations created by developers to connect third-party software. Generative AI, in particular, requires seamless conversational interaction via voice, which is not yet possible with high accuracy. Most voice interfaces have so far remained basic, and voice devices have therefore not yet delivered the intricate tasks a mature user interface should handle.