The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it.
– Mark Weiser
Many of us grew up watching Star Trek, where the crew could simply talk to the computer and it would understand not only their words but their intent. "Computer, locate Mr. Spock" was never about voice recognition – it was about understanding, context, and action. This vision of ambient computing, where the interface disappears and interaction becomes natural (speech, gestures, and so on), has inspired scientists and builders for decades.
The research foundation for this vision was laid in 1988 by Mark Weiser at Xerox PARC, when he coined the term ubiquitous computing. Together with John Seely Brown, Weiser defined the concept of calm technology as having these attributes:
- The purpose of a computer is to help you do something else.
- The best computer is a quiet, invisible servant.
- The more you can do by intuition, the smarter you are; the computer should extend your unconscious.
- Technology should create calm.
When Amazon launched Alexa in 2014, we were not the first to market with voice recognition. Dragon had been turning speech into text for decades, and Siri and Cortana were already helping users with basic tasks. But Alexa represented something different – a cloud-based voice service that developers could build on. Anyone with a good idea and coding skills could contribute to Alexa's capabilities.
I remember building my first DIY Alexa device with a Raspberry Pi, a $5 microphone, and a cheap speaker. It cost less than $50, and I had it working in under an hour. The experience was rough around the edges, but it showed how excited builders were about the potential of voice as an interface – especially when they could build on it themselves.
However, the early days of skill development were not without challenges. Our first interaction model was intent-based – like a command-line interface from the 1970s, but with voice. Developers had to anticipate exact phrases (and maintain extensive lists of sample utterances), and users had to remember specific invocation patterns. "Alexa, ask [skill name] to [do something]" became familiar, but it was an unnatural formula. Over time we simplified it with features such as name-free interactions and multi-turn dialogs, but we were still limited by the fundamental constraints of pattern matching and intent classification.
Generative AI allows us to take a different approach to voice interfaces. Alexa+ and our new AI-native SDKs remove the complexity of natural language understanding from the developer's plate. For example, the Alexa AI Action SDK lets developers expose their services through simple APIs, allowing Alexa's large language models to handle the nuances of human conversation. Behind the scenes, a sophisticated routing system built on Amazon Bedrock models – including Amazon Nova and Anthropic's Claude – matches each request with the optimal model for the task, balancing latency requirements with conversational fluency.
This shift from explicit command patterns to natural conversation reminds me of the evolution of database interfaces. In the early days of relational databases, queries had to be precisely structured. Natural language querying, though initially met with skepticism, has become increasingly powerful and accurate. Similarly, Alexa+ can now turn a casual prompt such as "I need a rustic white picture frame, around 11 by 17" into a structured search, maintain context through refinements, and complete the transaction – and it feels like a conversation you would have with another person.
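The "maintain context through refinements" part can be sketched as a running query that each conversational turn updates. The field names and the `refine` helper below are invented for illustration, assuming the model extracts structured fields from each utterance.

```python
# Hypothetical sketch of multi-turn context: each refinement merges the
# fields the model extracted from the latest utterance into the running
# search. Field names are invented examples.

def refine(context: dict, update: dict) -> dict:
    """Merge a new turn's extracted fields into the running query context."""
    merged = dict(context)
    merged.update({k: v for k, v in update.items() if v is not None})
    return merged

# Turn 1: "I need a rustic white picture frame, around 11 by 17"
query = refine({}, {"item": "picture frame", "style": "rustic",
                    "color": "white", "size": "11x17"})

# Turn 2: "actually, make it black" – only the color changes,
# everything else carries over from the earlier turns.
query = refine(query, {"color": "black"})

print(query)
```

Each turn only has to say what changed; the accumulated context is what makes the exchange feel like talking to a person rather than re-issuing a full command.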
For builders, this is a fundamental shift in how we build voice experiences. Instead of mapping utterances to intents, we can focus on exposing our core business logic through APIs and let Alexa handle the complexities of natural language understanding. And for services without externalized APIs, we have added agentic capabilities that allow Alexa+ to navigate digital interfaces the way we would, significantly expanding the range of tasks it can accomplish.
Jeff's vision was to build the Star Trek computer. Ten years ago that was an ambitious goal. We have come a long way since – from basic voice commands to much more conversational interfaces. Generative AI gives us a glimpse of what is possible. And even though we are not commanding starships by voice yet, the fundamental technical problems of understanding natural language and automating action are becoming tractable.
The Alexa+ team is accepting applications for early access to the AI-native SDKs. You can register here. Ten years in, and I am as excited as ever to see what builders dream up.
As always, now go build!