How to Create My Own Digital Assistant like Siri
She has a silken voice, which sounds so sweet. And she is always ready to come to your aid: answer a question, suggest something, help as best as she is able to. And she has a wonderful name - Siri.
Although we shouldn't really use the word "she", for we're not talking about a human being, as you might have thought. Siri is a special AI-based program, the first of the many voice assistants that would follow.
Today, these digital artificial intelligence assistants are increasingly conquering the world, which is happening thanks to the onrush of technology. And modern people have no choice but to find the proper use for them both in day-to-day life and in business. In our new blog post, we'll do our utmost to convince you that this is quite possible.
As you've probably already guessed, we're going to discuss a nice girl named Siri, the faithful virtual helper to all Apple users. You'll get a lot of valuable information, including the best way to build your own Siri. It would be great, don’t you think?
What are voice assistants?
Before moving on to more complex topics, we'd like to grasp the very basics, at least briefly.
A digital assistant is an AI-based service that recognizes human speech and performs a specific action in response to a user's spoken command. Most often, these solutions are used in mobile devices, smart speakers, and web browsers.
The functionality of all popular voice assistants like Siri is aimed at helping users with simple everyday tasks, such as reminding them about significant dates, calling a taxi, setting an alarm, ordering goods in online stores, etc.
Most personal assistants in mobile apps have machine learning capabilities, so they analyze the user's behavior while communicating with him and take many factors into account to make his experience better in the future. Among these factors are the user's location, the time of day, his online search history, and more.
Reasons to create a Siri-like app
We promised to convince you of the business benefits of building AI voice assistants, and we'll do it with the help of the latest statistics.
What do users think about personal assistants like Siri?
As research from Accenture Interactive has shown, a majority of consumers have a positive attitude towards virtual assistants and find them rather useful. More precisely, we may say the following…
80% of users believe that the best feature of voice assistants is their ability to give specific answers to the questions posed;
More than 50% of people would like Siri (and other personal assistants like Siri) to help them with relevant tips when shopping online (for example, they wouldn't mind getting information on the origin of goods);
Almost 75% of users would be happy to interact with a digital assistant on a daily basis once it learns to respond like a real person.
By the way, if you seriously intend to create your own Siri, pay attention to PwC's market analysis as well. According to its report, today your potential target audience mostly consists of the younger generation (under 30). People aged 30+ (and especially 50+) have a hard time getting used to these services. However, the situation is gradually improving.
What to expect in the near future in the voice assistants market?
First of all, according to Juniper analysts, we should expect an increase in the number of devices equipped with voice features (up to 870 million by 2022, if we talk about the United States).
Moreover, in the immediate future, transport companies, utility providers, and telecommunications organizations are likely to join the number of businesses ready to integrate voice assistant technology into their mobile platforms.
These achievements are admirable, aren't they? All the more so because voice technology has only recently reached maturity.
As you probably know (and as the infographic above shows), the history of artificial intelligence assistants starts with Siri, which is still considered one of the most successful among them.
Let's look into the reasons for such fame: in other words, let’s see why Siri is so popular.
What makes Siri a famous artificial intelligence assistant?
In Norwegian, the word ‘Siri’ means “a beautiful woman who leads you to victory”; in addition, it can be translated as "secret" from the Swahili language. At least, that's what one of Siri's creators explained to iOS app users.
However, 'Siri' also has a more down-to-earth meaning: it stands for Speech Interpretation and Recognition Interface. This version of the name speaks for itself, right? It says Siri is a program that recognizes and interprets human speech, which allows it to communicate with users and answer their questions.
At the moment, Siri has reached great heights, but we still remember the times when it could pick up the voice of a random person and respond to it at the wrong moment: say, start a search, send a message, or perform some other undesirable action. Fortunately, all this is in the past: the developers have worked hard on updates, and today Siri communicates with us in a pleasant voice and makes far fewer regrettable mistakes.
However, the story of Siri's improvement is too interesting to skip. What’s more, this story might help you create a mobile AI voice assistant of your own.
Siri's success story
It all started in late 2007, when a small startup, Siri Inc. (a team of 24 people), decided to develop a special AI assistant. Dag Kittlaus (who was CEO at the time), Tom Gruber, and Adam Cheyer (together with Norman Winarsky from SRI International) were the ones who took the lead on the Siri project.
Of course, such a large-scale development is impossible without a financial injection, so to speak (keep it in mind if you're considering creating a voice assistant for AI apps too). Siri Inc. was lucky in this regard: the startup received funding of $8.5 million from Menlo Ventures and Morgenthaler Ventures in the fall of 2008. The second portion of the investment was obtained a year later and amounted to as much as $15.5 million.
Work on the project lasted several years and was successfully completed at the very beginning of 2010, when the Siri application appeared in the App Store. And just a few months later (to be precise, in April 2010), the startup was bought by Apple (the purchase price remains a secret, presumably somewhere between $150 and $250 million).
The Siri app required further improvement, as it didn't fully match Apple's vision of what the ideal voice interface should be. It didn't take much time to make the basic changes, and on October 4, 2011, Siri was introduced as a built-in part of iOS 5 (and later versions).
What has been embodied in Siri?
Initially, in the pre-Apple days, Siri was planned as an artificial intelligence assistant, able to predict the user's wishes and fulfill them before he verbalizes what he wants to get.
The idea is well illustrated by the following example: say a Siri user has booked a flight, but it has been canceled, and he is at a loss as to what to do next. He needn't worry, because Siri is already looking for other suitable flights to offer him new options (even without his request).
By the way, before Siri was acquired by Apple, it drew on information from 42 services, including Rotten Tomatoes, Yelp, StubHub, and Wolfram Alpha.
After the deal was closed, Apple's team went to great lengths to polish the service they had bought. Some unnecessary features were removed, as the company wanted to perfect the basic functionality required by personal assistant mobile apps. In particular, Siri got a real voice, whereas previously the program had answered users' questions only in writing.
Let’s summarize: Siri is the product of serious research, combining the work of several advanced research groups (for instance, the results of 40 years of research at SRI International). The technology has come a long way, from primitive dialogue to understanding natural language. Arguably, Siri became truly practical with the release of iOS 9, when it learned to filter out extraneous noise and began to process complex user requests 40% faster.
Siri's competitive advantages
Now let's look at the main advantages of Siri. After that, you'll understand why it’d be smart of you to integrate Siri into your app.
Ease of use. iOS users just need to say the magic phrase "Hey, Siri!", and the program comes to life immediately.
Efficient operation. Siri understands the user's natural speech very well, so requests rarely need to be repeated or rephrased.
Continuous development of the program. Since Siri is an artificial intelligence assistant based on machine learning technology, it is constantly evolving and improving itself. One of the latest achievements is the ability to interact with some elements of a smart home system.
Multilanguage support. Initially, Siri knew only a few languages: English, Japanese, French, and German. However, with each OS update, the range of languages gradually expanded: the developers added support for Korean, Chinese, Italian, Russian, Spanish, and so on. And today Siri speaks 20 languages.
A good imitation of human speech. The user gets the impression of communicating with a real girl named Siri, not with a program.
The speed of work. Voice assistants work very quickly, especially when compared to touch interfaces.
User-friendliness. If the user is unsure what Siri is capable of, he may safely ask the system itself what it can do. The screen will display the functionality of Apple's digital assistant.
High level of personalization. Siri is able to adapt to each user individually, learning his or her preferences over time.
How can Siri help me?
Well, now you understand why Siri is so popular, right? And it's time to figure out what it actually can do.
So, the features of Siri include:
- Making calls (both audio and video ones). The original function of a smartphone is the opportunity to make calls, and Siri, of course, would help you with such a basic task;
- Sending messages and emails. It’s enough to dictate the desired text of the message (or email) to the virtual assistant, and it’ll write it down and send it to the specified user. In addition, Siri can read the last SMS message received;
- Plotting a route. Siri shows the user how to get to his destination (and how long it will take to get there);
- An online search of any kind: you may search for products, news, whatever! Photo search, by the way, is quite possible too;
- Planning your calendar. Siri also acts as a kind of reminder: the program records the time of a scheduled event and reminds the user of it in advance;
- Setting an alarm. "Wake me up in an hour!" - ask Siri, and it'll fulfill your order. Or just indicate at what time you need to wake up tomorrow;
- Informing on important news. Siri is well aware of the weather, exchange rates and stock prices, the latest news, and other things like that;
- Making payments. Of course, a mobile AI voice assistant of Siri's caliber couldn't possibly do without a feature that helps users with online transactions;
- Social interaction. Also, Siri provides us with the possibility to use voice commands to communicate easily on social networks;
- Audio-video help. Resort to Siri if you want to turn on a specific music (or video) track or recognize the song playing;
- Home delivery order. It’s about the delivery of a certain dish from your favorite restaurant (or stuff like that);
- Calculator. Siri is capable of performing mathematical calculations and speaking the result aloud;
- Management of settings. With Siri, you can switch to Airplane mode or enable (or disable) Bluetooth;
- Entertaining the user with games. There are also voice commands that Siri responds to in order to entertain you. Say "Siri, pick a card", and the program will name a random one. You may also ask it to flip a coin, give a number from 1 to 20, etc.
As you can see now, the main responsibility of Siri is to save the user's time. By the way, why don't you use Siri's experience to create your own feature list for voice assistant apps?
What are the components of a voice interface?
Now we’d like to discuss basic technologies in mobile assistants. Simply put, what characterizes the voice interface and how does it differ from the usual visual one?
There are a few key components of a voice user interface (VUI):
Voice input. Users no longer need to resort to typing or touch-sensitive graphic elements to make a particular request. Now a voice command is more than enough.
Voice output. The same can be said about getting feedback from Siri: information is delivered by voice (as opposed to being displayed on the device's screen).
Natural language. The user should feel free when communicating with personal assistants like Siri; he mustn’t be limited to a certain standardized set of phrases the software understands.
Intelligent interpretation. To provide the best possible user experience, the voice interface must analyze and correctly interpret the behavior of the person it’s dealing with.
Contribution. The user sometimes sets a task without understanding all the steps it involves. The digital assistant must work out those steps and perform them itself.
The five components listed aren't always all active at the same time: for example, some artificial intelligence assistants use visual output instead of voice. Nevertheless, if you want to create a Siri-like app and succeed in doing so, consider integrating each of the above components. Following our tip, you’ll get 2 important benefits:
Human-like communication. Users will be able to formulate their goal the way they want, in their familiar, natural language, without the need to take advantage of a visual interface.
Proactivity. The voice interface can predict the user's intentions based on his behavior and, therefore, anticipate his desires. So the idea of “mind-reading” is getting closer to reality than ever.
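To make the components above a bit more tangible, here is a minimal Python sketch of how they might be chained together. Everything in it is an assumption made for illustration: the function names are hypothetical, the "audio" is already plain text, and a few keywords stand in for real speech recognition and language understanding.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    intent: str       # what the user wants, e.g. "set_alarm"
    confident: bool   # whether the assistant is sure enough to act


def voice_input(audio: str) -> str:
    """Stand-in for speech recognition: the 'audio' is already text here."""
    return audio.strip().lower()


def interpret(text: str) -> Interpretation:
    """Toy natural-language step: keywords instead of a real NLU model."""
    if "alarm" in text or "wake me" in text:
        return Interpretation("set_alarm", True)
    if "weather" in text:
        return Interpretation("get_weather", True)
    return Interpretation("unknown", False)


def act(result: Interpretation) -> str:
    """Pick a reply; a real assistant would call device or web services."""
    replies = {
        "set_alarm": "Okay, your alarm is set.",
        "get_weather": "Here is today's forecast.",
    }
    return replies.get(result.intent, "Sorry, I didn't catch that.")


def voice_output(reply: str) -> str:
    """Stand-in for speech synthesis: return the text that would be spoken."""
    return f"[spoken] {reply}"


def handle(audio: str) -> str:
    """Run one request through input, interpretation, action, and output."""
    return voice_output(act(interpret(voice_input(audio))))
```

Calling `handle("Wake me up in an hour!")` walks one request through all four stages. In a real product each stub would be replaced by a proper engine, but the order of the stages would likely stay the same.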
Speaking of technologies in mobile assistants, by the way... Take a look at the infographic below to understand what you'll have to deal with to make your own Siri.
It's time to figure out how to integrate Siri into a mobile app or create a new voice service from scratch.
How to make a voice assistant like Siri?
There are several approaches to solving the problem:
you can integrate voice assistant technology into an existing application, which will help you stand out from competitors who haven't implemented such a module yet;
another way to achieve the goal is to create your own Siri, that is, develop a completely new voice app.
Let's consider different options for building AI voice assistants.
First of all, developers can add Siri functionality to their apps using the SiriKit API, which Apple announced at WWDC in summer 2016 and shipped with iOS 10.
There are many scenarios for third-party use of Siri:
Audio and video calls;
Working with contacts;
Processing of text messages;
Working with maps;
And much more!
Do you want to know how the Siri-integrated mobile app works? Let’s see!
The whole process is rather simple: the user interacts with Siri in the usual way. The only difference is that he needs to name the application he wants to activate.
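The routing idea can be illustrated with a short Python sketch. Please note that this is not the SiriKit API (which is used from Swift or Objective-C); it is only a simplified model of the principle, and the app names and function names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical third-party apps that have registered with the assistant.
REGISTERED_APPS = {"chatapp", "rideapp"}


def find_target_app(utterance: str) -> Optional[str]:
    """Return the registered app named in the request, or None."""
    words = re.findall(r"[a-z0-9]+", utterance.lower())
    for word in words:
        if word in REGISTERED_APPS:
            return word
    return None


def route(utterance: str) -> str:
    """Decide who should handle the request based on the app name."""
    app = find_target_app(utterance)
    if app is None:
        return "handled by the built-in assistant"
    return f"forwarded to {app}"
```

With this sketch, "Send a message to Anna using ChatApp" is forwarded to the named app, while the same request without an app name stays with the built-in assistant.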
Open-Source platform strategies
The second way to make your own Siri is to take advantage of various open-source solutions. In the simplest case, all you have to do is add a few lines of code to your project.
The Aimybox service provides a concise and customizable UI, as well as the assistant SDK itself. For the recognition, NLP, and synthesis engines, you may choose from the existing modules or create your own.
In fact, Aimybox implements the architecture of a voice artificial intelligence assistant by standardizing the interfaces of these modules and organizing their interaction in the right way.
Melissa is another example of an open-source solution allowing you to create a Siri-like app without the hassle.
Melissa is in demand and popular among developers. Let's see why that is the case.
Melissa’s benefits include:
Lego approach. Lego is famous all over the world. Kids love to experiment with Lego pieces and make various kinds of models out of them. The same is true of Melissa software. The developer, too, deals with small "pieces" of sorts and uses them to build the desired voice-featured product. Obviously, the Lego approach expands his possibilities and makes it easier to customize the app at will.
Incredible ease of use. The above list item logically leads to the ease of use of the service. With Melissa, even a novice developer can cope with such a task as integrating a voice assistant into a mobile app.
Compatibility with different OS. Finally, Melissa works great on different platforms and supports many operating systems, including Windows, Linux, and OS X.
Development of a voice assistant from scratch
Well, all popular voice assistants like Siri are built in this very way, without the use of third-party solutions. The method is expensive and time-consuming, but the result is surely worth it.
To build your own Siri from scratch, you must find and hire experienced, highly skilled developers. There is a lot of work to be done, and you're unlikely to manage it yourself: one needs to connect to speech recognition and synthesis systems, activate the language processing engine, create a unique UI/UX, implement the architecture and, of course, thoroughly test the final product.
If you're okay with the challenges and costs ahead, read the next section.
The main stages of voice app development
When interacting with the VUI, the user doesn't see any graphical parts; everything looks like a series of dialogues. However, the development steps are similar to those required to create GUI solutions.
Discovery phase. Firstly, you need to decide on the main app idea, analyze the market, and draw up a plan for further actions.
VUI. The main goal of the next development step is to design the interaction between the user and the application. However, while the designer of graphic UIs draws app screen maps, the VUI expert works out all the possible dialogues between the user and the artificial intelligence assistant (with possible deviations from the baseline scenario).
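One possible way to write such dialogues down is a small state machine. The Python sketch below is only an illustration under our own assumptions: the taxi scenario, the state names, and the helper functions are all hypothetical.

```python
# Each state holds the prompt the assistant speaks, the answers it
# recognizes, and where to go when the answer is not recognized.
DIALOGUE = {
    "ask_destination": {
        "prompt": "Where would you like to go?",
        "answers": {},            # any answer counts as a destination
        "default": "confirm",
    },
    "confirm": {
        "prompt": "Shall I book the taxi?",
        "answers": {"yes": "done", "no": "ask_destination"},
        "default": "confirm",     # deviation: unclear answer, ask again
    },
}


def next_state(state: str, answer: str) -> str:
    """Return the state that follows the user's answer."""
    answer = answer.strip().lower()
    if answer in {"cancel", "stop"}:   # deviation: user gives up at any point
        return "cancelled"
    node = DIALOGUE[state]
    return node["answers"].get(answer, node["default"])


def run(answers: list) -> str:
    """Replay a list of user answers and return the final state."""
    state = "ask_destination"
    for answer in answers:
        if state in ("done", "cancelled"):
            break
        state = next_state(state, answer)
    return state
```

The baseline scenario is `run(["Airport", "yes"])`. Deviations such as an unclear "maybe" or a sudden "cancel" are handled by the default transitions rather than by separate code paths.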
By the way! If you’re planning to add voice technology to your existing application as a new optional feature, then you don't need to design a new GUI (your app already has one, right?). However, if your goal is a separate voice application aimed at helping the user solve simple everyday tasks through a VUI (something like the original Siri as it existed before Apple acquired it), you cannot do without visual interaction with the user. Be sure to keep it in mind if you intend to create a Siri-like app.
Main development. It is divided into two parts: developing a speech understanding system and writing the logic (which means deciding how the future voice assistant should accept and answer user questions, where it gets the data from, what services it cooperates with, etc.). This is an extremely complicated stage, and various solutions might help you here, such as Google's TensorFlow (Google could not help but offer cool technology to make developers' work easier!), Amazon Machine Learning (as the name suggests, Amazon provides a tool to implement machine learning), Azure ML Studio, and many others.
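The split between understanding and logic can be shown with a toy Python example. It is a sketch only: a regular expression stands in for a trained language model, the text is assumed to be already recognized from speech, and the function names `understand` and `respond` are hypothetical.

```python
import re
from datetime import datetime, timedelta


def understand(text: str) -> dict:
    """Understanding part: turn recognized text into an intent plus data."""
    match = re.search(r"wake me up in (\d+) (minute|hour)s?", text.lower())
    if match:
        amount = int(match.group(1))
        minutes = amount * 60 if match.group(2) == "hour" else amount
        return {"intent": "set_alarm", "minutes": minutes}
    return {"intent": "unknown"}


def respond(parsed: dict, now: datetime) -> str:
    """Logic part: decide what to do and what to answer."""
    if parsed["intent"] == "set_alarm":
        alarm = now + timedelta(minutes=parsed["minutes"])
        return f"Alarm set for {alarm:%H:%M}."
    return "Sorry, I can't help with that yet."
```

Keeping the two parts separate means the regular expression could later be swapped for a machine learning model without touching the logic, as long as it keeps returning the same kind of dictionary.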
Testing. Testing is especially important when it comes to voice assistants. You see, in the realm of graphical interfaces, you’re limited by what the designer has drawn: say, the user won't have a chance to tap a button if it doesn't exist. Alas, in the world of sounds, everything is more complicated: the user is free to say whatever he wants. So it's better to test all possible options beforehand.
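In practice, this kind of testing often takes the form of a table of phrasings checked against the expected result. Here is a minimal Python sketch of the idea; the keyword-based `detect_intent` function is a hypothetical stand-in for whatever understanding engine you actually use.

```python
def detect_intent(text: str) -> str:
    """Toy intent detector under test (hypothetical, keyword-based)."""
    text = text.lower()
    if "alarm" in text or "wake me" in text:
        return "set_alarm"
    return "unknown"


# Many phrasings of the same request, plus inputs that must NOT trigger it.
CASES = [
    ("Set an alarm for 7", "set_alarm"),
    ("wake me up in an hour", "set_alarm"),
    ("ALARM at noon please", "set_alarm"),
    ("What's the weather?", "unknown"),
    ("", "unknown"),
]


def failing_cases() -> list:
    """Return every (utterance, expected, actual) triple that does not match."""
    return [
        (utterance, expected, detect_intent(utterance))
        for utterance, expected in CASES
        if detect_intent(utterance) != expected
    ]
```

An empty result from `failing_cases()` means every listed phrasing behaves as expected. The table is meant to grow every time a real user says something the assistant misunderstands.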
Naming. Don’t forget to pay special attention to the assistant's name. It should be easy to say and easy to recognize by ear, because the user has to say it out loud every time the program starts.
Project publication. If we’re talking about personal assistant mobile apps, then we also need to consider the stage of publishing the project in the App Store and/or Google Play. The whole process is fairly standard, and you’re probably familiar with it.
Now you’ve got a basic idea of how to make an app like Siri and can choose the development method that suits you best. However, it would be wiser to seek professional help.