Siri: 40 years of research into intelligent voice assistants

Halyna Kubiv

The voice assistant in the iPhone 4S and subsequent Apple devices has a female voice and an unmistakable sense of humor, knows the latest weather reports and financial market data, and can list all current Bundesliga games.

That Apple scored a direct hit with Siri in October 2011 is shown by the range of competitors now available: the speech recognition specialist Nuance has offered a similar application called Nina since last summer. Since the update to the latest Android version, Jelly Bean, Google has offered its own service, Google Now. The company has even equipped its regular search app for the iPhone with voice recognition. Hardware manufacturers such as LG and Samsung also offer their users Siri-style smartphone assistants in the form of Quick Voice and S Voice.

Beta version

When it was presented in October 2011, Scott Forstall described the function in the iPhone 4S as software in beta stage. Shortly after launch, Siri could speak only a few languages, such as English, German and French, and had problems with strong accents. Requests and commands that the assistant could understand and execute in English without any problems led to a standard answer in other languages: “Should I look for it on the Internet?” At the introduction, however, Apple promised that the built-in iPhone secretary would keep learning as more and more users used the function. Indeed, by now you can organize your day quite easily with the assistant.

Although Apple calls Siri intelligent, it is above all the number of people using it that improves its functionality. Most of the speech recognition is handled by Apple's servers; the iPhone only sends the processed audio signal. Even after the request has been edited, recognized and searched and the results have been prepared for voice output on the iPhone, the spoken snippets remain stored on the servers.

Like other current speech recognition programs, Apple's Siri is probably based on a statistical method for evaluating spoken language. The software tries to predict which elements will follow the data it has already recognized within a unit. To do so, it compares the given unit with a reference database and evaluates the probability of several candidates for the elements that follow. The candidate with the highest probability is chosen as the likely solution (Hidden Markov Model).
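The prediction step described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a plain Markov chain over directly observed units, which is simpler than a full Hidden Markov Model, and the reference database, the phoneme spellings and the function names are all hypothetical. Nothing here reflects how Siri is actually implemented.

```python
from collections import Counter, defaultdict

# Hypothetical reference database: phoneme sequences of known words.
REFERENCE = [
    ["h", "e", "l", "o"],
    ["h", "e", "l", "p"],
    ["h", "e", "l", "o"],
    ["h", "a", "t"],
]


def build_transitions(sequences):
    """Count how often each unit is followed by each other unit."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, current):
    """Return the most probable follower of `current` and all probabilities."""
    followers = counts.get(current, Counter())
    total = sum(followers.values())
    if total == 0:
        # The unit never appears with a follower in the reference data.
        return None, {}
    probs = {unit: n / total for unit, n in followers.items()}
    best = max(probs, key=probs.get)
    return best, probs


counts = build_transitions(REFERENCE)
best, probs = predict_next(counts, "l")
print(best, probs)
```

In this toy database the unit "l" is followed by "o" twice and by "p" once, so "o" is chosen as the most probable continuation.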

In the case of speech recognition, the software's task is to recognize individual words. Based on the vowels and consonants (phonemes) in the word, the software tries to predict exactly which word it is. The new audio file on the server is compared with the known samples in the database. The larger and more varied the collection of such patterns in the database, the more precisely the software recognizes individual spoken words.
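Word recognition of this kind can be illustrated with a small Hidden Markov Model per word. The sketch below is a textbook-style example, not Apple's method: every word in a hypothetical lexicon becomes a left-to-right model whose hidden states are its phonemes, the observations are noisy phoneme labels (one per audio frame), and the forward algorithm scores how well each word explains the observations. The lexicon, the probabilities `p_correct` and `p_stay`, and all function names are assumptions made for illustration.

```python
# Hypothetical lexicon: each known word with its phoneme sequence.
LEXICON = {
    "hello": ["h", "e", "l", "o"],
    "help": ["h", "e", "l", "p"],
    "hat": ["h", "a", "t"],
}


def emission(phoneme, observed, alphabet_size, p_correct=0.8):
    """Probability that a hidden phoneme produces the observed label."""
    if phoneme == observed:
        return p_correct
    # Spread the remaining probability evenly over all other labels.
    return (1.0 - p_correct) / max(alphabet_size - 1, 1)


def word_likelihood(phonemes, observations, alphabet_size, p_stay=0.5):
    """Forward algorithm for a left-to-right HMM over one word."""
    if not observations:
        return 0.0
    n = len(phonemes)
    # alpha[i]: probability of the frames seen so far, ending in state i.
    alpha = [0.0] * n
    alpha[0] = emission(phonemes[0], observations[0], alphabet_size)
    for obs in observations[1:]:
        new_alpha = [0.0] * n
        for i in range(n):
            stay = alpha[i] * p_stay
            advance = alpha[i - 1] * (1.0 - p_stay) if i > 0 else 0.0
            new_alpha[i] = (stay + advance) * emission(
                phonemes[i], obs, alphabet_size
            )
        alpha = new_alpha
    # The utterance must end in the word's final phoneme.
    return alpha[-1]


def recognize(observations, lexicon):
    """Return the best-scoring word and the likelihood of every word."""
    alphabet = {p for ph in lexicon.values() for p in ph} | set(observations)
    scores = {
        word: word_likelihood(ph, observations, len(alphabet))
        for word, ph in lexicon.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# One frame is misheard ("a" instead of "e"), yet "hello" still wins.
print(recognize(["h", "a", "l", "o"], LEXICON)[0])
```

Adding more words and more pronunciation variants to the lexicon corresponds to the larger, more varied database the text mentions: the more reference patterns there are, the more likely it is that one of them explains a given utterance well.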

Siri's origins

Apple's intelligent voice assistant originated as an app whose developer Apple acquired in 2010. By now, however, only one of the founders, Tom Gruber, is still with Apple. The other two, Adam Cheyer and Dag Kittlaus, have left the company: Kittlaus resigned immediately after Siri was released in October 2011, and Cheyer has not been with Apple since summer 2012.

But the story of the intelligent voice assistant begins much earlier. The first research began in 1966 when DARPA (Defense Advanced Research Projects Agency) founded a department for research into artificial intelligence, the Artificial Intelligence Center (AIC). DARPA is a US Department of Defense agency whose primary responsibility is research and development for military purposes. At times the agency went by the more innocuous-sounding name ARPA (without the addition of "Defense"). Other projects that emerged from DARPA's research include ARPANET, the forerunner of today's Internet, and the Global Positioning System (GPS). The external input device later simply called a mouse was also presented in 1968 by Douglas Engelbart and his team, whose work DARPA funded.

The artificial intelligence department AIC at DARPA is one of the largest in the world. In addition to speech recognition, its research areas include knowledge databases, robotics, computer graphics, discourse and communication. The research projects are funded by the government. The AIC researchers presented the first prototype of a dialog assistant back in 1977. In the years that followed, AIC's projects became more and more complex. Siri grew out of the more recent PAL (Personalized Assistant that Learns) project.

SRI International

In July 2003, the DARPA research and development agency put a $22 million funding program out to tender for the PAL project in the AIC Department of Artificial Intelligence. The tender was won by SRI (Stanford Research Institute) International, a science-oriented research institute that grew out of Stanford University. Funding was initially limited to five years. The first phase, which ended in 2008, produced CALO (Cognitive Agent that Learns and Organizes). The name itself comes from Latin ("calonis") and means "servant" (of the soldier).

The main requirement for such software was that it act proactively: ideally, CALO should be able to explain things, learn from experience and follow instructions. Provided that the program really learns from its interactions with the user, it can take on simple, repeatable tasks and provide support in complex problem situations. Research on CALO is currently being continued at SRI International. During the transition from the first to the second financing phase of the project, there were organizational and personnel changes. Siri Inc. was spun off as a subsidiary of the research institute.

People behind Siri

Adam Cheyer was one of the project managers in CALO's first financing phase at SRI International. Together with him, Dag Kittlaus and Tom Gruber founded the new company. Dag Kittlaus had been with Motorola since August 2002; in 2007 he moved to SRI International for a short time, and from then until 2010 he worked on Siri. Tom Gruber also came to the new company by way of Stanford University, where he conducted research at the Knowledge Systems Laboratory, which specializes in ontology and the interaction between humans and computers.

From its founding in December 2007 to November 2009, the new company raised around 24 million US dollars in investments. The main investors were Morgenthaler Ventures, Menlo Ventures and the Hong Kong billionaire Li Ka-shing. In February 2010, the virtual personal assistant launched for the first time in the App Store. Just two months later, in April, Apple bought the small company. The application initially disappeared from the App Store. Around a year later, it was built permanently into the iPhone 4S as an iOS function.