Google DeepMind is an artificial intelligence (AI) research company that builds learning algorithms by combining neuroscience and deep learning to create advanced AI machines. It seems the company has found a way to make a computer speak with sophisticated fluency, instead of the robotic voice people are used to.

The program is called WaveNet, and according to Google, it leaves the typical computer voice behind and produces far more human-sounding speech. The goal? Completely fluent, human-like speech like the kind heard in the sci-fi movie “Her.”

AI refers to machines doing intellectual tasks at a level comparable to humans. Image Credit: Google DeepMind

About DeepMind

DeepMind, originally an independent British company, was acquired by Google in 2014, and the tech giant has been backing its work ever since. DeepMind's goal is to create AI that not only makes computers more advanced but also leads to devices that work more like the human mind.

So far, the DeepMind team has focused on AI systems and software that can take on “thinking” tasks and challenges by themselves, such as playing arcade games and developing other computer structures.

DeepMind's approach differs from other AI technologies, such as IBM's Deep Blue or Watson, because its systems are not built for one particular task or activity. They are designed to handle whatever work is required without being programmed in advance for that specific task.

In other words, DeepMind's software “learns” how to perform an activity and then carries it out, even though it was never explicitly programmed for that task. The technology has been tested on a range of games, and it proved it could learn to play many of them and reach high scores efficiently.

WaveNet can talk

DeepMind created WaveNet to give its AI technology a voice. Speech matters because it is a crucial part of communication between a machine and its human user.

But Google wanted to go further and make WaveNet talk like a person. Instead of relying on the conventional text-to-speech (TTS) voice generators used by most talking systems, it created a new technology that makes WaveNet sound more human.

However, even though the DeepMind team has created this human-like voice system, the technology is still experimental and not yet practical for real-world devices. Google stated that much more research and computing power are needed before WaveNet is viable for commercial use, so it won’t be integrated into Google devices anytime soon.

DeepMind released sets of audio samples generated by WaveNet in U.S. English and Mandarin Chinese, and they certainly sound more like a person than a robot.

DeepMind founder Demis Hassabis. Image Credit: The Verge.

This is a step forward in AI systems

WaveNet works with the sound waves of speech itself rather than with the structure of the language. The system is a neural network, built from interconnected artificial neurons that loosely take the human brain as a reference.

Recordings of real speech supply the raw audio waveforms that WaveNet learns to replicate. To do this, the system needs a considerable amount of data to process and “study,” and Google’s existing TTS datasets were crucial to achieving this result.
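Working on raw waveforms means predicting audio one sample at a time. One concrete detail, taken from DeepMind's published WaveNet paper rather than from this article, is that each sample is first squashed with μ-law companding and quantized to 256 levels, so the network can treat "what comes next" as a choice among 256 classes. The minimal Python sketch below illustrates only that preprocessing step under those assumptions; the function names are hypothetical, and this is not DeepMind's code.

```python
import math

MU = 255  # 8-bit mu-law: 256 quantization levels (0..255)


def mu_law_encode(x: float, mu: int = MU) -> int:
    """Hypothetical helper: map an audio sample in [-1, 1] to a class in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip to the valid amplitude range
    # Companding: compress loud values, keep resolution for quiet ones.
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift [-1, 1] to [0, mu] and round to the nearest integer class.
    return int(round((y + 1) / 2 * mu))


def mu_law_decode(q: int, mu: int = MU) -> float:
    """Hypothetical helper: map a class in [0, mu] back to an approximate sample."""
    y = 2 * q / mu - 1  # back to [-1, 1]
    # Invert the companding curve.
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


if __name__ == "__main__":
    for sample in (-0.5, 0.0, 0.01, 0.5):
        q = mu_law_encode(sample)
        print(sample, q, round(mu_law_decode(q), 4))
```

Because quantization discards some detail, decoding an encoded sample returns a close approximation of the original value rather than the exact number; the logarithmic curve spends more of the 256 levels on quiet sounds, where the ear is most sensitive.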

“Mimicking realistic speech has always been a major challenge, with state-of-the-art systems, composed of a complicated and long pipeline of modules, still lagging behind real human speech. Our research shows that not only can neural networks learn how to generate speech, but they can already close the gap with human performance by over 50%. This is a major breakthrough for text-to-speech systems, with potential uses in everything from smartphones to movies, and we’re excited to publish the details for the wider research community to explore,” said Aaron van den Oord, a researcher at DeepMind.

Source: DeepMind Google