Interview with Dr. Yu Kai of AISpeech — The Importance of Naturalness in Natural Language Processing

Moving from Vocal Recognition to Vocal Interaction

Today, I want to talk about cognitive intelligence dialogs, the operative word being “dialog.” This word not only refers to speech but also to language itself. In my eight years at Tsinghua, studying man-machine interaction, we have gone through several shifts in the ways humans and machines interact.

Why did we start paying attention to intelligence in verbal dialogs?

The first thing we will discuss today is why we started paying attention to intelligence in verbal dialogs.

We all talk about the Internet age, but to what extent have information systems progressed?

Looking at the statistics, at the end of 2017, the number of IoT smart devices worldwide exceeded the human population for the first time. However, the vast majority of these devices have tiny screens or no screens at all, so users cannot perform complex operations on them. This means that, to access complex, abstract information, users can only interact with such devices by voice or through dialog. This is why, starting in 2014, many tech giants began releasing smart speakers. From a technological perspective, this calls for more than a single solution or technology framework. It also involves dialog management, speech recognition, speech synthesis, and language understanding.

Problems and Opportunities of a Natural Vocal Interaction System

What are the main problems and are there any opportunities?

The first thing is speech recognition. Speech recognition is a cutting-edge perception technology, and most people are already aware of its applications. Businesses and researchers have already solved the main challenges in speech recognition. If I use a comprehensive speech recognition system, it will have no problem recognizing most of what I say, even poetry. However, even if we use deep-learning technology, we cannot avoid occasional speech recognition errors. Our task is to make the program more human so that, when it makes a mistake, it can correct itself in the context of the complete man-machine interaction. This requires the mutual assistance of perception and cognitive technologies.
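
To make the idea of correcting a recognition error from context more concrete, here is a minimal sketch. It assumes the recognizer returns a few candidate transcriptions with scores, and that the dialog state can supply a set of words it currently expects. The function name `rescore`, the `bonus` weight, and the example scores are all hypothetical and are not taken from any AISpeech system.

```python
from typing import List, Set, Tuple


def rescore(nbest: List[Tuple[str, float]],
            expected_words: Set[str],
            bonus: float = 0.2) -> str:
    """Pick the candidate transcription that best fits the dialog context.

    nbest: (text, recognizer_score) pairs; the scores are made-up numbers.
    expected_words: lowercase words the dialog state expects (an assumption).
    bonus: hypothetical weight added per expected word found in a candidate.
    """
    def combined_score(item: Tuple[str, float]) -> float:
        text, recognizer_score = item
        overlap = sum(1 for word in text.lower().split()
                      if word in expected_words)
        return recognizer_score + bonus * overlap

    return max(nbest, key=combined_score)[0]


# The recognizer slightly prefers the wrong word "fright".
nbest = [("book a fright to Paris", 0.52), ("book a flight to Paris", 0.48)]

# Without context, the top recognizer hypothesis wins.
without_context = rescore(nbest, set())

# With a travel-booking context, the cognitive side overrides the mistake.
with_context = rescore(nbest, {"flight", "ticket", "paris"})
```

The design choice here is simply that perception proposes and cognition disposes: the recognizer keeps several hypotheses alive, and the dialog context decides among them. A real system would likely use a learned model rather than word overlap.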

Making Interactions Natural through Cognition

Cognition is the most frustrating aspect. Man-machine dialog is not as simple as most people imagine, because there are many forms of dialog, some of which technology can handle more effectively than others. If we sort dialogs by the number of rounds, we can divide them into several categories. The shortest form is a single round. For example, I say a sentence and the machine responds with a phrase, with no specific structural semantics. This is a command-type dialog and is extremely simple. A more complex form of dialog is ‘question and answer.’ Currently, many systems rely on conventional deep learning technologies to handle ‘question and answer’ dialogs. However, the structure of such a dialog is usually a single question followed by a single answer, with only occasional context, so it is not a true multi-round dialog.

One Technology, Three Levels

Looking at the level of cognition, we can split cognitive technology into three levels.
The first is the static level. This determines whether the program can understand the natural language of an arbitrary statement and map it to the correct meaning.
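
As a rough illustration of the static level, the sketch below maps a single statement to a structured meaning without using any dialog history. The intent names, the patterns, and the `understand` function are hypothetical and hand-written for this example. A production system would typically use a trained model instead of regular expressions.

```python
import re
from typing import Dict, Optional

# Hypothetical intents and surface patterns, invented for illustration only.
PATTERNS = {
    "play_music": re.compile(r"^(?:play|put on) (?P<title>.+)$"),
    "set_alarm": re.compile(r"^(?:set|create) an alarm for (?P<time>.+)$"),
}


def understand(utterance: str) -> Optional[Dict[str, str]]:
    """Map one statement to a meaning, or return None if nothing matches."""
    # Normalize case and drop trailing punctuation before matching.
    text = utterance.strip().lower().rstrip(".!?")
    for intent, pattern in PATTERNS.items():
        match = pattern.match(text)
        if match:
            # The meaning is the intent plus whatever slots the pattern captured.
            return {"intent": intent, **match.groupdict()}
    return None


music = understand("Play Yesterday.")
alarm = understand("Set an alarm for 7 am")
unknown = understand("What a nice day")
```

Each call looks only at the sentence it is given, which is what makes this "static": the same statement always maps to the same meaning regardless of what was said earlier.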

What can custom voice interaction technology do?

Now, you must be curious. What does this customization technology do? For example, when developing real-time and large-vocabulary speech recognition, we can build a function that automatically updates the words the recognizer can handle whenever the semantic content changes. If we add the name of a movie star, say “Nicole Kidman,” the system should automatically add it to the word list and recognize it as the actress's name for subsequent understanding and interaction.
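
The "Nicole Kidman" example can be sketched as a word list that grows at run time. This is only a toy model under stated assumptions: the `CustomLexicon` class and its `add` and `tag` methods are hypothetical names, and a real recognizer would also need to update its pronunciation and language models, which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CustomLexicon:
    """Hypothetical word list that can be extended while the system runs."""
    # Maps a lowercase phrase to its semantic type, e.g. "actress".
    entries: Dict[str, str] = field(default_factory=dict)

    def add(self, phrase: str, semantic_type: str) -> None:
        """Register a new phrase so later utterances can be tagged with it."""
        self.entries[phrase.lower()] = semantic_type

    def tag(self, utterance: str) -> List[Tuple[str, str]]:
        """Return (phrase, semantic_type) pairs found in the utterance."""
        text = utterance.lower()
        return [(phrase, kind) for phrase, kind in self.entries.items()
                if phrase in text]


lexicon = CustomLexicon()
request = "Show me movies with Nicole Kidman"

before = lexicon.tag(request)            # the name is unknown at this point
lexicon.add("Nicole Kidman", "actress")  # customization step
after = lexicon.tag(request)             # the name is now recognized and typed
```

The point of the sketch is that the update is a data change, not a code change: adding an entry is enough for the name to be available to the understanding and interaction steps that follow.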



Alibaba Cloud
