QA Systems and Deep Learning Technologies — Part 2


Relevant Deep Learning QA Technologies

First, compared to voices and images, language is a non-natural signal and symbol system that is produced and processed entirely by the brain. Its variability and flexibility far exceed those of image and voice signals. Second, images and voices have precise mathematical representations. For example, a grayscale image is a matrix in which even the most granular element has a definite physical meaning: the value at each pixel is a grayscale intensity. In contrast, the earlier bag-of-words representation of language suffers from excessive dimensionality, high sparsity, and loss of semantic information, which makes it hard to derive meaningful insights.

Researchers are increasingly interested in applying deep learning models to natural language processing (NLP), focusing on the representation and learning of words, sentences, and articles, and on related applications. For example, Bengio et al. used a neural network model to obtain a new vector representation called a word embedding or word vector [27]. This representation is low-dimensional, dense, and continuous, and it captures semantic and grammatical information about words. At present, word vector representations underpin most neural network based NLP methods.
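The contrast between a sparse bag-of-words style vector and a dense word vector can be illustrated with a small sketch. The vocabulary and the three-dimensional embedding values are made up for illustration; real word vectors are learned from data and have many more dimensions.

```python
import math

vocab = ["where", "shall", "we", "eat", "dine", "today"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Sparse vector with one dimension per vocabulary word."""
    vec = [0.0] * len(vocab)
    vec[index[word]] = 1.0
    return vec

# Hypothetical dense embeddings (illustrative values, not learned).
embedding = {
    "eat":   [0.9, 0.1, 0.0],
    "dine":  [0.8, 0.2, 0.1],
    "today": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Two different words never share a non-zero dimension in the sparse form,
# so their similarity is zero even when their meanings are close.
sparse_sim = cosine(one_hot("eat"), one_hot("dine"))

# Dense vectors can place related words near each other.
dense_sim = cosine(embedding["eat"], embedding["dine"])
dense_far = cosine(embedding["eat"], embedding["today"])
```

With these illustrative values, "eat" and "dine" are unrelated in the sparse form but close in the dense form, while "eat" and "today" stay far apart.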

Researchers have designed DNN models to learn vector representations of sentences, including sentence modeling with the recursive neural network, the recurrent neural network (RNN), and the convolutional neural network (CNN) [28–30]. Sentence representations have been applied to a large number of NLP tasks, such as machine translation [31, 32] and sentiment analysis [33, 34], with prominent results. Representing and learning whole articles remains relatively difficult and has received little research attention. One example is the work of Li and his team, who represented articles by encoding and decoding them with a hierarchical RNN [35].

Two fundamental issues exist in the QA field. The first is how to implement the semantic representation of a question and an answer. Both the interpretation of the user’s question and the extraction and validation of the response require an abstract representation of the essential information in the question and answer. This involves representing not only the syntactic and grammatical information of the QA statements but also the user’s intention and the matching information at the semantic level.

The second is how to implement semantic matching between the question and the answer. To ensure that the reply to the user’s question meets strict semantic requirements, the system must make reasonable use of high-level abstract semantic representations of the statements to capture the semantic matching pattern between the two texts.

Given the language representation capability that CNN and RNN have shown in the NLP field in recent years, more researchers are applying deep learning to key tasks in the QA field, such as question classification, answer selection, and automatic response generation. In addition, the naturally annotated data [50] that internet users generate when exchanging information, such as microblog replies and community QA pairs, provides a reliable data resource for training DNN models and thereby largely solves the data shortage problem in QA research.

DNN-based Semantic Representation

CNN-based Semantic Representation

The word vector matrix is obtained by transforming the words in a sentence into their word vectors and arranging them in the same order as the words. The values in the matrix become meaningful when those along one row are taken together, since each row then represents the corresponding word in the sentence. The CNN model expresses a sentence as a fixed-length vector through multiple stacked layers of convolution and max pooling. Such an architecture can handle various supervised natural language tasks by adding a classifier on top of the model.
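As a rough sketch of this pipeline, the snippet below builds a word vector matrix, applies one convolution layer over sliding windows of words, and max-pools the result into a fixed-length vector. The random embeddings and filters, the window width of 2, and the ReLU non-linearity are assumptions made for illustration, not details taken from the cited models, and the sentence is assumed to contain at least `width` words.

```python
import random

def conv1d(matrix, filters, width):
    """Slide a window of `width` words over the sentence.
    Each filter produces one feature per window position."""
    out = []
    for start in range(len(matrix) - width + 1):
        # Concatenate the word vectors inside the current window.
        window = [x for row in matrix[start:start + width] for x in row]
        # ReLU applied to the filter response (illustrative choice).
        out.append([max(0.0, sum(w * x for w, x in zip(f, window))) for f in filters])
    return out

def global_max_pool(features):
    """Keep the strongest response of each filter across all positions."""
    return [max(col) for col in zip(*features)]

def encode(sentence, embed, filters, width=2):
    matrix = [embed[w] for w in sentence]   # one row per word, in sentence order
    return global_max_pool(conv1d(matrix, filters, width))

rng = random.Random(0)
dim, n_filters, width = 4, 5, 2
words = "i am starving where are we going to eat today".split()
embed = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in words}
filters = [[rng.uniform(-1, 1) for _ in range(dim * width)] for _ in range(n_filters)]

short_vec = encode(words[:3], embed, filters)   # 3-word input
long_vec = encode(words, embed, filters)        # 10-word input
```

Both inputs produce a vector with one entry per filter, regardless of sentence length, which is what lets a classifier be placed on top.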

Figure 1: CNN-based Sentence Modeling

CNN-based sentence modeling can be viewed as a “combination operator” with a local selection function. As the model gets deeper, each output representation covers a wider range of words in the sentence, and the multi-layer operation finally yields a sentence representation vector of fixed dimension. This process is functionally similar to the recursive operation of the “recursive autoencoder” [33].

The sentence model formed by one layer of convolution followed by global max pooling is called a shallow convolutional neural network model. It is widely used for sentence-level classification in NLP, for example, sentence classification [36] and relation classification [37]. However, the shallow model can neither capture complicated local semantic relations in a sentence nor represent deeper semantic composition well. In addition, global max pooling discards the word order of the sentence. As a result, the shallow model can only be used for local attribute matching between statements. For the complex and diverse natural language found in questions and answers, QA matching models [38–40] usually use a deep convolutional neural network (DCNN) to model the question and answer sentences, and then perform QA matching by feeding the high-level semantic representations of the question and answer into a multilayer perceptron (MLP).
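The loss of word order under global max pooling can be seen in a few lines. The feature values below are arbitrary placeholders for convolution outputs at four window positions, and the pool size of 2 is an illustrative choice.

```python
def global_max_pool(features):
    """One maximum per feature channel over all positions."""
    return [max(col) for col in zip(*features)]

def local_max_pool(features, size=2):
    """One maximum per feature channel within each block of `size` positions."""
    return [
        [max(col) for col in zip(*features[i:i + size])]
        for i in range(0, len(features), size)
    ]

# Placeholder convolution outputs: 4 window positions, 2 feature channels.
features = [[0.1, 0.7], [0.9, 0.2], [0.3, 0.4], [0.5, 0.8]]
reversed_features = features[::-1]

global_same = global_max_pool(features) == global_max_pool(reversed_features)
local_same = local_max_pool(features) == local_max_pool(reversed_features)
```

Reversing the positions leaves the global result unchanged, while local pooling still distinguishes the two orders. This is one reason deeper models with local pooling are better suited to QA matching.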

RNN-based Semantic Representation

Figure 2: RNN-based Sentence Modeling

The RNN model has a structure similar to that of the hidden Markov model but has greater representational power: the intermediate representation is not subject to a Markov assumption, and the model is non-linear. However, as the sequence length increases, the vanishing gradient problem [43] occurs during RNN training. To solve this issue, researchers improved the design of the recurrent computing unit and proposed variants such as Long Short-Term Memory (LSTM) [44, 45] and the Gated Recurrent Unit (GRU) [46].

These two RNN variants can capture long-range dependencies and provide a better semantic representation of the whole sentence. Using a bidirectional LSTM, Wang and Nyberg [47] learned semantic representations of question-answer pairs and fed them into a classifier to compute a classification confidence.
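For readers who want to see what a gated recurrent unit computes, here is a minimal sketch of one GRU step in its commonly published form, with bias terms omitted for brevity. The parameter names (`Wz`, `Uz`, and so on), the random weights, and the dimensions are illustrative assumptions. Conventions also differ on which side of the interpolation the update gate multiplies.

```python
import math
import random

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, p):
    """One GRU update: gates decide how much of the old state to keep."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]  # update gate
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    # Interpolate between the previous state and the candidate state.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def encode(seq, p, hidden):
    """Run the GRU over a sequence and return the final hidden state."""
    h = [0.0] * hidden
    for x in seq:
        h = gru_step(x, h, p)
    return h

rng = random.Random(0)
dim, hidden = 3, 4

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

params = {
    "Wz": rand_matrix(hidden, dim), "Uz": rand_matrix(hidden, hidden),
    "Wr": rand_matrix(hidden, dim), "Ur": rand_matrix(hidden, hidden),
    "Wh": rand_matrix(hidden, dim), "Uh": rand_matrix(hidden, hidden),
}
sentence = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]  # placeholder word vectors
state = encode(sentence, params, hidden)
```

Because the new state is an interpolation rather than a full overwrite, information from earlier words can persist across many steps, which is the intuition behind the improved handling of long sequences.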

Recently, researchers have combined CNN and RNN to learn semantic representations of questions asked about images. While the RNN scans the word sequence of the question, the model learns jointly from text and image, so that the question is modeled in the context of the image for the final QA matching.

For instance, as the RNN traverses the words of a question, the model proposed by Malinowski et al. [48] feeds both the image representation obtained by a CNN and the word vector at the current position into the RNN as input. The RNN updates its intermediate representation from both, thus realizing joint learning of images and questions.
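The idea of feeding the image representation together with each word can be sketched as follows. A plain tanh recurrence stands in for whatever recurrent unit the cited work uses, and the image vectors are two-dimensional placeholders rather than real CNN features. All names and dimensions are assumptions.

```python
import math
import random

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def encode_question(word_vecs, image_vec, W_in, W_rec):
    """Recurrence whose input at every step is [word vector ; image vector]."""
    h = [0.0] * len(W_rec)
    for w in word_vecs:
        x = w + image_vec   # list concatenation: text and image side by side
        h = [math.tanh(a + b) for a, b in zip(matvec(W_in, x), matvec(W_rec, h))]
    return h

rng = random.Random(1)
word_dim, img_dim, hidden = 3, 2, 4
W_in = [[rng.uniform(-0.5, 0.5) for _ in range(word_dim + img_dim)] for _ in range(hidden)]
W_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]

question = [[rng.uniform(-1, 1) for _ in range(word_dim)] for _ in range(5)]  # placeholder word vectors
image_a = [1.0, 0.0]   # placeholder features for one image
image_b = [0.0, 1.0]   # placeholder features for another image

h_a = encode_question(question, image_a, W_in, W_rec)
h_b = encode_question(question, image_b, W_in, W_rec)
```

The same question yields different representations for different images, which is the sense in which the question is modeled in the image scenario.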

By contrast, Gao et al. [49] first used an RNN to model the question as a sentence. Then, during answer generation, they used both the semantic representation vector of the question and the image representation vector obtained by a CNN as context for generating the answer.

DCNN-based Semantic Matching Architecture

Parallel Matching Architecture

Figure 3: DCNN-based Parallel Matching Architecture

In the parallel matching architecture outlined in Figure 3, two independent CNNs acquire the representations of the two sentences, and the two sentences do not influence each other until each has received its own representation. This model matches two sentences at the global semantic level but ignores finer local matching characteristics. In sentence matching problems, however, local matches often exist between the two sentences. Consider the following question-answer pair:
Sx: I’m starving, where are we going to eat today?
Sy: I’ve heard that KFC recently launched new products, shall we try some?

In this QA pair, there is a strong matching relation between “to eat” and “KFC,” but parallel matching can capture it only through the global representations of the two sentences. Until the representation of each whole sentence is complete, “to eat” and “KFC” do not affect each other.
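A minimal sketch of the parallel scheme makes the limitation visible in the data flow: each sentence is encoded on its own, and only the two finished vectors meet in the MLP. The one-layer encoder, random weights, and layer sizes are illustrative assumptions, and the word vectors are random placeholders for Sx and Sy.

```python
import math
import random

def encode(matrix, filters, width=2):
    """Stand-in sentence encoder: one convolution layer plus global max pooling."""
    feats = []
    for i in range(len(matrix) - width + 1):
        window = [x for row in matrix[i:i + width] for x in row]
        feats.append([max(0.0, sum(w * x for w, x in zip(f, window))) for f in filters])
    return [max(col) for col in zip(*feats)]

def mlp_score(vx, vy, W1, w2):
    """Score the pair using only the two finished sentence vectors."""
    joint = vx + vy
    hidden = [math.tanh(sum(w * x for w, x in zip(row, joint))) for row in W1]
    return 1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(w2, hidden))))

rng = random.Random(2)
dim, n_filters, width, n_hidden = 4, 3, 2, 5

def rand(n):
    return [rng.uniform(-1, 1) for _ in range(n)]

filters_x = [rand(dim * width) for _ in range(n_filters)]   # encoder for the question
filters_y = [rand(dim * width) for _ in range(n_filters)]   # separate encoder for the answer
W1 = [rand(2 * n_filters) for _ in range(n_hidden)]
w2 = rand(n_hidden)

sx = [rand(dim) for _ in range(6)]   # placeholder word vectors for Sx
sy = [rand(dim) for _ in range(8)]   # placeholder word vectors for Sy

vx = encode(sx, filters_x)           # computed without ever seeing Sy
vy = encode(sy, filters_y)           # computed without ever seeing Sx
score = mlp_score(vx, vy, W1, w2)
```

Nothing inside `encode` has access to the other sentence, so a word-level relation such as the one between “to eat” and “KFC” can only be recovered from the pooled vectors.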

Interactive Matching Architecture

Next, matching representations between the sentences are learned at successive levels and, finally, a matching representation of fixed dimension is obtained and scored.

Figure 4: DCNN-based Interactive Matching Architecture

As shown in Figure 4, the first layer of the interactive matching architecture directly obtains low-level local matching representations between the sentences by convolving over pairs of sliding windows, one from each sentence. In the subsequent higher layers, it employs two-dimensional convolution and two-dimensional local max pooling, similar to those used in image processing, to learn high-level matching representations between the question and answer sentences.
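The first layer and the pooling step described here can be sketched as follows. Every window of one sentence is convolved together with every window of the other, which yields a two-dimensional grid that is then pooled locally. The filter count, window width, pool size, and random values are illustrative assumptions.

```python
import random

def match_layer(sx, sy, filters, width=2):
    """First layer: convolve each window of Sx jointly with each window of Sy."""
    grid = []
    for i in range(len(sx) - width + 1):
        row = []
        for j in range(len(sy) - width + 1):
            # Flatten the word vectors of both windows into one input.
            joint = [v for w in sx[i:i + width] + sy[j:j + width] for v in w]
            row.append([max(0.0, sum(a * b for a, b in zip(f, joint))) for f in filters])
        grid.append(row)
    return grid   # grid[i][j]: local match of window i in Sx with window j in Sy

def max_pool_2d(grid, size=2):
    """Non-overlapping size x size max pooling, applied per feature channel."""
    pooled = []
    for i in range(0, len(grid), size):
        row = []
        for j in range(0, len(grid[0]), size):
            cells = [grid[a][b]
                     for a in range(i, min(i + size, len(grid)))
                     for b in range(j, min(j + size, len(grid[0])))]
            row.append([max(ch) for ch in zip(*cells)])
        pooled.append(row)
    return pooled

rng = random.Random(3)
dim, n_filters, width = 4, 3, 2
sx = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]   # placeholder question
sy = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(8)]   # placeholder answer
filters = [[rng.uniform(-1, 1) for _ in range(2 * width * dim)] for _ in range(n_filters)]

grid = match_layer(sx, sy, filters, width)   # 5 x 7 grid of 3-channel features
pooled = max_pool_2d(grid)                   # 3 x 4 grid after 2 x 2 pooling
```

Row and column indices of the grid still correspond to window positions in the two sentences, so the pooled result keeps a coarse record of where in each sentence a match occurred.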

In this form, the matching model can not only model the local matching relations between two sentences in rich detail but also model the information within each sentence. The result vector obtained from interactive matching contains both the position information of the sliding windows in the two sentences and their matching representation.

For semantic matching between questions and answers, interactive matching takes full account of the internal matching relations between the question and the answer, and it obtains matching representation vectors between them through two-dimensional convolution and two-dimensional local max pooling. Throughout the process, interactive matching focuses on the matching relation between the sentences and matches them more precisely.

Compared with parallel matching, interactive matching considers not only how well the words inside each sentence’s sliding window combine but also how well the combined windows of the two sentences match. The advantage of parallel matching is that the word order information of each sentence can be maintained during matching, because it models sliding windows over the sequence of each sentence separately. Interactive matching, by comparison, learns the local information between the statements in an interactive way.

Since neither the local convolution operation nor local max pooling changes the overall order of the local matching representations of the two sentences, the interactive matching model also maintains the word order information of questions and answers. In short, interactive matching obtains local matching patterns between two sentences by modeling the matching between questions and answers directly.

RNN-based Automatic Answer Generation

Automatic answer generation needs to solve two significant problems: sentence representation and language generation. In recent years, the recurrent neural network has performed well in both, in particular in the RNN-based encoding-decoding architecture, which has achieved breakthroughs in machine translation [31, 32] and automatic summarization [51].

Based on an encoding-decoding framework built from the GRU (Gated Recurrent Unit) [46] recurrent neural network, Shang et al. [52] proposed the neural dialog model “Neural Responding Machine” (NRM), which can be used to realize single-turn human-machine dialogs. The NRM learns how people reply from a large collection of message pairs (question-answer pairs, microblog-reply pairs) and stores the learned patterns in nearly four million model parameters, which yields a natural language generation model.

As shown in Figure 5, the NRM regards the input sentence as a sequence of word representations. An encoder, that is, an RNN model, transforms it into a sequence of intermediate representations, and a decoder, that is, another RNN model, finally converts these into a series of words that form the output sentence. Since the NRM uses a hybrid mechanism during encoding, the sequence of intermediate representations can not only fully grasp the information in the user’s statement but also retain other details of the sentence. It also employs the attention mechanism [31] during decoding, so that the generation model can more easily capture the complex interaction patterns in the QA process.
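The attention step used during decoding can be sketched in isolation. The snippet scores each encoder state against the current decoder state, normalizes the scores with a softmax, and forms a weighted context vector. Dot-product scoring is used here only for brevity and is an assumption; attention models often use a small learned scoring network instead. The state values are placeholders.

```python
import math

def softmax(scores):
    m = max(scores)                       # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Weight each encoder state by its relevance to the current decoder state."""
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states))
               for k in range(len(decoder_state))]
    return weights, context

# Placeholder intermediate representations for a three-word input.
encoder_states = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
decoder_state = [0.1, 2.0, 0.1]           # placeholder decoder state

weights, context = attend(decoder_state, encoder_states)
```

Here the second input position receives the largest weight because it aligns best with the decoder state. At each decoding step the context vector is recomputed, so different output words can draw on different parts of the input.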

The generation-based question answering mechanism and the retrieval-based feedback mechanism each have their own characteristics. On microblog data with personalized forms of expression, the accuracy of the former is higher than that of the latter: 76 percent versus 70 percent. However, answers generated by the former may be ungrammatical or poorly coherent, while those retrieved by the latter tend to be well-formed and reliable, since they were written by microblog users.

Figure 5: Answer Generation Model Based on Encoding-decoding Frame

At present, the NRM and Google’s Neural Conversational Model (NCM) [53] still generate language mainly by memorizing and combining complicated language patterns, and they are unable to use external knowledge during an interaction. For instance, given the question, “How does the West Lake of Hangzhou compare to the May Day of last year?” they are unable to give a reply that reflects the real situation (the result of the comparison).

Nevertheless, the significance of the NRM and NCM is that they take a first step toward human-like automatic language responses. Over the past few decades, most QA and dialogue models produced by researchers’ unremitting efforts were based either on rules and templates or on searching a large database. These two modes cannot generate responses and lack adequate language comprehension and representation, often because of the limited number and expressiveness of the available templates and examples. They have certain deficiencies in accuracy and flexibility and struggle to achieve both natural fluency and semantically matching content.



