QA Systems and Deep Learning Technologies — Part 2

Image for post
Image for post


This is the second article in a two part series about QA Systems and Deep Learning. You can read part 1 here. Deep Learning is a subfield of machine learning, and aims at using machines for data abstraction with the help of multiple processing layers and complex algorithms. Although similar to Artificial Intelligence (AI) and Machine Learning, Deep Learning takes a more granular approach to learning from data. One of the best applications of deep learning is digital assistants such as Siri and Google Now. These assistants can retrieve information about a human-to-human interaction. For instance, they can answer the question, “What movie is playing at my local cinema?” However, they do not know how to parse that sentence and need to be programmed to understand the context. This is where deep learning is vital, as it enables machines to decipher speech and text.

Relevant Deep Learning QA Technologies

In recent years, researchers further explored deep neural networks (DNNs) in regards to image classification and speech recognition. Language learning and representation via DNNs gradually became a new research trend. However, due to the flexibility of human languages and the complexity involved in the abstraction of semantic information, the DNN model is facing challenges in implementing language representation and learning.

DNN-based Semantic Representation

DNNs are gaining popularity in the world of machine translation. Researchers have designed various kinds of DNNs, such as deep stack networks (DSNs), deep belief networks (DBNs), recurrent neural networks (RNNs) and convolutional neural networks (CNNs). In NLP, the primary aim of all DNNs is to learn the syntactic and semantic representations of words, sentences, phrases, structures, and sentences so that it can grasp similar words (phrases or structures).

CNN-based Semantic Representation

The learning of CNN-based DNNs is aims at grasping representation vectors that form sentences. It does so by scanning sentences; extracting, and selecting characteristics. First, a sliding window is used to scan the sentence from left to right. Each sliding window contains multiple words, with a vector representing every word. In the sliding window, the convolution extracts the characteristics. Then, max pooling (a sample-based discretization process) selects characteristics. Repeating the above operation several times results in the retrieval of multiple vectors for representation. Connecting these vectors enables the semantic representation of whole sentences. As shown in Figure 1, inputs of CNN-based sentence modeling are in the form of word-vector matrices.

Image for post
Image for post

RNN-based Semantic Representation

In RNN-based sentence modeling, a sentence is considered a sequence of words and a vector represents each word. There is an intermediate representation of each position, and such representation is composed of vectors to account for the semantics from the beginning of the sentence to each position. The intermediate representation of each position is determined by the word vector on the current position and the intermediate representation of the previous position and formed through an RNN model. The RNN model considers the intermediate representation at the end of the sentence as the semantic representation of the whole sentence, as shown in Figure 2.

Image for post
Image for post

DCNN-based Semantic Matching Architecture

Primary function modules involved in the semantic matching of a QA system include the question retrieval (i.e., question paraphrasing detection), the answer extraction (i.e., matching calculation of questions and candidate text statements), and the answer confidence sequencing (i.e., marking on semantic matching between questions and candidate answers).

Parallel Matching Architecture

The first kind of DCNN-based semantic matching architecture is the parallel matching [38–40] architecture. Semantic representations (real value vectors) of two sentences result from inputting them into two CNN-based sentence models. These two semantic representations will then be input into a multilayer neural network to judge the semantic matching degree of the two sentences and determine whether they can form a matching sentence pair (QA pair). This is the basic idea of the DCNN-based parallel semantic matching model.

Image for post
Image for post

Interactive Matching Architecture

The second kind of DCNN-based semantic matching architecture is the interactive matching [39] architecture. Unlike parallel matching, the core idea of interactive matching is to understand the matching pattern of two sentences directly and conduct local interactions in different granularities between them at different depths of the model.

Image for post
Image for post

RNN-based Automatic Answer Generation

Compared with the retrieval-based reply mechanism, the generation-based answer feedback mechanism gives the answer that is automatically generated according to information entered by current users. It is composed of word orders, rather than answer statements generated by users editing through the retrieval of the knowledge base. This mechanism is used to construct a natural language generation model by using a large number of interactive data pairs. Using this information, the system can automatically generate a reply of natural language representation.

Image for post
Image for post


This paper briefly introduces the development history and basic architecture of the QA system. It also presents the DNN-based semantic representations, semantic matching models of different matching architectures, and answer generation models for solving some of the fundamental problems in the QA system. Deep learning has helped achieve this. However, there are still problems to be solved in the technical research of the QA system, for instance, how to understand users’ questions under a continuous interactive QA scenario, such as the language understanding in the interaction with Siri. In addition, how to learn the external semantic knowledge to ensure the QA system can conduct simple knowledge reasoning to reply to relation inference questions, such as “What hospital department should I visit if I feel chest distress and cough all the time?” Moreover, with the recent research and popularization of attention mechanism and memory network [54, 55] in the natural language understanding and knowledge reasoning, new development opportunities will be provided for research on automatic question answering.


[1] Terry Winograd. Five Lectures on Artificial Intelligence [J]. Linguistic Structures Processing, volume 5 of Fundamental Studies in Computer Science, pages 399- 520, North Holland, 1977.
[2] Woods W A. Lunar rocks in natural English: explorations in natural language question answering [J]. Linguistic Structures Processing, 1977, 5: 521−569.
[3] Dell Zhang and Wee Sun Lee. Question classification using support vector machines. In SIGIR, pages 26–32. ACM, 2003
[4] Xin Li and Dan Roth. Learning question classifiers. In COLING, 2002
[5] Hang Cui, Min-Yen Kan, and Tat-Seng Chua. Unsupervised learning of soft patterns for generating definitions from online news. In Stuart I. Feldman, Mike Uretsky, Marc Najork, and Craig E. Wills, editors, Proceedings of the 13th International conference on World Wide Web, WWW 2004, New York, NY, USA, May 17–20, 2004, pages 90–99. ACM, 2004.
[6] Clarke C, Cormack G, Kisman D, et al. Question answering by passage selection (multitext experiments for TREC-9) [C]//Proceedings of the 9th Text Retrieval Conference(TREC-9), 2000.
[7] Ittycheriah A, Franz M, Zhu W-J, et al. IBM’s statistical question answering system[C]//Proceedings of the 9th Text Retrieval Conference (TREC-9), 2000.
[8] Ittycheriah A, Franz M, Roukos S. IBM’s statistical question answering system — TREC-10[C]//Proceedings of the 10th Text Retrieval Conference (TREC 2001), 2001.
[9] Lee G, Seo J, Lee S, et al. SiteQ: engineering high performance QA system using lexico-semantic pattern.
[10] Tellex S, Katz B, Lin J, et al. Quantitative evaluation of passage retrieval algorithms for question answering[C]// Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ‘03). New York, NY, USA: ACM, 2003:41–47.
[11] Jiwoon Jeon, W. Bruce Croft, and Joon Ho Lee. Finding similar questions in large question and answer archives. In Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management, Bremen, Germany, October 31 — November 5, 2005, pages 84–90. ACM, 2005.
[12] S. Riezler, A. Vasserman, I. Tsochantaridis, V. Mittal, Y. Liu, Statistical machine translation for query expansion in answer retrieval, in: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Association for Computational Linguistics, Prague, Czech Republic, 2007, pp. 464–471.
[13] M. Surdeanu, M. Ciaramita, H. Zaragoza, Learning to rank answers on large online qa collections, in: ACL, The Association for Computer Linguistics, 2008, pp. 719–727.
[14] A. Berger, R. Caruana, D. Cohn, D. Freitag, V. Mittal, Bridging the lexical chasm: statistical approaches to answer-finding, in: SIGIR ’00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, ACM, New York, NY, USA, 2000, pp. 192–199.
[15] Gondek, D. C., et al. “A framework for merging and ranking of answers in DeepQA.” IBM Journal of Research and Development 56.3.4 (2012): 14–1.
[16] Wang, Chang, et al. “Relation extraction and scoring in DeepQA.” IBM Journal of Research and Development 56.3.4 (2012): 9–1.
[17] Kenneth C. Litkowski. Question-Answering Using Semantic Triples[C]. Eighth Text REtrieval Conference (TREC-8). Gaithersburg, MD. November 17–19, 1999.
[18] H. Cui, R. Sun, K. Li, M.-Y. Kan, T.-S. Chua, Question answering passage retrieval using dependency relations., in: R. A. Baeza-Yates, N. Ziviani, G. Marchionini, A. Moffat, J. Tait (Eds.), SIGIR, ACM, 2005, pp. 400–407.
[19] M. Wang, N. A. Smith, T. Mitamura, What is the jeopardy model? a quasi-synchronous grammar for qa., in: J. Eisner (Ed.), EMNLP-CoNLL, The Association for Computer Linguistics, 2007, pp. 22–32.
[20] K. Wang, Z. Ming, T.-S. Chua, A syntactic tree matching approach to finding similar questions in community-based qa services, in: Proceedings of the 32Nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’09, 2009, pp. 187–194.
[21] Hovy, E.H., U. Hermjakob, and Chin-Yew Lin. 2001. The Use of External Knowledge of Factoid QA. In Proceedings of the 10th Text Retrieval Conference (TREC 2001) [C], Gaithersburg, MD, U.S.A., November 13–16, 2001.
[22] Jongwoo Ko, Laurie Hiyakumoto, Eric Nyberg. Exploiting Multiple Semantic Resources for Answer Selection. In Proceedings of LREC(Vol. 2006).
[23] Kasneci G, Suchanek F M, Ifrim G, et al. Naga: Searching and ranking knowledge. IEEE, 2008:953–962.
[24] Zhang D, Lee W S. Question Classification Using Support Vector Machines[C]. Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2003. New York, NY, USA: ACM, SIGIR’03.
[25] X. Yao, B. V. Durme, C. Callison-Burch, P. Clark, Answer extraction as sequence tagging with tree edit distance., in: HLT-NAACL, The Association for Computer Linguistics, 2013, pp. 858–867.
[26] C. Shah, J. Pomerantz, Evaluating and predicting answer quality in community qa, in: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’10, 2010, pp. 411–418.
[27] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, CoRR abs/1301.3781.
[28] Socher R, Lin C, Manning C, et al. Parsing natural scenes and natural language with recursive neural networks[C]. Proceedings of International Conference on Machine Learning. Haifa, Israel: Omnipress, 2011: 129–136.
[29] A. Graves, Generating sequences with recurrent neural networks, CoRR abs/1308.0850.
[30] Kalchbrenner N, Grefenstette E, Blunsom P. A Convolutional Neural Network for Modelling Sentences[C]. Proceedings of ACL. Baltimore and USA: Association for Computational Linguistics, 2014: 655–665.
[31] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate [J]. arXiv, 2014.
[32] Sutskever I, Vinyals O, Le Q V V. Sequence to Sequence Learning with Neural Networks[M]. Advances in Neural Information Processing Systems 27. 2014: 3104–3112.
[33] Socher R, Pennington J, Huang E H, et al. Semi-supervised recursive auto encoders for predicting sentiment distributions[C]. EMNLP 2011
[34] Tang D, Wei F, Yang N, et al. Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification[C]. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Baltimore, Maryland: Association for Computational Linguistics, 2014: 1555–1565.
[35] Li J, Luong M T, Jurafsky D. A Hierarchical Neural Autoencoder for Paragraphs and Documents[C]. Proceedings of ACL. 2015.
[36] Kim Y. Convolutional Neural Networks for Sentence Classification[C]. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics, 2014: 1746–1751.
[37] Zeng D, Liu K, Lai S, et al. Relation Classification via Convolutional Deep Neural Network[C]. Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. Dublin, Ireland: Association for Computational Linguistics, 2014: 2335–2344.
[38] L. Yu, K. M. Hermann, P. Blunsom, and S. Pulman. Deep learning for answer sentence selection. CoRR, 2014.
[39] B. Hu, Z. Lu, H. Li, Q. Chen, Convolutional neural network architectures for matching natural language sentences., in: Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, K. Q. Weinberger (Eds.), NIPS, 2014, pp. 2042–2050.
[40] A. Severyn, A. Moschitti, Learning to rank short text pairs with convolutional deep neural networks., in: R. A. Baeza-Yates, M. Lalmas, A. Moffat, B. A. Ribeiro-Neto (Eds.), SIGIR, ACM, 2015, pp. 373–382.
[41] Wen-tau Yih, Xiaodong He, and Christopher Meek. 2014. Semantic parsing for single-relation question answering. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 643–648. Association for Computational Linguistics.
[42] Li Dong, Furu Wei, Ming Zhou, and Ke Xu. 2015. Question Answering over Freebase with Multi-Column Convolutional Neural Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL) and the 7th International Joint Conference on Natural Language Processing.
[43] Hochreiter S, Bengio Y, Frasconi P, et al. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies [M]. A Field Guide to Dynamical Recurrent Neural Networks. New York, NY, USA: IEEE Press, 2001.
[44] Hochreiter S, Schmidhuber J. Long Short-Term Memory[J]. Neural Comput., 1997, 9(8): 1735–1780.
[45] Graves A. Generating Sequences With Recurrent Neural Networks[J]. CoRR, 2013, abs/1308.0850.
[46] Chung J, Gülçehre Ç, Cho K, et al. Gated Feedback Recurrent Neural Networks[C]. Proceedings of the 32nd International Conference on Machine Learning (ICML-15). Lille, France: JMLR Workshop and Conference Proceedings, 2015: 2067–2075.
[47] D.Wang, E. Nyberg, A long short-term memory model for answer sentence selection in question answering., in: ACL, The Association for Computer Linguistics, 2015, pp. 707–712.
[48] Malinowski M, Rohrbach M, Fritz M. Ask your neurons: A neural-based approach to answering questions about images[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1–9.
[49] Gao H, Mao J, Zhou J, et al. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question[C]//Advances in Neural Information Processing Systems. 2015: 2287–2295.
[50] Sun M S. Natural Language Processing Based on Naturally Annotated Web Resources [J]. Journal of Chinese Information Processing, 2011, 25(6): 26–32.
[51] Hu B, Chen Q, Zhu F. LCSTS: a large scale Chinese short text summarization dataset[J]. arXiv preprint arXiv:1506.05865, 2015.
[52] Shang L, Lu Z, Li H. Neural Responding Machine for Short-Text Conversation[C]. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. Beijing, China: Association for Computational Linguistics, 2015: 1577–1586.
[53] O. Vinyals, and Q. V. Le. A Neural Conversational Model. arXiv: 1506.05869,2015.
[54] Kumar A, Irsoy O, Su J, et al. Ask me anything: Dynamic memory networks for natural language processing[J]. arXiv preprint arXiv:1506.07285, 2015.
[55] Sukhbaatar S, Weston J, Fergus R. End-to-end memory networks[C]//Advances in Neural Information Processing Systems. 2015: 2431–2439.

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store