This project focuses on the Natural Language Generation (NLG) problem. More specifically, we aim to design models and techniques that can synthesise information from multivariate datasets and, given a question trigger from the end user, present this information in the form of meaningful text. Previous work on neural networks has shown their great potential for tackling a wide variety of Natural Language Processing (NLP) tasks. Consequently, at the current stage we investigate the extent to which neural network language models could be employed for context-sensitive, data-driven verbalisation and Natural Language synthesis. Two application domains are currently considered: (i) response generation in social media; and (ii) data-driven article generation. These applications exemplify text generation of different complexity.

An immediate goal is the introduction of context-sensitive parameters that would allow a neural network architecture to participate in conversations (e.g. within social-networking services such as Twitter and reddit) and to produce meaningful responses to user questions in QA systems. Response generation results based on a Recurrent Neural Network (RNN) architecture were presented and discussed in the 1st-year technical report at the University of Southampton. These results served as a preliminary step towards the design of a novel response generation system. The system is based on the hypothesis that each participant in a conversation bases their response not only on the previous dialogue utterances but also on their individual background knowledge. The alignment model is based on: (i) a Long Short-Term Memory (LSTM) network trained over concatenated sequences of comments; (ii) a Convolutional Neural Network (CNN) trained on a Wikipedia dataset; and (iii) a max-margin objective that aligns the two trained models in a multimodal space. The results of this work will be submitted as a paper to an upcoming NLP conference (e.g. EMNLP 2016 or COLING 2016).
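The max-margin objective in (iii) can be sketched as follows. This is a minimal NumPy illustration, not the actual system's implementation: it assumes both encoders produce fixed-size, L2-normalised vectors, and penalises any mismatched dialogue/knowledge pair whose similarity comes within a margin of the matched pair's similarity. All names (`max_margin_loss`, `dialogue_vecs`, `knowledge_vecs`, the margin value) are illustrative assumptions.

```python
import numpy as np

def max_margin_loss(dialogue_vecs, knowledge_vecs, margin=0.1):
    """Max-margin (ranking) objective sketch: each dialogue encoding
    (e.g. from the LSTM) should score higher with its own knowledge
    encoding (e.g. from the CNN) than with any mismatched one.

    dialogue_vecs, knowledge_vecs: (n, d) arrays of L2-normalised
    encodings; row i of each array forms a matched training pair.
    """
    # Cosine similarity between every dialogue/knowledge combination.
    scores = dialogue_vecs @ knowledge_vecs.T           # shape (n, n)
    positives = np.diag(scores)                         # matched pairs
    # Hinge penalty on every mismatched pair, in both directions.
    cost_d = np.maximum(0.0, margin + scores - positives[:, None])
    cost_k = np.maximum(0.0, margin + scores - positives[None, :])
    np.fill_diagonal(cost_d, 0.0)                       # ignore matched pairs
    np.fill_diagonal(cost_k, 0.0)
    return (cost_d.sum() + cost_k.sum()) / len(positives)
```

When the matched pairs are clearly separated from the mismatched ones (by more than the margin) the loss is zero; training the two encoders to minimise it pulls matched dialogue and knowledge encodings together in the shared multimodal space.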

A long-term goal of this project is data-driven article generation. We want to examine the extent to which neural network architectures can learn to align tabular and textual data. More specifically, we wish to design a neural network model able to identify how the values of a table are discussed in a corresponding article, and then, given one or more unseen tables, to produce a meaningful article discussing the input values and their potential relationships. These methods will support the generation of Natural Language descriptions for data-driven answers in QA systems.
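The table-text alignment task above can be illustrated with a toy string-matching baseline — emphatically not the neural model the project proposes, just a concrete view of what "identifying how table values are discussed in an article" means. The table format and function name are hypothetical.

```python
def align_table_to_text(table, sentences):
    """Toy baseline: link each table value to the indices of the
    article sentences that mention it verbatim (case-insensitive).

    table: dict mapping a field name to its value string (assumed format)
    sentences: list of article sentences
    Returns: dict mapping each field name to a list of sentence indices.
    """
    alignment = {}
    for field, value in table.items():
        alignment[field] = [i for i, s in enumerate(sentences)
                            if value.lower() in s.lower()]
    return alignment
```

A neural model would replace the exact-match test with learned representations, so that paraphrased or aggregated mentions of a value can also be aligned.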



Aligning Texts and Knowledge Bases with Semantic Sentence Simplification. Yassine Mrabet, Pavlos Vougiouklis, Halil Kilicoglu, Claire Gardent, Dina Demner-Fushman, Jonathon Hare, and Elena Simperl. 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG) 2016. PDF

A Neural Network Approach for Knowledge-Driven Response Generation. Pavlos Vougiouklis, Jonathon Hare, and Elena Simperl. COLING 2016: 3370-3380. PDF

What do Wikidata and Wikipedia Have in Common?: An Analysis of their Use of External References. Alessandro Piscopo, Pavlos Vougiouklis, Lucie-Aimée Kaffee, Christopher Phethean, Jonathon Hare, and Elena Simperl. OpenSym 2017: 1:1-1:10. URL PDF

A Glimpse into Babel: An Analysis of Multilinguality in Wikidata. Lucie-Aimée Kaffee, Alessandro Piscopo, Pavlos Vougiouklis, Elena Simperl, Leslie Carr, and Lydia Pintscher. OpenSym 2017: 14:1-14:5. URL PDF