Abstract:
With advances in technology and the establishment of Recurrent Neural Networks (RNNs), various Natural Language Generation (NLG) tasks have achieved tremendous success, such as image caption generation, neural machine translation, and abstractive summarization. However, the inclusion of pre-specified lexical constraints in output sentences to improve their quality remains a new and not well studied task in NLG. Lexical constraints take the form of words that must appear in the output sentence; in image caption generation, for instance, they can mitigate the issue of out-of-domain image tags (constraints).
Moreover, spoken dialogue systems tend to generate universal replies that lack specific information, so pre-specified constraints can be incorporated into the replies to make them more realistic. However, existing methods allow the inclusion of lexical constraints in the output sentences during the decoding process at the cost of increasing the architecture's complexity exponentially or linearly with respect to the number of constraints. In addition, some approaches can only handle a single constraint. To this end, this thesis proposes a neural probabilistic architecture
based on a backward/forward language model and a word embedding substitution method that can accommodate multiple constraints to generate fluent and coherent sentences. Moreover, we split the sequence at the Part-of-Speech verb category so that the backward generative model can exploit the word's positional information. Analysis of the proposed architecture for generating lexically constrained sentences shows that it outperforms previous methods in terms of the perplexity evaluation metric. Human evaluation also shows that the generated constrained sentences are close to human-written sentences, particularly in terms of fluency.