Abstract:
One of the most vital steps in automatic Question Answering systems is question classification,
also known as answer type classification, identification, or prediction. Precise and accurate
question classification can eliminate irrelevant candidate answers from the pool of answers
available for the question. High question classification accuracy therefore contributes to an
accurate answer for the
given question. This paper proposes an approach, named Question Sentence Embedding (QSE),
for question classification by utilizing semantic features. Extracting a large number of features
does not always solve the problem. Our proposed approach simplifies the feature extraction stage by
not extracting features such as named entities, which are present in few questions because of their
short length, and hypernyms and hyponyms of a word, which require a WordNet extension. These features
make the system more dependent on external sources. We have used Universal Sentence
Embedding with Transformer Encoder to obtain a fixed-size, sentence-level embedding vector for each
question and then calculated the semantic similarity among these vectors to classify questions into their
predefined categories. Because the global COVID-19 pandemic has made people more curious to ask
questions about COVID-19, our experimental dataset is the publicly available COVID-Q
dataset. Our approach achieved an accuracy of 69% on the COVID-19 question classification task,
outperforming the baseline method (53.4%) and demonstrating the efficacy of the
proposed QSE method.
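
The classification step described above (embed each question as a fixed-size vector, then compare vectors by semantic similarity) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact decision rule, so the sketch assumes a question takes the label of its most similar labeled question under cosine similarity. `ToyEncoder` is a hypothetical bag-of-words stand-in for the sentence encoder, and the example questions and category names are invented for illustration rather than taken from COVID-Q.

```python
import math


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (t.strip("?.,!").lower() for t in text.split())
    return [t for t in tokens if t]


class ToyEncoder:
    """Hypothetical stand-in for a sentence encoder.

    Maps a sentence to a fixed-size bag-of-words vector over a vocabulary
    built from the labeled questions. A real system would replace this
    with a pretrained sentence embedding model.
    """

    def __init__(self, sentences):
        vocab = sorted({t for s in sentences for t in tokenize(s)})
        self.index = {t: i for i, t in enumerate(vocab)}

    def encode(self, sentence):
        vec = [0.0] * len(self.index)
        for token in tokenize(sentence):
            i = self.index.get(token)
            if i is not None:  # out-of-vocabulary tokens are ignored
                vec[i] += 1.0
        return vec


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(question, labeled, encode):
    """Return (label, score) of the labeled question most similar to `question`.

    Assumed decision rule: nearest neighbor by cosine similarity.
    """
    q_vec = encode(question)
    best_label, best_score = None, -1.0
    for text, label in labeled:
        score = cosine(q_vec, encode(text))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Invented examples for illustration only (not from the COVID-Q dataset).
LABELED = [
    ("How does the virus spread?", "Transmission"),
    ("Can the virus spread through food?", "Transmission"),
    ("What are the symptoms of the disease?", "Symptoms"),
    ("Is fever a symptom?", "Symptoms"),
    ("How can I protect myself?", "Prevention"),
]

encoder = ToyEncoder([text for text, _ in LABELED])
label, score = classify("What symptoms does the disease cause?", LABELED, encoder.encode)
print(label, round(score, 3))
```

Only the `encode` function would change if a pretrained sentence embedding model were substituted for the toy encoder; the similarity and nearest-neighbor logic are independent of how the vectors are produced.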