Abstract:
Developer forums are essential to software development because they allow
developers to resolve problems and issues with the assistance of experts.
Often there are multiple answers (solutions) to a single issue, and some of these answers are not helpful or satisfactory. Users usually browse all the answers within
a question thread to get the required answer. This is a tedious and time-consuming
task. In this thesis, we propose an automatic classification approach to predict
high-quality answers to the questions on a developer forum. First, we extract metadata features (such as length of words, number of characters/sentences and average
characters per word), and then we apply natural language processing techniques
(such as data cleaning, tokenization, stop-word removal and spelling correction).
We also employ a keyword ranking algorithm, which computes ranking scores over the
text of all answers under each question. Next, we use word embeddings to transform the preprocessed text of the answers into feature vectors. Finally,
we feed the metadata, keyword and textual feature vectors into the proposed
deep-learning-based integrated model for training and prediction of high-quality
answers. The proposed integrated model combines a convolutional neural network (CNN) with a long short-term memory (LSTM) network. The
results of 10-fold cross-validation suggest that the proposed approach
outperforms a recent best-answer prediction approach.
Keywords: Developer Forums, Best Answer Prediction, Stack Overflow, Technical
Q&A sites, Deep Learning
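
As an illustration of the metadata features named in the abstract, the following is a minimal sketch, not the thesis implementation. The function name `metadata_features`, the regex-based word and sentence splitting, and the reading of "length of words" as a word count are all assumptions made for this example.

```python
import re


def metadata_features(answer: str) -> dict:
    """Compute simple surface features of one answer (hypothetical helper)."""
    # Assumption: a "word" is a run of letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", answer)
    # Assumption: sentences end at one or more of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", answer) if s.strip()]
    # Average characters per word; 0.0 for an answer with no words.
    avg_chars = sum(len(w) for w in words) / len(words) if words else 0.0
    return {
        "word_count": len(words),
        "char_count": len(answer),
        "sentence_count": len(sentences),
        "avg_chars_per_word": avg_chars,
    }


if __name__ == "__main__":
    print(metadata_features("Use a list. It works!"))
```

In a pipeline like the one described, such a dictionary would presumably be converted to a numeric vector and supplied to the model alongside the keyword and embedding features.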