Acceptance Prediction for Answers on Online Health-care Community

Background With the development of e-Health, predicting whether a doctor's answer will be accepted by a patient in an online healthcare community plays an increasingly important role. Unlike previous work, which focuses mainly on numerical features, our framework combines numerical and textual information to predict the acceptance of answers. The textual information is composed of questions posted by patients and answers posted by doctors. To extract textual features from them, we first train a sentence encoder on a held-out dataset to encode a question-answer pair into a co-dependent representation. We then use this representation to predict whether a doctor's answer will be accepted. Results Experimental results on a real-world dataset demonstrate that our model extracts additional features from text and makes the prediction more accurate. That is, the model that takes both textual and numerical features as input performs significantly better than the model that takes only numerical features on all four metrics (Accuracy, AUC, F1-score and Recall). Conclusions This work proposes a generic framework combining numerical and textual features for acceptance prediction, in which textual features are first extracted from text with deep learning methods and then used to achieve better prediction results.


Background
Recently, online service systems have developed rapidly and now cover many fields, such as law, medicine and education. As a representative among them, Online Healthcare Communities (OHCs) have played an important role in bridging the gap between doctors and patients, and they are now quite popular due to their convenience and accessibility. In OHCs, when a patient posts a question describing his/her condition, more than one doctor may post suggestions under the question, which proves to be an effective way of communication between doctors and patients. Therefore, predicting whether a doctor's answer will be accepted by the patient is critical in OHCs. First, the most relevant answers for patients could be retrieved from existing answers under the questions. Second, we can provide suggestions for doctors on how to reply to patients' questions more appropriately.
*Correspondence: zywei@fudan.edu.cn. School of Data Science, Fudan University, Handan Road, Shanghai, China.
A promising line of research on OHCs examines the relationships between different factors and the mechanisms behind them, e.g., the effect of using OHCs on the doctor-patient relationship and patient wellbeing [1], the benefits to patients' mental health of social support exchanged in OHCs [2], and the relationship between patients' exercise activities and participation in OHCs [3]. Moreover, some researchers have identified factors that influence patients' behaviors in OHCs, such as the source of information in OHCs [4] and IT enablers and health motivators [5]. In terms of patient satisfaction, which has an impact on the acceptance of answers, Liu et al. examined the individual and organizational reputation of doctors [6], and Yang et al. found that patient satisfaction is also affected by the doctor-patient exchange frequency and the response speed of doctors [7].
Although these studies all presented promising results, they ignored the text information, which is considered to be the chief information carrier in OHCs. Text mining has nevertheless been applied in related work. For instance, latent Dirichlet allocation (LDA) [8] has been used to extract text features from electronic health records (EHRs) and from text in OHCs: for topic analysis of cancer clinical notes [9], for mining popular topics [10] and semantic analysis [11] based on reviews in OHCs, and for identifying similar patients according to EHRs [12]. In addition, n-gram based methods have been used for infection detection based on EHRs [13].
With the development of computational biology, many disease-related genes have been detected from biological data using deep neural networks [14][15][16][17]. Network-based methods have also been introduced to uncover the relationships between diseases [18,19]. These approaches give us ways to cope with large amounts of biological data.
Recently, deep neural networks have been employed in many natural language processing tasks, such as machine translation [20,21], text generation [22,23], question answering [24,25] and dialogue systems [26][27][28], where a sequence of words is encoded into a vector by a recurrent neural network (RNN) based model. Experimental results have demonstrated the effectiveness of encoding sentences with RNN-based models.
For convenience, we refer to the features extracted from answers and questions as textual features, and to the others (e.g., the patient's age and gender, the doctor's answering order and title, the length of the answer) as numerical features. In our work, we combine these two types of features. The process is as follows: first, we train a sentence encoder to encode a question-answer pair into a vector, which we call the textual features; then we feed both textual and numerical features into a classifier to predict whether an answer will be accepted by the patient. It is worth noting that attention mechanisms (e.g., self-attention [29], co-attention [30]) can be integrated into our sentence encoder and that any classifier can be used in our framework. Experimental results demonstrate the effectiveness of textual features in predicting whether an answer will be accepted.

Dataset Description
The dataset was collected from a popular Online Healthcare Community (http://club.xywy.com). This community was founded in 1999 and is one of the earliest online service systems to explore and practise Internet healthcare services. To date, the number of registered users has exceeded 120 million and the number of daily online visitors has exceeded 22 million. After years of development, it has become one of the top platforms in the online medical service industry in China.
This website has many sections, and most people turn to the youwenbida (Q&A) section for help because patients and doctors can communicate freely there. An example from the youwenbida (Q&A) section, together with the detailed information we can obtain, is shown in Fig. 1.
On this platform, the process is as follows. A patient posts a question that describes his/her condition in words, and then more than one doctor may give suggestions to the patient under this question. After that, the patient can either accept one of the suggestions or ask specific doctors further questions until his/her question is solved. We regard a suggestion as helpful to the patient only if he/she adopts it or follows it up with a further inquiry. The pipeline for data processing and model construction is shown in Fig. 2.
The dataset we crawled from the website contains both numerical and textual features. The numerical features consist of three parts: the patient's information, the doctor's information and the answering information. These features play an important role in predicting whether an answer will be helpful to the patient.
We list all the numerical features used in our model in Table 1. Some of them are categorical variables, which we convert into one-hot vectors; for the continuous variables, we apply a logarithmic transformation because they all follow right-skewed distributions.
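The encoding of numerical features described above can be illustrated with a minimal sketch. The category list, the field names and the sample record are hypothetical, and `log(1 + x)` is an assumption chosen here so that zero counts stay finite; the text only states that a logarithmic transformation is applied.

```python
import math

def one_hot(value, categories):
    """Return a one-hot list for `value` over the ordered `categories`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]

def log_transform(x):
    """Compress a right-skewed, non-negative value with log(1 + x) (assumed variant)."""
    return math.log1p(x)

# Hypothetical category list and record, for illustration only.
TITLES = ["resident", "attending", "associate chief", "chief"]
record = {"title": "attending", "patients_helped": 1200, "answer_length": 85}

features = one_hot(record["title"], TITLES) + [
    log_transform(record["patients_helped"]),
    log_transform(record["answer_length"]),
]
print(features)
```

The resulting list has one slot per category followed by the transformed continuous values, which is the shape a downstream classifier would expect.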
Patient's information: Each patient is required to fill in personal information when registering on the website. The information visible to us is the patient's age and gender, both of which are included in our numerical features.
Doctor's information: Doctors in OHCs are strictly certified, so they have specific information on the website. The doctor's information we use includes the doctor's title, the doctor's reputation, the number of online patients the doctor has helped, and the number of expressions of gratitude received from online patients.
Answering information: The answering information comes from interactions between patients and doctors, which reflect the corresponding behaviors of doctors and in turn influence patients' satisfaction [7]. The answering information includes the length of the answer and the doctor's answering order under the corresponding question.
Textual information: The textual information is mainly extracted from patients' questions and doctors' answers, and it can only be handled through natural language processing. The details of extracting textual features are described in the next section.

Data Preprocessing
Numerical data preprocessing Our dataset covers 225 kinds of diseases and contains nearly 17 million records in total. For ease of computation, we select two diseases as representatives: hypertension, which represents chronic diseases, and oral ulcer, which represents acute diseases.
Because we can only know whether an answer is helpful when the patient responds to it, we exclude the records for which the patient gave no response and keep the others. Among all the answers under a question, we label an answer as a positive record if the patient considers it helpful, and we label the other answers as negative records. Moreover, we exclude the records with missing values.
In the processed dataset, there are 15014 samples in total, of which nearly 60% are positive and nearly 40% are negative. Further details of the samples are given in Table 2.
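The filtering and labeling rules above can be sketched as follows. This is one reading of the description, not the authors' code: the record keys (`question_id`, `adopted`, `followed_up`) are hypothetical, and "no response" is interpreted here as a question under which the patient neither adopted nor followed up on any answer.

```python
from collections import defaultdict

def label_records(records):
    """Drop unanswered-by-patient questions and incomplete records, then label the rest."""
    by_question = defaultdict(list)
    for r in records:
        by_question[r["question_id"]].append(r)

    labeled = []
    for answers in by_question.values():
        # Skip questions where the patient never reacted to any answer.
        if not any(a["adopted"] or a["followed_up"] for a in answers):
            continue
        for a in answers:
            # Exclude records with missing values.
            if any(v is None for v in a.values()):
                continue
            # Positive if the patient adopted the answer or inquired again after it.
            labeled.append({**a, "label": int(a["adopted"] or a["followed_up"])})
    return labeled

# Hypothetical toy records, for illustration only.
records = [
    {"question_id": 1, "answer_id": "a", "adopted": True, "followed_up": False, "age": 30},
    {"question_id": 1, "answer_id": "b", "adopted": False, "followed_up": False, "age": 30},
    {"question_id": 2, "answer_id": "c", "adopted": False, "followed_up": False, "age": 41},
    {"question_id": 3, "answer_id": "d", "adopted": False, "followed_up": True, "age": None},
]
labeled = label_records(records)
print([(r["answer_id"], r["label"]) for r in labeled])
```

In this toy input, question 2 is dropped because the patient never responded, and answer "d" is dropped because of a missing value.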
Textual data preprocessing Because all the text in our dataset is in Chinese, we first use Jieba (https://github.com/fxsjy/jieba), a popular Python module for Chinese word segmentation, to segment the text. Moreover, to better recognize uncommon words (e.g., disease names, drug names), we add customized dictionaries to the original dictionary, such as the medical thesauri from Sogou Input (https://pinyin.sogou.com/dict/) and QQ Input (http://cdict.qq.pinyin.cn/). In total, 385950 words are added to the original dictionary. After text segmentation, we remove the stop words from the text, including modal words, conjunctions and pronouns. Then, we perform word embedding so that the text can be used in the recurrent neural network. (The preprocessing is performed on the whole dataset, i.e., 17 million records.)
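As an illustration of the stop-word removal step, the sketch below filters an already segmented sentence. The token list and the stop-word set are hypothetical examples; in the pipeline described above the tokens would come from Jieba (for instance via `jieba.lcut`, after loading the extra dictionaries with `jieba.load_userdict`), which is left out here so that the snippet has no third-party dependency.

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens that are not in the stop-word set, preserving order."""
    return [t for t in tokens if t not in stop_words]

# Hypothetical segmented sentence and stop-word set, for illustration only.
tokens = ["我", "最近", "血压", "很", "高", "怎么办"]
stop_words = {"我", "很", "的", "和"}

filtered = remove_stop_words(tokens, stop_words)
print(filtered)
```

Using a set for the stop words keeps each membership check constant-time, which matters when the filter runs over millions of records.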

Textual Features Extraction
Sentence Encoder To obtain a representation of the text information, we apply a sentence encoder to both the questions and the answers. Because of the strong relationship between an answer and its corresponding question, we need a model that joins them and outputs a joint representation. Hence we apply the co-attention mechanism [30] to obtain the joint representation. The framework of the sentence encoder is shown in Fig. 3.
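Here is a brief introduction to this process. Given a pair of question and answer, we denote the word vectors of the question as $(x^Q_1, \dots, x^Q_n)$ and those of the answer as $(x^A_1, \dots, x^A_m)$. Each sequence is encoded by an LSTM, $q_t = \mathrm{LSTM}_Q(q_{t-1}, x^Q_t)$ and $a_t = \mathrm{LSTM}_A(a_{t-1}, x^A_t)$, which gives $Q' = [q_1, \dots, q_n] \in \mathbb{R}^{l \times n}$ and $A = [a_1, \dots, a_m] \in \mathbb{R}^{l \times m}$, where $l$ is the state size of the LSTM. $\mathrm{LSTM}_Q$ and $\mathrm{LSTM}_A$ can share parameters so that they have the same representation power. In addition, we introduce a non-linear projection layer that maps the question into a space different from that of the answer $A$. Finally, the question is represented as $Q = \tanh(W^{Q} Q' + b^{Q})$.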
To calculate the affinity scores for all pairs of answer words and question words, we compute the affinity matrix $L = A^{\top} Q \in \mathbb{R}^{m \times n}$. We then apply row-wise normalization to $L$ to obtain the attention weights across the answer for each word in the question, which gives the score matrix $S^Q$. Similarly, applying column-wise normalization to $L$ gives the score matrix $S^A$:

$$S^Q = \mathrm{softmax}(L) \in \mathbb{R}^{m \times n}, \qquad S^A = \mathrm{softmax}(L^{\top}) \in \mathbb{R}^{n \times m} \tag{1}$$

After that, we compute the co-dependent representation of the question and answer as Xiong et al. [30] did:

$$C^A = [Q; C^Q]\, S^A \in \mathbb{R}^{2l \times m} \tag{2}$$

where the notation $[a; b]$ is the horizontal concatenation of the vectors $a$ and $b$, and $C^Q = A S^Q \in \mathbb{R}^{l \times n}$ holds, for each word in the question, an attention summary of the answer. A pair of question and answer is thus encoded as $C^A$.

To extract information that is more useful for acceptance prediction, we train the sentence encoder. More specifically, given a question-answer pair $(Q_i, A_i)$, we aim to maximize the predicted acceptance probability if the answer to this question was accepted, and to minimize it otherwise. We use the cross-entropy as the loss function:

$$\mathcal{J} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right] \tag{3}$$

where $p_i$ is the predicted acceptance probability for $(Q_i, A_i)$, $y_i \in \{0, 1\}$ indicates whether the answer was accepted, and $N$ is the total number of question-answer pairs.

Fig. 3 Framework of the sentence encoder. The question and answer are first encoded by an LSTM and a non-linear layer as $Q$ and $A$ respectively; $Q$ and $A$ are then encoded by the co-attention encoder [30] as $C^A$. Next, the context function maps $C^A$ into a vector $h$, i.e., the representation of the given question-answer pair, which is fed into a softmax layer for binary classification, i.e., accepted or not.
To obtain the acceptance probability of an answer, we use a context function $f_C$ that maps $C^A$ into a vector, to which a softmax layer is applied to compute the acceptance probability. We can regard each column $c_i$ of $C^A$ as the encoding vector for the $i$-th word of the answer. One option is to use another LSTM layer as $f_C$ and take its last hidden state as the representation of the question-answer pair. Alternatively, we can use a multi-layer perceptron (MLP) as $f_C$ by applying it to each column of $C^A$, which generates a vector of length $m$ to represent $h$.
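To make the shapes in the co-attention step concrete, here is a minimal pure-Python sketch on toy matrices. It assumes encoded inputs $A \in \mathbb{R}^{l \times m}$ and $Q \in \mathbb{R}^{l \times n}$ stored as lists of rows, and it picks the softmax axis so that every summary is a convex combination of encoded words, which is one reading of the normalization described above. The helper names and the toy values are illustrative only and are not taken from the authors' implementation.

```python
import math

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def softmax_columns(X):
    """Softmax over each column, so that every column sums to 1."""
    cols = []
    for col in zip(*X):
        mx = max(col)
        exps = [math.exp(v - mx) for v in col]
        total = sum(exps)
        cols.append([e / total for e in exps])
    return transpose(cols)

def coattention(A, Q):
    L = matmul(transpose(A), Q)          # affinity matrix, m x n
    S_Q = softmax_columns(L)             # m x n: per question word, weights over answer words
    S_A = softmax_columns(transpose(L))  # n x m: per answer word, weights over question words
    C_Q = matmul(A, S_Q)                 # l x n: answer summaries for each question word
    C_A = matmul(Q + C_Q, S_A)           # stack Q and C_Q (2l x n), then 2l x m
    return S_Q, S_A, C_Q, C_A

# Toy encodings with l = 2, m = 3 answer words, n = 2 question words.
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
Q = [[1.0, 0.0],
     [0.0, 1.0]]

S_Q, S_A, C_Q, C_A = coattention(A, Q)
print(len(C_A), len(C_A[0]))
```

Each column of `C_A` corresponds to one answer word, so a context function can consume the columns in order, for example with an LSTM or a per-column MLP.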

Acceptance Prediction
After constructing the sentence encoder above, we can extract the textual features for a given question-answer pair $(Q_i, A_i)$ as $h^{text}_i = \mathrm{sentenceEncoder}(Q_i, A_i)$. Because we can also obtain the numerical features from the doctor, the patient and their corresponding behaviors, we can construct a dataset in which each sample combines the textual features $h^{text}_i$ with the numerical features and a label $y_i$, where $y_i$ indicates whether answer $A_i$ is helpful to the patient on question $Q_i$.
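A minimal sketch of how the two feature types might be joined into classifier inputs is given below. Plain concatenation is an assumption made for illustration, and the function name and toy vectors are hypothetical; the resulting rows could be passed to any classifier.

```python
def build_dataset(text_features, numerical_features, labels):
    """Concatenate textual and numerical feature vectors sample by sample."""
    if not (len(text_features) == len(numerical_features) == len(labels)):
        raise ValueError("all inputs must contain the same number of samples")
    X = [list(t) + list(n) for t, n in zip(text_features, numerical_features)]
    return X, list(labels)

# Hypothetical toy values: 2 samples, 3 textual dims, 2 numerical dims.
h_text = [[0.1, 0.4, 0.5], [0.9, 0.2, 0.3]]
h_num = [[1.0, 3.4], [0.0, 2.1]]
y = [1, 0]

X, y_out = build_dataset(h_text, h_num, y)
print(X)
```

Dropping either argument's columns from `X` yields the text-only or numerical-only variants of the same dataset.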

Experimental Setup
As discussed in the "Methods" section, all sentences in our dataset are segmented and stop words are removed. To feed the question and answer sentences into the sentence encoder, we first use word embedding to convert them into vectors, based on the complete dataset of 17 million records. To realize word embedding, the word2vec [31] algorithm is applied and the dimension of the vectors is set to 100. In addition, words whose frequency is below 30 are replaced with 'UNK'. We use 80% of the whole dataset to train the sentence encoder and the remaining 20% for testing; the classifier is built with 5-fold cross validation. The objective of the sentence encoder is to minimize the loss in Eq. (3) on the held-out dataset. Some parameters of our model are set as follows: the state size of the LSTM cell is 100, the batch size is 100 and the learning rate is 0.001. Moreover, we adopt dropout and gradient clipping in our model to avoid overfitting.
Model comparisons. After the sentence encoder is trained, the textual features of each sample in the non-held-out set can be extracted by a forward pass through the sentence encoder. To evaluate the sentence encoder, we train a gradient boosting classifier (GBC). We use two types of sentence encoders for comparison. One is an LSTM, which takes the whole sentence as input and outputs the last state as the final textual features; we denote it as LSTM-GBC. The other is an MLP, which outputs a vector $h \in \mathbb{R}^m$; we denote it as MLP-GBC. First, to compare the models with different encoders, we compare the results of LSTM-GBC and MLP-GBC.
Second, to test the effectiveness of textual and numerical features in acceptance prediction, we compare three models: the model with textual features only, the model with numerical features only, and the model with both numerical and textual features.
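The rare-word replacement mentioned in the setup above (words with frequency below 30 become 'UNK') can be sketched as follows. The function name and the toy corpus are hypothetical, and the threshold is lowered to 2 in the example only to keep the input small.

```python
from collections import Counter

def replace_rare_words(sentences, min_count=30, unk="UNK"):
    """Replace tokens seen fewer than `min_count` times in the corpus with `unk`."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return [[tok if counts[tok] >= min_count else unk for tok in sent]
            for sent in sentences]

# Hypothetical segmented corpus, for illustration only.
corpus = [["血压", "高"], ["血压", "药"], ["高", "血压"]]
cleaned = replace_rare_words(corpus, min_count=2)
print(cleaned)
```

Counting over the whole corpus before replacing keeps the vocabulary consistent between the embedding step and the encoder input.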
Evaluation metrics. To evaluate the performance of the different models, we use common metrics, namely Recall, Precision, Accuracy, AUC and F1-score. The definitions of these metrics are listed in Table 3.
In Table 3, TP means true positives, TN means true negatives, FP means false positives and FN means false negatives. The ROC curve, whose area gives the AUC, is plotted with the FP rate as the horizontal axis and the TP rate as the vertical axis. The FP rate and TP rate are calculated as follows:

$$\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad \mathrm{TPR} = \frac{TP}{TP + FN}$$
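For reference, the metrics named above can be computed from the confusion counts with a short sketch like the one below. It uses the standard textbook definitions, assumes non-zero denominators, and computes AUC as the fraction of positive-negative pairs ranked correctly (ties count half), which equals the area under the ROC curve. The function names and toy numbers are illustrative only.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from confusion counts (denominators assumed non-zero)."""
    recall = tp / (tp + fn)            # also the TP rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)               # FP rate
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f1": f1, "fpr": fpr}

def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy counts and scores, for illustration only.
m = classification_metrics(tp=3, tn=4, fp=1, fn=2)
a = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(m, a)
```

The pairwise AUC is quadratic in the number of samples, which is fine for a sketch; a sort-based computation would be preferable on large test sets.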

Results
Model Results. The experimental results are listed in Table 4. The model with only numerical features outperforms the model with only textual features, while the model with both textual and numerical features performs better than the model with only numerical features on nearly all metrics. This shows that the additional textual features can significantly improve the model's ability in acceptance prediction, and it demonstrates that the sentence encoder can extract textual features from text effectively. Comparing the LSTM-based sentence encoder with the MLP-based one, we see that LSTM-GBC performs better than MLP-GBC when only textual features are considered, whereas MLP-GBC achieves a better result when both textual and numerical features are considered. We attribute these unstable results to the small amount of data available for training the sentence encoder.
Attention Analysis. Figure 4 shows the attention weights (i.e., the score matrices $S^A$ and $S^Q$). From Figure 4 (a), we can see that the accepted answer mainly focuses on the uncertain parts of the question, such as "Is that OK?" and "Excuse me", which suggests that doctors tend to address the more uncertain parts of a question. Figure 4 (b) shows that the accepted answer mainly focuses on "no eating", which indicates that patients are prone to accept answers with diet suggestions. In conclusion, the sentence encoder, enhanced with the co-attention mechanism, is able to extract useful textual features that attend to both the question and the answer for better acceptance prediction.

Discussion
In this work, we leverage both textual and numerical features to predict the acceptance of answers in OHCs and demonstrate the procedure of textual feature extraction, which provides a guideline for prediction tasks that involve text. Moreover, any other deep learning method for natural language processing can be integrated into this framework to extract textual features automatically for better results. There are also several limitations. First, we train the sentence encoder on only two kinds of diseases in this work, and its generalization to other diseases still needs to be verified. Second, because of the lack of Chinese medical resources, we can only extract textual features based on a manually constructed word list, which limits the performance of our method.

Conclusion
In this work, to predict the acceptance of answers, we propose a framework that combines numerical and textual features. A sentence encoder is introduced to extract the textual features from the text. The experimental results demonstrate that additional textual features can significantly improve the model's ability in acceptance prediction and that the sentence encoder can extract textual features from text effectively. In the future, we will try state-of-the-art sequence models and reinforcement learning based models to extract textual features from text.