 Research article
 Open Access
 Published:
Exploring spatialfrequencysequential relationships for motor imagery classification with recurrent neural network
BMC Bioinformatics volume 19, Article number: 344 (2018)
Abstract
Background
Conventional methods of motor imagery brain computer interfaces (MIBCIs) suffer from the limited number of samples and simplified features, so as to produce poor performances with spatialfrequency features and shallow classifiers.
Methods
Alternatively, this paper applies a deep recurrent neural network (RNN) with a sliding window cropping strategy (SWCS) to signal classification of MIBCIs. The spatialfrequency features are first extracted by the filter bank common spatial pattern (FBCSP) algorithm, and such features are cropped by the SWCS into time slices. By extracting spatialfrequencysequential relationships, the cropped time slices are then fed into RNN for classification. In order to overcome the memory distractions, the commonly used gated recurrent unit (GRU) and longshort term memory (LSTM) unit are applied to the RNN architecture, and experimental results are used to determine which unit is more suitable for processing EEG signals.
Results
Experimental results on common BCI benchmark datasets show that the spatialfrequencysequential relationships outperform all other competing spatialfrequency methods. In particular, the proposed GRURNN architecture achieves the lowest misclassification rates on all BCI benchmark datasets.
Conclusion
By introducing spatialfrequencysequential relationships with cropping time slice samples, the proposed method gives a novel way to construct and model high accuracy and robustness MIBCIs based on limited trials of EEG signals.
Background
Motor imagery brain computer interfaces (MIBCIs) construct pathways by electroencephalography (EEG) signals’ eventrelated desynchronizing/eventrelated synchronizing (ERD/ERS) phenomenon in central brain’s band power in two rhythms, μ (8  12 Hz) and β (18  25 Hz) [1, 2]. Due to characteristics of EEG signals, conventional methods of MIBCIs can be roughly divided into three categories: (1) classification by spatial features [3–7], (2) classification by frequencyspatial features [8–12], and (3) classification by temporalfrequency features [13–17]. The stateoftheart approach of MIBCIs was spatialfrequency features extracted by filter bank common spatial pattern algorithm (FBCSP) [8, 12]. Such FBCSP algorithm was effective for constructing optimal spatial features that discriminate among different classes of ERD/ERS rhythms in MIBCIs by a bank of bandpass filters [18, 19]. By distinguishing the relationships between EEG signals and underlying primary source, the spatialfrequency features were good at solving the volume conduction effect [20].
Although the spatialfrequency features are enough for classification of EEG signals in MIBCIs, the number of samples and simplified features are still two major challenges for the classification. First, since the conventional classification of EEG signals was usually adapted by “shallow” classifiers (linear discriminant analysis (LDA), support vector machine (SVM), and neural network (NN)) [21–25], such classifiers are appropriate for small sample size. Hence, a complete entity of each motor imagery trial’s spatialfrequency features was fed into these classifiers for classification. Due to the difficulty of obtaining motor imagery trials, public or private datasets have limited amounts of EEG trials from MIBCIs [26, 27]. Thus, “shallow” classifiers with less data will produce poor performances of classification.
Second, except for spatialfrequency features, EEG signals’ sequential relationship is another useful feature for motor imagery classification. By cropping the spatialfrequency features into several time slices, each time slice can be treated as timeseries, which contains sequential relationships over time. If the sequential relationships can be modeled by classifiers, the novel spatialfrequencysequential relationships will significantly improve the performances and robustness of motor imagery classification.
To solve the two major challenges, this paper introduces a deep recurrent neural network (RNN) architecture for the classification based on FBCSP algorithm [28, 29]. Also, by modeling EEG signals by RNN, an optimal number of hidden layers are obtained for RNN. Then, a sliding window cropping strategy (SWCS) is used to crop the entity trial into several time slices to increase the number of samples by the optimal number. Since the deep neural networks have dramatically improved the stateoftheart methods in signal processing and classification, researches on EEG signals have been developed by using deep learning techniques to extract essential feature representations. The sequential relationships of EEG signals are easy to be extracted by RNN architecture. Therefore, the two contributions of this study are as follow:

1.
A deep RNN architecture is applied to the FBCSP features to extract the spatialfrequencysequential relationships for motor imagery classification. The abundant features will improve the performances of classification. Also, two different memory units, long shortterm memory (LSTM) unit [30] and gated recurrent unit (GRU) [31], are included in the RNN architecture.

2.
The FBCSP features extracted from a complete entity motor imagery trial are cropped by the SWCS with an optimal number. The strategy will increase a large a e deep neural networks.
Related works
Conventional methods
Manual feature extraction methods and shallow classifiers are developed for conventional motor imagery classification. These features are usually extracted from the spatialfrequency features and sequential relationships of EEG signals. Table 1 illustrates the related work regarding feature extraction methods and the corresponding classifiers in the state of the art methods.
From Table 1, we found CSPs algorithm [8, 21] is the key algorithm for extracting spatial features in motor imagery classification. Other researchers improve the CSPs algorithm by a probabilistic model [32] or the genetic algorithm (GA) [33]. Except for spatial features, the frequency features of power spectrum density (PSD) and sequential relationships of adaptive auto regression (AAR) are also used in motor imagery classification [22, 34]. Besides, the “timefrequency” features combine frequency features and sequential relationships for classification [17]. For the classification, conventional classifiers focus on shallow machine learning models. In some cases, the preprocessing algorithm multivariate empirical mode decomposition (MEMD) has been used to improve signalnoise ratio and classification accuracy [25]. The related works used manual features and shallow classifiers for the following reasons: on the one hand, because public datasets have limited EEG samples, they are more suited for classification by LDA/SVM/Naive Bayes classifiers; on the other hand, the EEG signals are regarded as a complete entity, and the entity is classified by spatial, frequency features or sequential relationships. However, if signals belong to timeseries data, sequential relationships over time will provide the discriminant features for motor imagery classification.
Deep learning methods
Statistical, integrated, and deep learning are the common classification methods in machine learning [35, 36]. In particular, deep learning classification methods have been used gradually for EEG signal classification [37–39]. Table 2 illustrates the related works regarding the state of the art of deep learning classifiers.
From Table 2, we conclude that deep learning is widely used in EEG signal classification. Convolution Neural Network (CNN) models [40–44] and Deep Belief Network (DBN) models [32, 45, 46] are most often used in the analysis of EEG signals. Actually, the CNN and DBN models are used to extract the spatial features from EEG signals. These two deep learning models still treat the complete entity trials for classification, so the performance can’t be improved much. However, the deep RNN architecture can extract the sequential relationships from EEG signals [47, 48]. By using a sliding window cropping strategy, the complete entity trials will be cropped into several time slices for classification. Several multiples growth number of samples by cropping for classification will obtain a significant performance improvement of motor imagery classification. Therefore, the discriminant features for motor imagery classification are extracted by using a combination of the FBCSPs algorithm and RNN architecture.
Methods
By considering references [47, 49], our proposed method regards EEG signals as timeseries data, and the extracted spatialfrequency features’ sequential relationships are represented by RNN architecture. Due to the fact that conventional FBCSPs algorithms with shallow classifiers do not contain the sequential relationships, and these algorithms regarded the entity of each trial as a single sample for classification. Therefore, two methods are developed to validate and represent spatialfrequencysequential relationships for classification. First, we test a group of smoothing time windows on the FBCSP features to validate whether the sequential relationships can improve the classification performance of EEG timeseries. Then, a deep RNN architecture is applied to represent spatialfrequencysequential relationships on FBCSP features for classification. It is easy to cause overfitting problems and drop the classification performance if the deep neural networks are presented for classification by the entity of a trial [50]. Therefore, before using the deep RNN architecture, a sliding window cropping strategy is applied to crop the entity of each trial into several time slices. Then, each time slice is fed into the deep RNN architecture for the motor imagery classification. The size of each time slice will be set as the same of the optimal number of hidden layers for RNN to obtain the optimal classification performance. The proposed method is illustrated in Fig. 1.
In Fig. 1, our proposed method is comprised of four progressive stages of signal processing and machine learning on EEG signals: (1) a filter bank comprising multiple Butterworth bandpass filters to extract frequency features, (2) a CSP algorithm is used to extract spatial features, (3) a sliding window cropping strategy is applied to crop time slices to model the sequential relationships of spatialfrequency features, (4) classification of the spatialfrequencysequential relationships on time slices by a deep RNN architecture. In the deep RNN architecture, two different memory units, GRU and LSTM unit, are included to compare classification performance and robustness. The CSP projection matrix for each filter band, the discriminative spatialfrequency features, and the deep RNN architectures are computed and trained from training data labeled with the respective motor imagery action. These parameters computed from the training phase then used to validate each singletrial motor imagery action. By using the same cropping strategy in the validation phase, the classification of singletrial motor imagery action will predict several targets. The final evaluated action will be obtained by averaging all predicted targets.
Spatialfrequency features
The widely used spatialfrequency features extraction algorithm for classification of motor imagery EEG signals was Filter Bank Common Spatial Patterns (FBCSP) algorithm [8, 9]. There are two steps in the FBCSP method: (1)a group of bandpass filters are presented to the raw EEG data to obtain the subjectspecific frequency band. (2)The CSP algorithm is provided to every filter result to extract the optimal spatial features. Then, a classifier is used in all of the FBCSP features for motor imagery classification.
To extract CSP features, let X_{c}∈R^{N∗T} represent one bandpass filtering result, where c is the number of classes, N is the number of potentials of EEG, T is the number of samples in each trial. Each dataset contains L trials of EEG signals and each signal X_{c} is a zero average signal. The purpose of the CSP algorithm is to find an optimal spatial vector, \(\overrightarrow w \in {R^{M \times \mbox{{N}}}}\), to project the original EEG signal to a new space to obtain good spatial resolution and discrimination between different classes of EEG signals. To calculate the optimal projection matrix, let the average covariance matrix of class “c” be \(\overline C_{c}\), and average power of class “c” be \({\overline P_{c}}={\overrightarrow w^{T}}{\overline C_{c}}\overrightarrow w\). For an example of two classes on the minimized projected \(\overrightarrow w\) axis, the maximization of the power ratio is written into the Rayleigh quotient form:
The Rayleigh quotient is then retranslated into a constrained optimization problem, which is then solved by applying the Lagrange multiplier method to the problem. The optimization results include both eigenvectors and eigenvalues. The optimal CSP spatial filter vector, \({\overrightarrow w^{*}} \in {R^{M \times N}}\), is constructed by taking M=2m,M≤N eigenvectors corresponding to the “m” largest and “m” smallest eigenvalues:
where \({\overrightarrow w_{{\lambda _{i}}}}\) is the eigenvector that corresponds to the eigenvalue λ_{i}. Each filter band of EEG signals, X_{c}, is spatially filtered by:
where Z_{c}∈R^{M×T} is the spatialfrequency features. The EEG signals are composed of rapidly changing voltage values; therefore, band power (variance) is used as a feature for the classifier. For multiclass extension to the FBCSP algorithm, the oneversusrest (OVR) strategy is presented to solve the multiclass motor imagery BCI classification.
Spatialfrequencysequential relationships
Conventional algorithms for motor imagery EEG signals fed spatialfrequency features (FBCSP) into classifiers to discriminate different motor imagery targets. In this paper, the FBCSP features are fed into a deep RNN architecture to get spatialfrequencysequential relationships to improve the classification performance of motor imagery. To validate and represent the spatialfrequencysequential relationships, a group of smoothing time windows are put on the FBCSP features to validate the effect of sequential relationships, and a RNN model with sliding window cropping strategy is applied to represent spatialfrequencysequential relationships on EEG timeseries. To improve the classification performance and overcome the overfitting problem, the LSTM unit and GRU are used to construct LSTMRNN architecture or GRURNN architecture for EEG signals classification.
Smoothing time windows on FBCSP features
Since the FBCSP features are extracted from EEG timeseries, such features also contain sequential relationships. Before we represent the sequential relationships by the RNN architecture, a group of smoothing time windows are put on the FBCSP features to smooth the sequential relationships. For the classification by FBCSP features, we will adjust the smoothing time window size, and find the influence of classification performance by the smoothing time windows. If the influence for the performance is large, the sequential relationships on FBCSP features will be validated to influence the classification performance. Therefore, the RNN architectures with LSTM and GRU memories will be applied to extract spatialfrequencysequential relationships on FBCSP features. According to the smoothing process, given smoothing window size, ω, the following smoothing operation is applied to the FBCSP features, Z_{c}:
where \({\overline Z_{c}}(t)\) is the smoothed FBCSP features. In the experiments, we adjust the parameter ω to obtain different smoothing levels of FBCSP features, and get the classification performance by support vector machine (SVM). The classification performance will validate and instruct the sequential relationships for EEG signals classification.
RNN architectures with LSTM and GRU memories
To represent the spatialfrequencysequential relationships, we introduce the RNN architecture in this study [51, 52]. The RNN architecture, containing an input layer, recurrent hidden layers and an output layer, is widely used to represent timeseries [53, 54]. In recurrent hidden layers, a number of simple computation units with weighted interconnections, including delayed feedback [28]. The feedback will give intrinsic states and learn tasks from memory, which is suitable for modeling EEG signals. With the activation functions, the deep RNN architecture is good at learning sequential patterns from EEG signals. Figure 2 illustrates the standard deep RNN architecture. In the figure, the simplified RNN architecture is shown in the box on the left. The box on the right shows the architecture unfolded in a form of timeseries [⋯,t−1, t,t+1,⋯ ]. In the form of timeseries hidden layers, the input of layer “t” contains the output of layer “ t+1”, so do the input of layer “ t−1”. The sequential relationships propagate from the end of the timeseries to the start of the timeseries by neurons, which are connected by horizontal lines in the figure.
Recurrent connections between hidden layers are followed by a feedforward output layer. Hence, the deep RNN architecture is universal approximators of finite states. Therefore, a deep RNN architecture can approximate any finite states with enough recurrent hidden layers and trained weights. Let Z_{c}∈R^{M∗T} represent FBCSP features, where M is the features dimension, T is the number of samples in each trial. The RNN architecture can be defined as:
where x_{t} is the vector of input layer, which is one of the time slices of the FBCSP features Z_{c}∈R^{M∗T}. h_{t} is the vector of hidden layer. y_{t} is the vector of output layer. W, U and b are the recurrent connected weights. σ is the activation functions.
Neural networks are processed by backpropagations (BP) algorithm in common. For the RNN architecture, the sequential relationships propagate all steps back through time, so the feedback of hidden layers will be processed by backpropagation through time (BPTT) algorithm [55]. The training procedure of a deep RNN architecture is performed using a stochastic gradient descent (SGD) algorithm. By using SGD algorithm, we can iteratively update the network’s weight values based on BPTT algorithm. However, the BPTT algorithm is too sensitive to recent distractions; thus, the error flow tends to vanish as long as the weights have absolute low variations, especially at the onset of the training phase. Long shortterm memory (LSTM) unit [30] and Gated recurrent unit (GRU) [31] are proposed to overcome the vanishing gradient problem. The LSTM and GRU architecture is illustrated in Figs. 3 and 4. The introduction of these two architectures are as follow:

1.
LSTM architecture: In a LSTM unit [56], input, output and forget gates are used to retain memory contents; these gates also prevent the irrelevant inputs and outputs from entering the memory. Thus, the unit stores the long term memory features of the timeseries data. A peephole method [57] will be included in the LSTM architecture to transfer memories for all gates.

2.
GRU architecture: A GRU supports each recurrent unit to adaptively obtain dependencies of different time scales. The GRU has “update” and “reset” gates to prevent the error flow of information in the unit. Similarly to the LSTM unit, the gates prevent irrelevant inputs and outputs.
In such “memory units”, because these special units have internal states, multiplicative gates are employed to enforce constant error flow. These two different memory units are used in the deep RNN architecture to classify motor imagery tasks through spatialfrequencysequential relationships. For each hidden layer of the RNN architecture, the original hidden layer will be replaced by LSTM unit or GRU to construct LSTMRNN architecture or GRURNN architecture. Classification results are compared and analyzed to show which memory unit is more suitable for MIBCI.
Sliding window cropping strategy
The conventional trialwise EEG signals classification algorithms treat the entity duration of a trial as a single sample and the corresponding label as a single target. Then, a shallow classifier is used to train and validate motor imagery tasks. The conventional algorithms will lead to less samples and high dimensionality of features, which will cause the overfitting problem and drop the accuracy of classification. In this study, a deep RNN architecture is used for the classification of EEG signals, if the entity duration of a trial is fed into deep RNN architecture, the number of hidden layers will be too large to get longterm patterns for the classification of EEG signals. To avoid the overfitting problem of classification, a sliding window cropping strategy is applied to each trial to crop the entity duration of the trial into several time slices, and the label of the trial will be repeated to all time slices. This strategy will increase the number of training samples for the RNN architecture, which is widely used in the recognition tasks of image, audio and EEG signals by neural networks [58–60].
In our study, let Z_{c}∈R^{M∗T} represents the inputs of RNN, the entity duration of a trial includes T time steps. Assumed τ is the cropping size of the sliding window cropping strategy, the time slices of the trial by cropping can be defined as
The number of training samples will be increased T−τ times, and all time slices will get the label y_{c} as the same label from the original trial. Since the deep RNN architecture has the ability to extract signals’ sequential relationships for classification, we treat the number of hidden layers as the size of time slices. Therefore, we need to confirm the optimal number of hidden layers of the deep RNN architecture for motor imagery EEG signals classification; then, the optimal cropping size will be obtained from the EEG modeling experiment. If the optimal number of hidden layers is confirmed, the cropping size is confirmed. In common, the trial duration used for motor imagery is two seconds, and we obtain 500 samples for a 250 Hz sample rate. If the optimal number of the hidden layers is 20, the original trial will be crop to 480 time slices. The sliding window procedure for cropping a trial into time slices is shown in Fig. 5.
Experiments and results
Experimental datasets setup
The performances of the algorithms were evaluated on the BCI Competition IV [27] “Dataset 2a” and “Dataset 2b”^{Footnote 1}. The two datasets are compared in Table 3. Figure 6 illustrates how the singletrial EEG data were extracted on “Dataset 2a” and “Dataset 2b”. The two datasets share the same procedure. In the motor imagery classification experiments, each subject sat in a soft chair comfortably facing a computer screen. The BCI Competition IV experiments are composed of the following six steps: (1) Each trial started with a warning tone. (2) Simultaneously, a fixation cross was shown on the computer screen for two seconds. (3) After two seconds, a cue, in the form of an arrow, was randomly shown in lieu of the fixation cross, and the subjects started the corresponding motor imagery task of the cue. (4) After another 1.25 s, the cue reverted to the fixation cross. (5) The motor imagery task continued until the sixth second, at which time the fixation cross disappeared. (6) Finally, there was a short 1.5 s break. The signals were sampled at 250 Hz and recorded. The preprocessing operations on the signals for notch filtered and bandpass filtered were 50Hz and 0.1100Hz, respectively.
The BCI Competition IV “Dataset 2a” is composed of the following four classes of motor imagery EEG measurements from nine subjects: (1) left hand, (2) right hand, (3) feet, and (4) tongue. Two sessions, one for training and another for evaluation, were recorded from each subject. “Dataset 2b” is composed of two classes of motor imagery EEG measurements from nine subjects: (1) left hand and (2) right hand. Five sessions, the first three for training and the last two for evaluation, were recorded from each subject. According to the extraction procedure, the time range [4, 6s] was chosen for motor imagery classification because of a strong ERD/ERS phenomenon within that range [12, 44].
The spatialfrequency features are extracted by the FBCSPs algorithm. In the division of the whole band (830Hz, covered μ and β rhythms) to obtain universality for all subjects, the optimal band width range is 4Hz overlaps the next by 2Hz [5, 25]. The optimal division of bandpass filters is shown in Table 4. After the optimal frequency bands filter the raw EEG signals, the CSP algorithm is applied to the filtered EEG signals to obtain spatialfrequency features. In (2) in the CSP algorithm, parameter m for processing “Dataset 2a” and “Dataset 2b” is set to 2 and 1, respectively.
After extraction of spatialfrequency features, two separate experiments to confirm the parameters and validate the performances of spatialfrequencysequential relationships and the classification of motor imagery are as follows:

1.
EEG modeling experiments: First, a size range of [0, 4] smoothing time windows are put on the FBCSP features to obtain the performance of classification. After validate the affections of performances by sequential relationships, two different subexperiments on “Dataset 2a Subject 3” are presented to confirm whether a deep RNN architecture can model EEG signals well by crossentropies and accuracies. Another subexperiment is presented to find the optimal number of hidden layers in the deep RNN architecture.

2.
Classification experiments: For motor imagery classification, the spatialfrequency FBCSP features are fed into the deep RNN architecture to obtain spatialfrequencysequential relationships. The spatialfrequency features are cropped by a sliding window sized by the optimal number of hidden layers. In the classification by LSTMRNN architecture and GRURNN architecture, the accuracies, errors and efficiency of classification will be compared between spatialfrequency features and spatialfrequencysequential relationships.
EEG modeling experiments and results
To obtain the performance of classification influenced by the sequential relationships, a group of smoothing windows with the size range [0,4] is presented to FBCSP features. In our experiments, via smoothed FBCSP features, the SVM classifier with RBF kernel is used for motor imagery classification. Figure 7 illustrates the smoothing time window experimental results for “Dataset 2a” and “Dataset 2b”. Among the results, “SW=0” expresses the FBCSP features without smoothing. From the results, we find that the performance of EEG signals classification was fully influenced by the smoothing time windows. Thus, the RNN architecture is introduced in this study to extract spatialfrequencysequential relationships from FBCSP features for classification. However, we must validate the presentation of spatialfrequencysequential relationships by a RNN architecture at first.
There are three steps to validate the presentation of spatialfrequencysequential relationships by RNN architecture. First, to validate whether the deep RNN architecture can model EEG signals or not, we train a deep RNN architecture by 200 iterations of SGD algorithm over 22 channels of the first three seconds of EEG signals from “Dataset 2a Subject 3”. To test the modeling ability, the previous outputs are fed back into model’s inputs to predict the current EEG signals. The results on channel “ C3” by 20, 30, 90 hidden layers are drawn in Fig. 8. From the results, we find the deep RNN architecture will predict the same level of signals as the number of hidden layers increased. The predictions by 20 hidden layers matched the EEG signals after a few samples, and the predictions by 30 hidden layers matched almost half of the rest samples. The predictions by 90 hidden layers matched the entity of rest samples for both LSTMRNN architecture and GRURNN architecture. A highest number of hidden layers will get rich sequential relationships which have a similar spectrum to the EEG signals.
Second, we evaluate the classification performances of the deep RNN architecture by 200 iterations of SGD algorithm over the training data of “Dataset 2a Subject 3”. The loss function for EEG signals by RNN architecture is the logarithmic crossentropy, which is defined as [61]:
where c_{t} is the ground truth result, and \(b_{c}^{t}\) is the prediction result by deep classifier. The number of iterations for optimizing the loss function is an experience value of controlling training epochs by limiting the number of hidden layers. Figure 9 gives the training and validation crossentropies as the number of hidden layers increased. From the results, we find the training and validation crossentropies have separations over 20 hidden layers. The crossentropies will not reduce if the signals are overfitted by the RNN architecture. In fact, the crossentropies will not increase, so the deep RNN architecture continues to learn components of the signals that are common to all of the EEG sequences. Compared with LSTMRNN architecture and GRURNN architecture, the LSTMRNN architecture needs more hidden layers to achieve a same level of crossentropy during the classification of EEG signals.
Third, since a large number of hidden layers requires much computational complexity, and causes the overfitting problem to achieve low validation accuracies, Fig. 10 gives the training and validation accuracies as the number of hidden layers increased. From the results, we find the validation accuracies appear peaks with a 20–15 hidden layers. Compared with LSTMRNN architecture and GRURNN architecture, the LSTMRNN architecture needs more hidden layers to achieve a same level of accuracy during the classification of EEG signals. When the deep RNN architecture is overfitting, the accuracy of GRURNN has a sharp drop than LSTMRNN.
Classification experiments and results
Let \({\overline Z_{c}} \in {R^{M * T}}\) represents the spatialfrequency features, where M is the feature dimension, T is the number of samples in each trial. After EEG modeling experiments, the optimal number of hidden layers for LSTMRNN and GRURNN of all subjects are confirmed. Then, the optimal number τ is used for cropping training set and validation set by sliding window cropping strategy. Hence, the samples of each trial in training set and validation set will be increased T−τ times to satisfy the deep RNN architecture. After training procedure, the validation procedure will produce T−τ classification targets in each trial. Finally, the unique target of the trial will be calculated by averaging all targets of time slices.
To confirm the parameters and weights of RNN architecture, the characteristics of nonlinearity and nonstationarity in EEG signals will be considered, since the characteristics will limit the reliability of the conventional activation function in the deep learning architecture. Therefore, there are three different activation strategies, “tanh”, “sigmoid” and “ReLu”, for constructing activation functions [62]. The activation function, “tanh”, is applied to cell input activation function of both the LSTM unit and GRU. The activation function, “sigmoid”, is applied to the cell output activation function of the LSTM units. To prevent the vanishing error flow, the “ReLu” activation function is applied to the gates activation function of both the LSTM unit and GRU. The weights of RNN are initialized by a Gaussian distribution N∼(0,0.2). The BPTT algorithm is used to train RNN by minimizing crossentropy (see (8)) loss function. Also, because the Adam strategy [63] is suitable for timeseries in deep classifiers and its momentum improves the robustness of error flow, the strategy is applied to compute the learning rate during BPTT. Finally, a “Dropout” strategy is applied to prevent the overfitting problem [64]. The key idea of the “Dropout” strategy is to randomly eliminate units (along with their connections) from the neural network during training. By experience, the dropout rate is set at 0.2, and the maximum number of iterations is set at 200.
For both “Dataset 2a” and “Dataset 2b”, we train two different RNN architectures, each of which includes LSTM unit and GRU. Figure 11 illustrates the learning curves for different memory units in different datasets. In the case of both datasets, GRURNN architecture converges faster than LSTMRNN architecture. To reach lowest loss, GRURNN architecture acquire less number of iterations than LSTMRNN architecture. For some specific subjects, GRURNN architecture obtains lower average crossentropy loss than LSTMRNN architecture within 200 iterations. Overall, the subjects’ EEG signals from “Dataset 2a” and “Dataset 2b” represent similar average crossentropy between LSTMRNN architecture and GRURNN architecture.
Table 5 gives the average training and validation time complexity per trial comparison between spatialfrequencysequential relationships and spatialfrequency features. To compare time complexity, “Dataset 2a Subject 7” and “Dataset 2b Subject 8” are used to detect the average training and validation time complexity per trial. In Table 5, since the deep neural networks architectures (RNN and CNN) need more number of iterations for convergence in training phase, the average training time complexity of deep architectures is significantly higher than the conventional SVM model. However, in the validation phase, the deep architectures achieve a same level of time complexity than the conventional SVM model. Hence, the RNN architecture will cost appropriate time consumptions in the applications of MIBCIs. Besides, compared with two different memory units of RNN architecture, GRURNN architecture outperform LSTMRNN architecture in time complexity of EEG signals’ training and validation.
All classification experimental results of all subjects in “Dataset 2a” and “Dataset 2b” are listed in Tables 6 and 7, respectively. Spatialfrequencysequential relationships extracted from LSTMRNN architecture and GRURNN architecture; spatialfrequency features extracted from CNN, SVM with linear, polynomial and RBF kernels are used to classify motor imagery for comparison. The results are presented by error rate forms, and a paired ttest statistical technique is used to detect whether the spatialfrequencysequential relationships significantly outperform than spatialfrequency features in the classification of MIBCIs. For each subject, we confirm the optimal number of hidden layers (ONHL) for LSTMRNN architecture and GRURNN architecture.
From the results in Tables 6 and 7, for both datasets, spatialfrequencysequential relationships outperform the spatialfrequency features in the classification of MIBCIs. Among them, the average error rate of 27.42% and 26.44% is achieved by LSTMRNN and GRURNN in the case of “Dataset 2a”, respectively. The results of paired ttest show GRURNN (b) achieves significantly lower error rate than SVMPolynomial (e) (p <0.05) and SVMLinear (f) (p <0.05). In the case of “Dataset 2b”, an averaged error rate of 18.48% and 17.25% is achieved by LSTMRNN and GRURNN, respectively. The results of paired ttest show GRURNN (b) achieves significantly lower error rate than SVMRBF (d) (p <0.05) and SVMLinear (f) (p <0.05). To compare the classification performances of GRURNN and LSTMRNN, Fig. 12 gives the classification accuracies for all subjects with all algorithms.
From the results in Fig. 12, we find CNN and SVM (Linear) outperformed RNN in some subjects with highlevel (over 60%) accuracies (S3, S7, S8, S9 in “Dataset 2a” and S4, S6, S8, S9 in “Dataset 2b”). However, in lowlevel (below 60%) accuracies of subjects, RNN outperformed CNN and SVM (Linear) (S2, S4, S6 in “Dataset 2a” and S2, S3 in “Dataset 2b”). In the averagelevel accuracies, RNN architecture outperformed CNN and SVM (Linear). Besides, a comparison of the averaged accuracies for LSTM unit and GRU in both datasets shows that GRURNN architecture outperformed LSTMRNN.
Discussion
Discussion for sequential relationships
Four different subexperiments have been created to analyze the application of sequential relationships. From smoothing time window experimental results shown in Fig. 7, a small size of smoothing time window leaded to an improvement of classification accuracy for “Dataset 2a” and a decline of classification variance for “Dataset 2b”. However, a large size of smoothing time window leaded performance to decline for both “Dataset 2a” and “Dataset 2b”. The smoothing time window with different sliding sizes can change the sequential relationships; in addition, the experimental results demonstrated that the relationships significantly changed the classification performance; since the classification results can be changed if different sizes of smoothing time windows were applied to the sequential relationships. The finding gives us a novel enlightenment to smooth the extracted features to improve the classification performance and robustness [8, 12]. In addition, the sequential learning in the NLP also suggested to consider the sequential relationships as the key features for solving the natural language processing (NLP) problems [65, 66]. Therefore, due to the EEG signals contained the similar characteristics as the sentence structure in the NLP, the finding can assist us to use the sequential relationships to model the EEG signals.
In the experiments of representing EEG signals’ sequential relationships by RNN architectures, we found more hidden layers number of RNN architecture represented the EEG signals well, but more hidden layers number also caused memory vanishing problem(see Figs. 8, 9 and 10). To overcome the memory vanishing problem in the conventional RNN architecture, the LSTMRNN architecture and GRURNN architecture have been introduced for the classification. We have validated the training and validation results in Figs. 9 and 10, the results of accuracies were paradoxical with the results of crossentropies, and the classification of EEG signals will quickly overfitting. The reason is that the deep RNN architecture continues to learn common components of the EEG sequences, while simultaneously learning signal noise and nondiscriminative components [47]. Hence, here we must use propriate numbers of hidden layers to retain the classification performance. From the results in Fig. 10, the number of the hidden layers of the LSTMRNN was about 30, and the number of hidden layers of GRURNN was about 35 [67, 68] (see Fig. 10). Therefore, the constraint of LSTMRNN/GRURNN architecture leads us to crop the trial of EEG signals to time slices to feed into the classification architectures. Therefore, the SWCS is introduced on the timeseries to crop the entity of a trial into several time slices. The time slices keep the same length of the number of hidden layers in order to well trained the LSTMRNN architecture and GRURNN architecture.
Discussion for EEG classification
In the EEG classification experiment, he results of RNN architectures outperformed the stateoftheart methods (see Tables 6 and 7, and Fig. 12). There are two reasons for the results. The first reason is that EEG signals usually have easy distinguishing parts and difficult distinguishing parts. “Easy parts” for classification are well represented by the spatialfrequency features, since these features are statistical features. However, the “difficult parts” are nonlinear and nonstationary; therefore, the statistical features cannot well model these parts [69, 70]. Since the RNN architectures have enough neurons to fit the sequences’ nonlinear and nonstationary characteristics, the introduced spatialfrequencysequential relationships retain the classification performance of “easy parts”, while improve the classification performance of “difficult parts”. The second reason is that the conventional spatialfrequency features regarded EEG signals as a complete entity, many factors can influence classification; in particular, the subjectspecific diversity might be significant one. Instead of conventional features, spatialfrequencysequential relationships consider the EEG signals as timeseries to take spatial, frequency and temporal features of EEG data into consideration. This idea not only reduces the factors that influence classification, but also increases the corresponding robustness of the subjects by reducing the subjectspecific diversity. Hence, the classification accuracies of MIBCI by spatialfrequencysequential relationships are significantly higher than those using the spatialfrequency features.
To solve the limitation of hidden layers number of RNN architecture, the SWCS is introduced on the timeseries to crop the entity of a trial into several time slices. In this way, the number of samples for deep learning models is widely increased; therefore, enough samples are required to satisfy the generalization and performance of classification. In common, trials of EEG signals were obtained from complicated devices and the number of samples was too less to train the deep learning models. The application of SWCS solved the problem of sample number, and can also fit the RNN architecture. In addition, the LSTM and GRU are two different memory units for the RNN architecture. In order to test which memory unit is suited for processing EEG signals, the two memory units were applied to the RNN architectures. Our experimental results showed that GRURNN architecture was suited for the EEG signals better, and the spatialfrequencysequential relationships with GRU memory units outperformed both shallow learning and deep learning models (see Table 5).
Advantages of spatialfrequencysequential relationships
Since the EEG signals are nonlinear and nonstationary signals, the spatial and frequency features are the robust statistical features, which can be well classified by SVM using FBCSP features. We considered such features as the “easy parts” for classification; in contrast, we considered the temporal features as the “difficult parts” that are difficult to be classified by the conventional machine learning models. Hence, we introduced the spatialfrequencysequential relationships by using RNN architecture on FBCSP features. Experimental results showed that the approach involving spatialfrequencysequential relationships can have the better classification performance of the “difficult parts”; also, the results outperformed thestateoftheart methods. Besides classifying the “difficult parts” of EEG signals, another reason for introducing the RNN architecture was that: the limitation number of hidden layers leaded us to crop the entity of a trial into several time slices. Although the different time slices shared the same labels, the slices increased the diversity of EEG signals during the classification. Therefore, the cropping strategy also improved the classification performance. To sum up, the advantages of introducing spatialfrequencysequential relationships can improve classification performance and increase EEG samples diversity.
Conclusion
In this paper, an FBCSPs algorithm was used to extract spatialfrequency features, which were cropped by a sliding window cropping strategy into time slices. Then, the time slices were fed into deep RNN architectures, with two different memory units, to extract spatialfrequencysequential relationships for MIBCI classification. The extracted relationships included spatial, frequency and temporal characteristics. The experiments on MIBCI demonstrated that the proposed method owned two advantages: (1) The spatialfrequencysequential relationships extracted by FBCSPs and RNN architectures can achieve significantly higher performance than spatialfrequency features. Meanwhile, the relationships had the same level of time complexity with the conventional algorithms. (2) A comparison of the accuracy and efficiency of motor imagery classification between GRUs and LSTM units revealed that GRUs can generate the better results.
Our future work will focus on collecting more EEG data to construct deeper RNNs architecture, exploring the error flow rules in BPTT, and constructing a deeper RNNs architecture that is adapted and generalized for EEG signals.
Abbreviations
 AAR:

Adaptive auto regression
 BPTT:

Backpropagation through time
 CNN:

Convolution neural network
 DBN:

Deep belief network
 EEG:

Electroencephalography
 ERD/ERS:

Eventrelated desynchronizing/eventrelated synchronizing
 FBCSP:

Filter bank common spatial pattern
 GRU:

Gated recurrent unit
 LDA:

Linear discriminant analysis
 LSTM Longshort term memory; MEMD:

Multivariate empirical mode decomposition
 MIBCIs:

Motor imagery brain computer interfaces
 NN:

Neural network
 OVR:

Oneversusrest
 PSD:

Power spectrum density
 RNN:

Recurrent neural network
 SVM:

Support vector machine
 SWC:

Sliding window cropping
 SWCS:

Sliding window cropping strategy
References
 1
Lotte F, Congedo M, Lécuyer A, Lamarche F, Arnaldi B. A review of classification algorithms for EEGbased brain–computer interfaces. J Neural Eng. 2007; 4(2):1.
 2
Wolpaw JR, Birbaumer N, Heetderks WJ, McFarland DJ, Peckham PH, Schalk G, Donchin E, Quatrano LA, Robinson CJ, Vaughan TM, et al. Braincomputer interface technology: a review of the first international meeting. IEEE Trans on Rehabil Eng. 2000; 8(2):164–73.
 3
Wang Y, Gao S, Gao X. Common spatial pattern method for channel selelction in motor imagery based braincomputer interface. In: 27th Annual International Conference of the Engineering in Medicine and Biology Society (EMBS). New York: IEEE: 2006. p. 5392–5.
 4
Devlaminck D, Wyns B, GrosseWentrup M, Otte G, Santens P. Multisubject learning for common spatial patterns in motorimagery BCI. Comput Intell and Neurosci. 2011; 2011:8.
 5
Thomas KP, Guan C, Lau CT, Vinod AP, Ang KK. A new discriminative common spatial pattern method for motor imagery brain–computer interfaces. IEEE Trans on Biomed Eng. 2009; 56(11):2730–3.
 6
Brunner C, Naeem M, Leeb R, Graimann B, Pfurtscheller G. Spatial filtering and selection of optimized components in four class motor imagery EEG data using independent components analysis. Pattern Recogn Lett. 2007; 28(8):957–64.
 7
Qin L, Ding L, He B. Motor imagery classification by means of source analysis for brain–computer interface applications. J Neural Eng. 2004; 1(3):135.
 8
Ang KK, Chin ZY, Wang C, Guan C, Zhang H. Filter bank common spatial pattern algorithm on BCI competition IV datasets 2a and 2b. Front Neurosci. 2012; 6:39.
 9
Ang KK, Chin ZY, Zhang H, Guan C. Filter bank common spatial pattern (FBCSP) in braincomputer interface. In: IEEE International Joint Conference on Neural Networks. Hong Kong: IEEE: 2008. p. 2390–7.
 10
Lemm S, Blankertz B, Curio G, Muller KR. Spatiospectral filters for improving the classification of single trial EEG. IEEE Trans on Biomed Eng. 2005; 52(9):1541–8.
 11
Chin ZY, Ang KK, Wang C, Guan C, Zhang H. Multiclass filter bank common spatial pattern for fourclass motor imagery BCI. In: Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEEEMBC. Minneapolis: IEEE: 2009. p. 571–4.
 12
Kumar S, Sharma A, Tsunoda T. An improved discriminative filter bank selection approach for motor imagery EEG signal classification using mutual information. BMC Bioinforma. 2017; 18(16):545.
 13
Sitaram R, Zhang H, Guan C, Thulasidas M, Hoshi Y, Ishikawa A, Shimizu K, Birbaumer N. Temporal classification of multichannel nearinfrared spectroscopy signals of motor imagery for developing a brain–computer interface. NeuroImage. 2007; 34(4):1416–27.
 14
Kayikcioglu T, Aydemir O. A polynomial fitting and knn based approach for improving classification of motor imagery BCI data. Pattern Recogn Lett. 2010; 31(11):1207–15.
 15
Schlögl A, Lee F, Bischof H, Pfurtscheller G. Characterization of fourclass motor imagery EEG data for the BCIcompetition 2005. J Neural Eng. 2005; 2(4):14.
 16
Pfurtscheller G, Neuper C, Schlogl A, Lugger K. Separability of EEG signals recorded during right and left motor imagery using adaptive autoregressive parameters. IEEE Trans on Rehabil Eng. 1998; 6(3):316–25.
 17
Wang T, Deng J, He B. Classifying EEGbased motor imagery tasks by means of time–frequency synthesized spatial patterns. Clin Neurophysiol. 2004; 115(12):2744–53.
 18
Pfurtscheller G, Brunner C, Schlögl A, Da Silva FL. Mu rhythm (de) synchronization and EEG singletrial classification of different motor imagery tasks. Neuroimage. 2006; 31(1):153–9.
 19
Neuper C, Wörtz M, Pfurtscheller G. ERD/ERS patterns reflecting sensorimotor activation and deactivation. Prog Brain Res. 2006; 159:211–22.
 20
Blankertz B, Tomioka R, Lemm S, Kawanabe M, Muller KR. Optimizing spatial filters for robust EEG singletrial analysis. IEEE Signal Process Mag. 2008; 25(1):41–56.
 21
Blankertz B, Dornhege G, Krauledat M, Müller KR, Curio G. The noninvasive berlin brain–computer interface: fast acquisition of effective performance in untrained subjects. NeuroImage. 2007; 37(2):539–50.
 22
Herman P, Prasad G, McGinnity TM, Coyle D. Comparative analysis of spectral approaches to feature extraction for EEGbased motor imagery classification. IEEE Trans on Neural Syst and Rehabil Eng. 2008; 16(4):317–26.
 23
Wu W, Chen Z, Gao X, Li Y, Brown EN, Gao S. Probabilistic common spatial patterns for multichannel EEG analysis. IEEE Trans on Pattern Anal and Mach Intell. 2015; 37(3):639–53.
 24
Yuksel A, Olmez T. A neural networkbased optimal spatial filter design method for motor imagery classification. PLoS ONE. 2015; 10(5):0125039.
 25
Park C, Looney D, ur Rehman N, Ahrabian A, Mandic DP. Classification of motor imagery BCI using multivariate empirical mode decomposition. IEEE Trans on Neural Syst and Rehabil Eng. 2013; 21(1):10–22.
 26
Blankertz B, Muller KR, Krusienski DJ, Schalk G, Wolpaw JR, Schlogl A, Pfurtscheller G, Millan JR, Schroder M, Birbaumer N. The BCI competition iii: Validating alternative approaches to actual BCI problems. IEEE Trans on Neural Syst and Rehabil Eng. 2006; 14(2):153–9.
 27
Tangermann M, Müller KR, Aertsen A, Birbaumer N, Braun C, Brunner C, Leeb R, Mehring C, Miller KJ, MuellerPutz G, et al. Review of the BCI competition IV. Front Neurosci. 2012; 6:55.
 28
Mikolov T, Karafiát M, Burget L, Černockỳ J, Khudanpur S. Recurrent neural network based language model. In: Eleventh Annual Conference of the International Speech Communication Association. Makuhari: ISCA: 2010.
 29
Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1989; 1(2):270–80.
 30
Hochreiter S, Schmidhuber J. Long shortterm memory. Neural Comput. 1997; 9(8):1735–80.
 31
Chung J, Gulcehre C, Cho K, Bengio Y. Gated feedback recurrent neural networks. In: International Conference on Machine Learning. Lille: JMLR.org: 2015. p. 2067–75.
 32
Zheng WL, Zhu JY, Peng Y, Lu BL. EEGbased emotion classification using deep belief networks. In: IEEE International Conference on Multimedia and Expo (ICME). Chengdu: IEEE: 2014. p. 1–6.
 33
Kumar S, Sharma A. A new parameter tuning approach for enhanced motor imagery eeg signal classification. Med Biol Eng Comput. 2018; 2018:1–14.
 34
Pfurtscheller G, Neuper C. Motor imagery and direct braincomputer communication. Proc IEEE. 2001; 89(7):1123–34.
 35
Palaniappan R, Sundaraj K, Sundaraj S. A comparative study of the svm and knn machine learning algorithms for the diagnosis of respiratory pathologies using pulmonary acoustic signals. BMC Bioinforma. 2014; 15(1):223.
 36
Li L, Yu S, Xiao W, Li Y, Huang L, Zheng X, Zhou S, Yang H. Sequencebased identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel svm. BMC Bioinforma. 2014; 15(1):340.
 37
Yu N, Yu Z, Pan Y. A deep learning method for lincrna detection using autoencoder algorithm. BMC Bioinforma. 2017; 18(15):511.
 38
Li H, Hou J, Adhikari B, Lyu Q, Cheng J. Deep learning methods for protein torsion angle prediction. BMC Bioinforma. 2017; 18(1):417.
 39
Maxwell A, Li R, Yang B, Weng H, Ou A, Hong H, Zhou Z, Gong P, Zhang C. Deep learning architectures for multilabel classification of intelligent health risk prediction. BMC Bioinforma. 2017; 18(14):523.
 40
Cecotti H, Graeser A. Convolutional neural network with embedded fourier transform for EEG classification. In: 19th International Conference on Pattern Recognition (ICPR). Tampa: IEEE: 2008. p. 1–4.
 41
Cecotti H, Graser A. Convolutional neural networks for P300 detection with application to braincomputer interfaces. IEEE Trans Pattern Anal Mach Intell. 2011; 33(3):433–45.
 42
Ren Y, Wu Y. Convolutional deep belief networks for feature extraction of EEG signal. In: International Joint Conference on Neural Networks (IJCNN). Beijing: IEEE: 2014. p. 2850–3.
 43
Yang H, Sakhavi S, Ang KK, Guan C. On the use of convolutional neural networks and augmented CSP features for multiclass motor imagery of EEG signals classification. In: 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Milan: IEEE: 2015. p. 2620–3.
 44
Kumar S, Sharma A, Mamun K, Tsunoda T. A deep learning approach for motor imagery eeg signal classification. In: Computer Science and Engineering (APWC on CSE), 2016 3rd AsiaPacific World Congress On. Fiji: IEEE: 2016. p. 34–9.
 45
Hajinoroozi M, Jung TP, Lin CT, Huang Y. Feature extraction with deep belief networks for driver’s cognitive states prediction from EEG data. In: IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP). Chengdu: IEEE: 2015. p. 812–5.
 46
Wulsin D, Gupta J, Mani R, Blanco J, Litt B. Modeling electroencephalography waveforms with semisupervised deep belief nets: fast classification and anomaly measurement. J Neural Eng. 2011; 8(3):036015.
 47
Forney EM, Anderson CW. Classification of EEG during imagined mental tasks by forecasting with elman recurrent neural networks. In: International Joint Conference on Neural Networks (IJCNN). San Jose: IEEE: 2011. p. 2749–55.
 48
Soleymani M, AsghariEsfeden S, Fu Y, Pantic M. Analysis of EEG signals and facial expressions for continuous emotion detection. IEEE Trans Affect Comput. 2016; 7(1):17–28.
 49
Chandra R, Frean M, Zhang M. Adapting modularity during learning in cooperative coevolutionary recurrent neural networks. Soft Comput. 2012; 16(6):1009–20.
 50
Bentlemsan M, Zemouri ET, Bouchaffra D, YahyaZoubir B, Ferroudji K. Random forest and filter bank common spatial patterns for EEGbased motor imagery classification. In: 5th International Conference on Intelligent Systems, Modelling and Simulation (ISMS). Langkawi: IEEE: 2014. p. 235–8.
 51
Heffernan R, Yang Y, Paliwal K, Zhou Y. Capturing nonlocal interactions by long shortterm memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinforma. 2017; 33(18):2842–9.
 52
Mikolov T, Kombrink S, Burget L, Černocký J, Khudanpur S. Extensions of recurrent neural network language model. In: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Prague: IEEE: 2011. p. 5528–31.
 53
Heffernan R, Paliwal K, Lyons J, Dehzangi A, Sharma A, Wang J, Sattar A, Yang Y, Zhou Y. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep. 2015; 5:11476.
 54
Heffernan R, Dehzangi A, Lyons J, Paliwal K, Sharma A, Wang J, Sattar A, Zhou Y, Yang Y. Highly accurate sequencebased prediction of halfsphere exposures of amino acid residues in proteins. Bioinformatics. 2015; 32(6):843–9.
 55
Mazumdar J, Harley RG. Recurrent Neural Networks Trained With Backpropagation Through Time Algorithm to Estimate Nonlinear Load Harmonic Currents. IEEE Trans Ind Electron. 2008; 55(9):3484–91.
 56
Hanson J, Yang Y, Paliwal K, Zhou Y. Improving protein disorder prediction by deep bidirectional long shortterm memory recurrent neural networks. Bioinformatics. 2016; 33(5):685–92.
 57
Gers FA, Schraudolph NN, Schmidhuber J. Learning precise timing with lstm recurrent networks. J Mach Learn Res. 2002; 3:115–43.
 58
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 2016. p. 770–8.
 59
Sercu T, Puhrsch C, Kingsbury B, LeCun Y. Very deep multilingual convolutional neural networks for LVCSR. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Shanghai: IEEE: 2016. p. 4955–9.
 60
Schirrmeister RT, Springenberg JT, Fiederer LDJ, Glasstetter M, Eggensperger K, Tangermann M, Hutter F, Burgard W, Ball T. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum Brain Mapp. 2017; 38(11):5391–420.
 61
Chan JCC, Eisenstat E. Marginal likelihood estimation with the CrossEntropy method. Econ Rev. 2015; 34(3):256–85.
 62
Dahl GE, Sainath TN, Hinton GE. Improving deep neural networks for LVCSR using rectified linear units and dropout. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver: IEEE: 2013. p. 8609–13.
 63
Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res. 2011; 12(Jul):2121–59.
 64
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: A simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014; 15(1):1929–58.
 65
Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems. Montreal: NIPS: 2014. p. 3104–12.
 66
Mikolov T, Yih Wt, Zweig G. Linguistic regularities in continuous space word representations. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Atlanta: ACL: 2013. p. 746–51.
 67
Gal Y, Ghahramani Z. A theoretically grounded application of dropout in recurrent neural networks. In: Advances in Neural Information Processing Systems. Barcelona: NIPS: 2016. p. 1019–27.
 68
Ravuri S, Stolcke A. A comparative study of recurrent neural network models for lexical domain classification. In: Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference On. Shanghai: IEEE: 2016. p. 6075–9.
 69
Riaz F, Hassan A, Rehman S, Niazi IK, Dremstrup K. Emdbased temporal and spectral features for the classification of eeg signals using supervised learning. IEEE Trans Neural Syst Rehabil Eng. 2016; 24(1):28–35.
 70
Jiao Z, Gao X, Wang Y, Li J, Xu H. Deep Convolutional Neural Networks for mental load classification based on EEG data. Pattern Recogn. 2018; 76:582–95.
Acknowledgements
The authors want to thank the members of the brainlike robotic research group of Xiamen University for their proofreading comments. The authors are very grateful to the anonymous reviewers for their constructive comments which have helped significantly in revising this work.
Funding
This work was supported by the National Natural Science Foundation of China (No.61673322, 61673326, and 91746103), the Fundamental Research Funds for the Central Universities (No. 20720160126), Natural Science Foundation of Fujian Province of China (No. 2017J01128 and 2017J01129), and the European Union’s Horizon 2020 research and innovation programme under the Marie SklodowskaCurie grant agreement No. 663830. The funding body played the roles in supporting the experiments.
Availability of data and materials
The evaluation data is on the BCI Competition IV [27] “Dataset 2a” and “Dataset 2b” http://www.bbci.de/competition/iv/#datasets.
Author information
Affiliations
Contributions
FC and TL designed and implemented the methods. TL was responsible for the experiments. FC and TL analyzed the experimental results. CZ, FC and TL wrote and edited the manuscript. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Fei Chao.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Luo, T., Zhou, C. & Chao, F. Exploring spatialfrequencysequential relationships for motor imagery classification with recurrent neural network. BMC Bioinformatics 19, 344 (2018). https://doi.org/10.1186/s1285901823651
Received:
Accepted:
Published:
Keywords
 EEG signals classification
 Spatialfrequencysequential relationships
 Deep recurrent neural networks
 Brain computer interface