 Research article
 Open Access
Early classification of multivariate temporal observations by extraction of interpretable shapelets
 Mohamed F Ghalwash^{1, 2} and
 Zoran Obradovic^{1}
https://doi.org/10.1186/1471-2105-13-195
© Ghalwash and Obradovic; licensee BioMed Central Ltd. 2012
 Received: 26 March 2012
 Accepted: 23 July 2012
 Published: 8 August 2012
Abstract
Background
Early classification of time series is beneficial for biomedical informatics problems including, but not limited to, disease change detection. Early classification can be of tremendous help by identifying the onset of a disease before it has time to fully take hold. In addition, extracting patterns from the original time series helps domain experts gain insight into the classification results. This problem has been studied recently using time series segments called shapelets. In this paper, we present a method, which we call Multivariate Shapelets Detection (MSD), that allows for early and patient-specific classification of multivariate time series. The method extracts time series patterns, called multivariate shapelets, from all dimensions of the time series that distinctly manifest the target class locally. The time series are then classified by searching for the earliest closest patterns.
Results
The proposed early classification method for multivariate time series has been evaluated on eight gene expression datasets from viral infection and drug response studies in humans. In our experiments, the MSD method outperformed the baseline methods, achieving highly accurate classification by using as little as 40%–64% of the time series. The obtained results provide evidence that using conventional classification methods on short time series is not as accurate as using the proposed method specialized for early classification.
Conclusion
For the early classification task, we proposed a method called Multivariate Shapelets Detection (MSD), which extracts patterns from all dimensions of the time series. We showed that the MSD method can classify the time series early by using as little as 40%–64% of the time series’ length.
Keywords
 Time Series
 Information Gain
 Utility Score
 Dynamic Time Warping
 Distance Threshold
Background
In medical informatics, the patient’s clinical data records, such as heart rate, are collected over time and therefore represent a time series. If the data is collected from two groups of patients (for example, symptomatic and asymptomatic with respect to heart failure), the task of multivariate time series (MTS) classification is to learn temporal patterns to determine whether the patient belongs to the group of symptomatic patients.
Time series have been extensively analyzed in various fields, such as statistics, signal processing, and control theory. The focus of the research in these fields is on gaining a better understanding of the data-generating mechanism, the prediction of future values, or the optimal control of a system. From a statistical viewpoint, time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics from the data. As a part of time series analysis, time series forecasting aims to use a model, e.g. AutoRegressive Moving Average (ARMA), to predict future values based on previously observed values [1]. The ultimate objective of the signal processing community is the characterization of the time series in such a manner as to allow for transformation of the time series, with a method like the Fast Fourier Transform (FFT), to extract useful information from the time series [2]. Researchers and practitioners in control theory strive to calculate proper corrective actions from the controller (inputs) that result in system stability. A set of past inputs and outputs is observed, and new inputs are set in such a way as to try to achieve a desired output [3].
Although all of the aforementioned methods could be helpful in our study, and the experience of researchers and practitioners from other fields is extremely valuable, the focus of our research is to classify a new time series as early as possible by extracting patterns from past observations, rather than predicting future values or analyzing a single time series’ pattern.
In the data mining community, the time series classification problem has been studied in some detail as well. The predictive patterns framework has been introduced to directly mine a compact set of highly predictive patterns [4]. Instead of adopting a two-phase approach by generating all frequent patterns in the first phase and selecting the discriminative patterns in the second phase, this approach integrates pattern mining and feature pruning into the same phase to filter out non-informative and redundant patterns while they are being generated. A temporal rule-based classification method for temporal pattern representation was recently proposed to address the deficiencies of existing methods [5].
A method that extracts meta-features from a multivariate time series was proposed by Kadous et al. [6]. The types of meta-features are defined by the user, but are extracted automatically and are used to construct propositional attributes (attribute-value features) for another high-level classifier, like a decision tree, that learns a nonlinear hypothesis to distinguish among classes.
In the context of classification of unknown time series (time series with an unknown label), models utilize the whole time series to predict its label based on the information learned from training data. In an early classification context, the objective is to provide patient-specific classification of an unknown time series as early as possible. Therefore, instead of utilizing the whole time series, our MSD method looks at a portion (the current stream) of the unknown time series and determines whether it can predict the label of the whole time series without looking at the rest of it. If MSD is able to predict at the time point at the end of the current stream, the label is predicted. Otherwise, MSD requires more data for the unknown time series and looks at a larger segment, and does so until it is able to predict the label of the time series.
For early classification, a new method called Early Classification on Time Series (ECTS) has been proposed [7]. The idea behind the method is to explore the stability of the nearest neighbor relationship in the full space and in the subspaces formed by prefixes of the training examples. The disadvantage of ECTS is that it only provides classification results, without extracting and summarizing patterns from training data; thus, users may not be able to gain deep insights from the classification results. This drawback of ECTS has been resolved by extracting local shapelets which distinctly manifest the target class locally, and are effective for early classification [8]. However, the method is applicable only to one-dimensional time series.
In this study, we generalize the definition of local shapelets to a multivariate context and accordingly propose a method for early classification of multivariate time series. The proposed method is called Multivariate Shapelets Detection (MSD). A multivariate shapelet consists of multiple segments, where each segment is extracted from exactly one dimension. The test time series is then classified based on the multivariate shapelets that best match the test time series.
In particular, we propose the following extensions to the existing univariate shapelet method:

Extending the concept of univariate shapelets to multivariate shapelets, which are multidimensional subsequences with a distance threshold along each dimension.

Proposing the use of an information gain-based distance threshold.

Proposing the use of a weighted information gain-based utility score of a shapelet. A theorem is provided to show that the weighted information gain incorporates earliness and assigns a higher utility score to the shapelet that appears earlier, given the same accuracy performance.
The mathematical definition of the problem is presented in the Definitions section. The method for multivariate time series classification is described in the Methods section. Datasets are described in the Dataset and data processing section. In the Results and discussion section, the experimental results are presented. Finally, future work and concluding remarks are discussed in the Conclusion section.
Definitions
A shapelet is defined as f = (s,l,δ,c_{ f }), where s is a time series subsequence of length l. The class label c_{ f } of the shapelet is called the target class. The other classes are called the non-target classes, and are referred to as ${\overline{c}}_{f}$. We call a time series T_{ i } a target time series if its class is c_{ f }. The distance threshold δ is computed as follows:

The distance d_{ i } between s and every time series T_{ i } in the dataset is computed using Equation 1. The distance d_{ i } is represented as a point on the order line, as shown in Figure 2. If Class(T_{ i }) = c_{ f }, then d_{ i } is represented as a blue point. If Class(T_{ i }) ≠ c_{ f }, then d_{ i } is represented as a red square.

The distance threshold δ is computed (as explained in the Methods section) to separate the two groups (blue and red groups).
The distance between a shapelet f and time series T is defined as dist(f,T) : = dist(s,T).
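Equation 1 is not reproduced in this excerpt; in the shapelet literature on which this work builds, the distance between a subsequence and a full series is the minimum Euclidean distance over all windows of the series with the subsequence’s length. A minimal sketch under that assumption (the function name is ours):

```python
import math

def subseq_dist(s, T):
    """Distance between a subsequence s and a time series T, assumed
    (as in the standard shapelet literature) to be the minimum
    Euclidean distance over all windows of T of length len(s)."""
    l = len(s)
    best = math.inf
    for k in range(len(T) - l + 1):          # slide s over T
        d = math.sqrt(sum((s[i] - T[k + i]) ** 2 for i in range(l)))
        best = min(best, d)
    return best
```

If the shapelet matches a segment of T exactly, the distance is 0; some variants additionally normalize by the subsequence length, which this sketch omits.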
An N-dimensional (multivariate) time series of length L is defined as T = [T^{1},T^{2},…,T^{ N }], where T^{ j } is the j^{ th } dimension of T and T^{ j }[k] is the value of the j^{ th } dimension of T at time stamp k. Hereafter, we use the terms ‘multidimensional’ and ‘multivariate’ interchangeably.
The distance between two N-dimensional subsequences is defined as the vector dist(s,T) := [dist(s^{1},T^{1}), dist(s^{2},T^{2}),…,dist(s^{ N },T^{ N })], where dist(s^{ j },T^{ j }) is defined as in Equation 1. Simply put, the distance between two multivariate time series is a vector of distances, where each component in the distance vector is the distance between the corresponding dimensions of the two multivariate time series. The distance between a shapelet f and a time series T is defined as dist(f,T) := dist(s,T).
Methods
In this section we first describe a recently proposed method for early classification of univariate time series [8] together with our suggested modifications. Then, we propose a new method for early classification of multivariate time series.
Modifications of univariate shapelet for early time series classification
The Early Distinctive Shapelet Classification (EDSC) method, proposed in [8] and described in Algorithm 1, aims to extract a small set of shapelets from univariate time series for early classification.
Algorithm 1: UnivariateShapeletsDetection
Input: A training dataset D of M univariate time series; minL; maxL
1. for each time series T ∈ D do {T is of length L}
2.   for l ← minL to maxL do {for each shapelet length}
3.     for k ← 1 to L − l + 1 do {for each starting position}
4.       ShapeletDist(k,l,)
5.       ComputeThreshold(f_{ lk },)
6.       ComputeUtilityScore(f_{ lk })
7.       Add(f_{ lk }, ShapeletList)
8. PruneShapelets(ShapeletList)
9. return ShapeletList
The method iterates over the time series in the dataset D (line 1). For each time series T, all shapelets of length l between minL and maxL (user parameters) are extracted from T. For each shapelet f_{ lk } (lines 2 and 3), the method calls the function ShapeletDist (line 4), which computes the distances between f_{ lk } and all time series in D using Equation 1. The method then computes the distance threshold (line 5) for the candidate shapelet f_{ lk } using Chebyshev’s inequality, and assigns f_{ lk } a utility score (line 6) using a weighted F_{1} score measure. In line 8, the method ranks all extracted shapelets by their utility scores and selects a subset of the highest-ranked shapelets as the pruned set of shapelets, which can exhaustively classify the training time series.
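The control flow of Algorithm 1 can be sketched in Python as follows; ShapeletDist, ComputeThreshold, and ComputeUtilityScore are passed in as callables because their details are given only in the following sections, so this is an illustration of the loop structure rather than a full implementation:

```python
def univariate_shapelets_detection(D, min_l, max_l,
                                   shapelet_dist, compute_threshold,
                                   compute_utility):
    """Sketch of Algorithm 1. D is a list of (series, label) pairs;
    the three callables stand in for the paper's subroutines."""
    shapelets = []
    for series, label in D:                       # line 1
        L = len(series)
        for l in range(min_l, max_l + 1):         # line 2
            for k in range(L - l + 1):            # line 3
                s = series[k:k + l]               # candidate shapelet
                dists = [shapelet_dist(s, t) for t, _ in D]   # line 4
                delta = compute_threshold(dists,              # line 5
                                          [c for _, c in D], label)
                score = compute_utility(s, delta, dists)      # line 6
                shapelets.append((score, s, delta, label))    # line 7
    shapelets.sort(key=lambda x: x[0], reverse=True)  # rank by utility
    return shapelets    # pruning (line 8) then selects a subset
```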
The functions that compute the distance threshold and utility score are explained in the following sections. We describe how to prune the shapelets and use them for early classification in the Shapelet Pruning and Classification sections, respectively.
Distance threshold method
A method based on Chebyshev’s inequality was proposed for computing the distance threshold [8]. Chebyshev’s inequality guarantees that, for any distribution, no more than 1/b^{2} of the distribution’s values are more than b standard deviations away from its mean [9]. The inequality is applied to the non-target time series distances to compute the range where a non-target distance has a low probability of appearing. However, this amounts to a one-sided test and is not always able to find a distance threshold that discriminates well among the classes. Here, we propose using information gain [10] to find a discriminative distance threshold. In Additional file 1: Table S.4 of the supplementary document, we show that using information gain to compute the distance threshold outperformed the Chebyshev’s inequality method.
Information gain-based distance threshold for univariate shapelets
The basic idea is to find the shapelet’s distance threshold that maximizes the information gain and divides the dataset into two groups, target and nontarget time series [10].
Figure 4 shows an example of two distance thresholds, δ_{1} and δ_{2}. The threshold δ_{1} splits the dataset into two groups with 4 true positives, 0 false positives, 4 true negatives, and 1 false negative; the information gain of δ_{1} is 0.4090. The threshold δ_{2} splits the dataset into two groups with 4 true positives, 1 false positive, 3 true negatives, and 1 false negative; the information gain of δ_{2} is 0.1591. Therefore, the threshold δ_{1} is chosen because it has the maximum information gain.
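The information gain values in this example can be reproduced with entropy measured in nats (natural logarithm); a sketch, with each split summarized by its confusion counts:

```python
import math

def entropy(pos, neg):
    """Two-class entropy in nats (0 for an empty or pure group)."""
    n = pos + neg
    h = 0.0
    for c in (pos, neg):
        if 0 < c < n:
            p = c / n
            h -= p * math.log(p)
    return h

def info_gain(tp, fp, tn, fn):
    """Information gain of splitting the dataset at a distance
    threshold, summarized by the split's confusion counts."""
    total = tp + fp + tn + fn
    before = entropy(tp + fn, fp + tn)        # targets vs non-targets
    left, right = tp + fp, tn + fn            # sizes of the two groups
    after = (left / total) * entropy(tp, fp) \
          + (right / total) * entropy(fn, tn)
    return before - after
```

For the δ_{1} split, info_gain(4, 0, 4, 1) ≈ 0.409, and for δ_{2}, info_gain(4, 1, 3, 1) ≈ 0.159, matching the values above up to rounding.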
Utility score method
The set of shapelets extracted from the dataset might be exceedingly large. Therefore, it is important to rank the shapelets in order to select a small subset of the shapelets for classification. For this reason, each shapelet has to be assigned a score that takes into consideration earliness as well as discrimination among classes.
The weighted F_{1} score method is proposed to rank shapelets [8]. In our study, we introduce the weighted information gain as a new utility score method. In the supplementary document (Additional file 1: Table S.5) we showed that our proposed method outperformed the weighted F_{1} method.
Weighted information gain
 1.
Compute the distance between the shapelet f = (s, l, δ, c_{ f }) and every time series T_{ i } in the dataset.
 2.
Split the dataset D into two datasets D_{ L } and D_{ R } such that D_{ L } contains all time series where dist(f, T_{ i }) ≤ δ and D_{ R } contains all time series where dist(f, T_{ i }) > δ.
 3.
For each time series T in the dataset D_{ L }, if Class(T) = c_{ f }, then T is weighted by EML(f,T). Otherwise, the time series is weighted by 1.
 4.
Compute M_{ L } as the weighted count of the time series in the dataset D_{ L }, and M_{ R } as the size of the dataset D_{ R }.
 5.
Compute the weighted information gain using Equation 4.
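Steps 1–4 can be sketched as follows. The exact form of Equation 4 is not reproduced in this excerpt, so the sketch stops at the weighted count M_{ L } and the plain count M_{ R }; the EML values are assumed to be fractions in (0, 1], with smaller values meaning an earlier match:

```python
def weighted_split_counts(dists, labels, emls, delta, target):
    """Steps 1-4 above: split the dataset at the distance threshold
    and form the weighted count M_L and plain count M_R.
    dists[i] is the distance between the shapelet and series i (step 1);
    emls[i] is EML(f, T_i), assumed to lie in (0, 1].  Equation 4 (the
    weighted information gain itself) is then computed from these
    counts and is not reproduced here."""
    m_l = 0.0
    m_r = 0
    for d, y, w in zip(dists, labels, emls):
        if d <= delta:                        # series falls in D_L (step 2)
            m_l += w if y == target else 1.0  # step 3: weight targets by EML
        else:                                 # series falls in D_R
            m_r += 1
    return m_l, m_r
```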
The following theorem proves that the weighted information gain incorporates the earliness and assigns high utility score to the shapelet that has better earliness given the same accuracy performance.
Theorem: If f_{1} and f_{2} are two shapelets that have the same distance threshold (the same splitting point), the same class, and different earliness (f_{1} has better earliness than f_{2}), then f_{1} has a better weighted information gain than f_{2}.

Proof: Suppose that the number of target time series in D_{ L } is N_{ T } and the number of non-target time series in D_{ L } is N_{ NT }. Without loss of generality, since f_{1} has better earliness than f_{2}, suppose that for every target time series T in D_{ L }, EML(f_{1},T) = P_{1} and EML(f_{2},T) = P_{2} such that P_{1} < P_{2}. The weighted counts M_{L 1} and M_{L 2} of the time series in D_{ L } for f_{1} and f_{2} are P_{1}N_{ T } + N_{ NT } and P_{2}N_{ T } + N_{ NT }, respectively. Since P_{1} < P_{2}, it follows that M_{L 1} < M_{L 2}. Hence, the weighted information gain of f_{1} is greater than the weighted information gain of f_{2}.
Therefore, the weighted information gain gives high scores to the shapelets that come early in the time series.
Shapelet pruning
To select a subset of the shapelets for classification, the shapelets are sorted in descending order using their utility scores. In this manuscript, two methods have been used to select a subset of the shapelets.
The first method iterates over the shapelets, starting from the highest-ranked one. We select the shapelet and remove all training examples that are covered by it. A shapelet f covers a training time series T if dist(f,T) ≤ δ and Class(T) = c_{ f }. We then check whether the next highest-ranked shapelet covers any of the remaining training time series. If it covers some of them, we select the shapelet and remove all covered time series; otherwise, we discard it and proceed to the next one. This process continues until all training time series are covered.
The second method simply involves keeping the top x shapelets from each class where x is a userdefined parameter. In our experiments, we used the top 5, 10, 15 and 20 shapelets from each class.
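The first (coverage-based) pruning method is a greedy set cover over the training examples and can be sketched as:

```python
def prune_shapelets(ranked, D, covers):
    """First pruning method: greedy set cover.  `ranked` is the
    utility-sorted shapelet list, D a list of (series, label) pairs,
    and covers(f, series, label) the coverage test described above
    (distance within threshold and matching class)."""
    remaining = list(range(len(D)))
    selected = []
    for f in ranked:
        newly = [i for i in remaining if covers(f, D[i][0], D[i][1])]
        if newly:                        # keep only shapelets that cover
            selected.append(f)           # something not yet covered
            remaining = [i for i in remaining if i not in newly]
        if not remaining:                # all training series covered
            break
    return selected
```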
Classification
If the length of the shortest shapelet extracted by Algorithm 1 is l, then we cannot classify any time series before observing l time points. Hence, the classification method (Additional file 1: Algorithm S.1) initially reads l time stamps from the test time series and retrieves the highest-ranked shapelet. If the shapelet covers the current stream of the test time series, the time series is classified with the class of the shapelet and the prediction is done. Otherwise, the method retrieves the next shapelet from the ranked list and repeats the process. If none of the shapelets covers the current stream of the test time series, the method reads one more time stamp and continues trying to classify the time series. Therefore, a test time series may be classified only after reading a number of time points greater than the shapelet’s length. If the method reaches the end of the time series and none of the shapelets covers it, the method marks the time series as a not-classified example. In the results section, we report the relative accuracy as well as the percentage of the covered test time series.
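The classification loop described above can be sketched as follows; the shapelet representation and the covers test are simplified placeholders:

```python
def classify_early(stream_at, full_len, ranked, covers):
    """Sketch of the early-classification loop.  stream_at(t) returns
    the first t observations of the test series; ranked is the pruned,
    utility-sorted shapelet list (dicts with 'length' and 'label');
    covers(f, prefix) is the coverage test.  Returns (label, t), or
    (None, full_len) if no shapelet ever matches."""
    min_len = min(f["length"] for f in ranked)
    for t in range(min_len, full_len + 1):   # read one more point at a time
        prefix = stream_at(t)
        for f in ranked:                     # highest-ranked shapelet first
            if covers(f, prefix):
                return f["label"], t         # classified after t points
    return None, full_len                    # not-classified example
```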
Multivariate shapelets detection for early classification of multivariate time series (ECMTS)
In a dataset of N-dimensional time series, the method extracts all N-dimensional shapelets f = (s,l,Δ,c_{ f }). The method assumes that all subsequences s^{ j } are extracted from the same starting position. Hence, we slide a window of length l over the time series: at each time stamp p, a subsequence s^{ j } of length l starting at time point p is extracted from the j^{ th } dimension to construct s = [s^{1},s^{2},…,s^{ N }]. An example of a 3-dimensional shapelet is shown in Figure 3.
where Perc ∈ ]0,1].
The algorithm for extracting the multivariate shapelets from a dataset is similar to Algorithm 1. The algorithm iterates over each time series and extracts all multivariate shapelets. For each candidate multivariate shapelet, it computes the distances to every time series. Note that each distance is a vector of length N; hence, the distances between a multivariate shapelet and all time series form a matrix of dimensions N × M, where M is the number of time series. The method then computes the distance threshold and utility score for each candidate multivariate shapelet, as explained in the following section. Finally, it prunes the shapelets using the same procedure as in the univariate case.
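The N × M distance computation can be sketched as follows, where subseq_dist stands for the univariate distance of Equation 1 and is passed in as a callable:

```python
def multivariate_dist_matrix(shapelet, series_list, subseq_dist):
    """Distances between one N-dimensional shapelet s = [s^1, ..., s^N]
    and all M time series: an N x M matrix whose i-th column is the
    per-dimension distance vector dist(s, T_i).  subseq_dist is the
    univariate distance of Equation 1."""
    N, M = len(shapelet), len(series_list)
    return [[subseq_dist(shapelet[j], series_list[i][j])
             for i in range(M)]
            for j in range(N)]
```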
Distance threshold method
Multivariate information gain-based distance threshold for multivariate shapelets
The multivariate information gain (Additional file 1: Algorithm S.3) is computed in a way similar to the information gain in the univariate case. It takes as input an N-dimensional shapelet f; a matrix Dist, which stores the multivariate distances between the shapelet and all M time series in the dataset; and Perc, which determines the percentage of dimensions used to compute Equation 6. The algorithm sorts the matrix Dist, and each multivariate candidate threshold is then computed as the midpoint between two successive distances (columns in the matrix Dist). Using the candidate threshold, the information gain is computed. Finally, the algorithm returns the multivariate threshold Δ = [δ^{1},δ^{2},…,δ^{ N }] that has the maximal information gain.
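A sketch of this threshold search follows. Since Equation 6 is not reproduced in this excerpt, the coverage rule below (a series falls in the left group when at least Perc of its N dimensions are within their per-dimension thresholds) is our assumed reading of it, and the information gain is passed in as a callable:

```python
def multivariate_threshold(dist, labels, target, perc, info_gain):
    """Sketch of the multivariate threshold search (Algorithm S.3).
    dist is the N x M distance matrix; candidate thresholds Delta are
    midpoints between successive sorted distances, taken column-wise,
    and the candidate maximizing the information gain is kept.  The
    at-least-Perc-of-N-dimensions coverage rule is an assumed reading
    of Equation 6; info_gain(in_left, labels, target) scores a split."""
    N, M = len(dist), len(dist[0])
    rows = [sorted(r) for r in dist]
    best_gain, best_delta = float("-inf"), None
    for c in range(M - 1):
        delta = [(r[c] + r[c + 1]) / 2 for r in rows]   # candidate Delta
        in_left = [sum(dist[j][i] <= delta[j] for j in range(N))
                   >= perc * N for i in range(M)]
        g = info_gain(in_left, labels, target)
        if g > best_gain:
            best_gain, best_delta = g, delta
    return best_delta, best_gain
```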
Utility score method
The steps to adapt the utility scores defined on univariate time series are similar to the steps we have followed to adapt the distance threshold method.
After computing the score for each shapelet, the method sorts them in descending order according to their utility scores and then selects a subset of shapelets as explained in the Shapelet Pruning section. The classification process is similar to the process described in the Classification section, taking Equation 6 into consideration when computing the distance between the shapelet and the current stream of the query time series.
Dataset and data processing
Viral challenge datasets
We used two datasets for blood gene expression from human viral studies with influenza A (H3N2) and live rhinovirus (HRV) to distinguish individuals with symptomatic acute respiratory infections from uninfected individuals [11].
H3N2 dataset: A healthy-volunteer intranasal challenge with H3N2 was performed in 17 subjects. Of those subjects, 9 became symptomatic and 8 remained asymptomatic. Blood samples were taken from each subject at 16 time points. Some subjects missed certain measurements at time points 1, 5, 6, and/or 7; hence, the gene expression values were measured on average 14–16 times for each subject. 30 genes were identified, in ranked order, as contributing to respiratory infection [11]. We used the 23 unique genes from that list that were found in the available dataset.
HRV dataset: A healthy-volunteer intranasal challenge with HRV was performed in 20 subjects. Of those subjects, 10 became symptomatic and 10 remained asymptomatic. Blood samples were taken from each subject at 14 time points. We ignored time stamps 8–11 because the majority of the subjects missed the measurements at those time points. Thus, the gene expression values were measured on average 6–10 times for each subject. 30 genes were identified, in ranked order, as contributing to respiratory infection [11]. We used the 26 unique genes from that list that were found in the available dataset.
Drug response dataset
Another clinical dataset was generated for studying the changes in cellular functions in multiple sclerosis (MS) patients in response to drug therapy with IFN-β [12]. The dataset contains time series gene expression for 52 patients. The patients were classified as good responders (33 patients) or bad responders (19 patients) to the drug. The blood samples were taken every 3 months in the first year and every 6 months in the second year. Some patients missed certain measurements, especially at the 7^{ th } time point. Thus, the gene expression values were measured on average 5–7 times for each subject. The list of the genes used in our experiments is provided (Additional file 1: Table S.1).
Identification of triplets of genes for a Bayes classifier of time series expression data of multiple sclerosis patients’ response to the drug has been performed [12]. That previous research identified 12 genes organized in triplets. Hence, we generated four datasets: Baranzini3A and Baranzini3B, each consisting of one of the two best triplets of genes; Baranzini6, which has the top two triplets; and Baranzini12, which has all 12 genes identified across the triplets.
A discriminative hidden Markov model has been developed and applied to the MS dataset to reveal the genes that are associated with the good or bad responders to the therapy [13]. A total of 9 genes were found that are associated with the therapy. Hence, we constructed a dataset, called Lin9, consisting of those 9 genes.
A mixture of hidden Markov models has been developed to identify the genes that are associated with the patient response to the treatment [14]. A total of 17 relevant genes were found. Therefore, we constructed a dataset called Costa17 that contains data for these 17 genes.
Environment setup and evaluation measure
In all experiments, we set minL = 3 and maxL to 60% of the time series’ length. Since the number of subjects was small, bootstrapping was used for estimating the generalization error [15, 16]. We sample with replacement a subset (75%) of the original dataset, train our model on the sampled data, and then test it on the subjects that are not used in the training data. This process is repeated 1000 times, and each final reported statistic (like relative accuracy) is the median of that statistic over all bootstrap samples. We report the median instead of the average because the distribution of the statistics is skewed and not symmetric.
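The bootstrap procedure can be sketched as follows, where train_and_test stands for fitting MSD on the sample and computing one statistic (e.g. relative accuracy) on the held-out subjects:

```python
import random
import statistics

def bootstrap_evaluate(dataset, train_and_test, n_boot=1000,
                       frac=0.75, seed=0):
    """Out-of-sample bootstrap as described above: sample 75% of the
    subjects with replacement, train on the sample, test on the
    left-out subjects, and report the median statistic over all
    repetitions.  train_and_test(train, test) -> scalar statistic."""
    rng = random.Random(seed)
    n = len(dataset)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(int(frac * n))]
        held_out = [i for i in range(n) if i not in set(idx)]
        stats.append(train_and_test([dataset[i] for i in idx],
                                    [dataset[i] for i in held_out]))
    return statistics.median(stats)   # median: the distribution is skewed
```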
In the results, we report the median of the accuracy, the coverage (the percentage of the time series that are covered by the method), and the earliness (the fraction of the time series’ length used for classification). Note that the earliness varies from one test example to another. In other words, each test example may be classified at a different time point, so our method is patient-specific and there is no fixed length of the time series used for classification.
The accuracy is computed as (tp + tn)/(tp + tn + fp + fn), where tp is the number of true positives, tn is the number of true negatives, fp is the number of false positives, and fn is the number of false negatives.
F_{β} = (1 + β^{2}) · Accuracy · (1 − Ear)/(β^{2} · (1 − Ear) + Accuracy), where smaller values of β put more weight on the earliness and larger values of β put more weight on the accuracy. Note that we use (1 − Ear) because we want to penalize larger values of Ear. In our experiments, we used the balanced F_{1} score, which gives both the accuracy and the earliness the same weight. The F_{1} score reaches its best value at 1 and its worst at 0.
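Assuming F_{β} combines the accuracy and (1 − Ear) as a weighted harmonic mean, in the usual F-measure form, the F_{1} values in the results tables can be reproduced; for example, the H3N2 row (accuracy 77.78%, earliness 62.50%) gives 0.5060 and the HRV row (70.00%, 40.00%) gives 0.6462:

```python
def f_beta(acc, earliness, beta=1.0):
    """Assumed form of the earliness-aware F_beta score: a weighted
    harmonic mean of accuracy and (1 - earliness).  With this form,
    beta -> 0 recovers (1 - Ear) (pure earliness) and beta -> inf
    recovers the accuracy, matching the weighting described above."""
    e = 1.0 - earliness
    b2 = beta * beta
    return (1 + b2) * acc * e / (b2 * e + acc)
```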
Results and discussion
Evaluation of MSD method
Table 1 Evaluation of the MSD method on the viral infection and drug response datasets using all genes
Dataset  Number of genes  Accuracy  Relative accuracy  Coverage  Earliness  F _{1} 

H3N2  23  77.78  85.71  100  62.50  0.5060 
HRV  26  70.00  71.43  100  40.00  0.6462 
Baranzini3A  3  70.00  73.91  95.83  46.26  0.6080 
Baranzini3B  3  66.67  68.00  100  44.81  0.6039 
Baranzini6  6  70.83  70.83  100  42.86  0.6325 
Baranzini12  12  66.67  66.67  100  42.86  0.6154 
Lin9  9  67.86  69.57  100  44.00  0.6136 
Costa17  17  68.00  69.23  100  45.24  0.6067 
From Table 1, it is clear that the MSD method achieved high accuracy using a small fraction of the time series. For example, on the H3N2 dataset, MSD covered approximately 100% of the dataset and, on the covered time series, achieved 85.71% relative accuracy using 62% of the time series’ length. On another benchmark dataset, Lin9, the method developed in [13] achieved 85% accuracy using the full time series (F_{1} ≈ 0.01), while our MSD method achieved approximately 68% accuracy using less than half of the time series’ length on average (F_{1} ≈ 0.51).
Table 2 Evaluation of the MSD method on the drug response datasets using a subset of genes that gives the highest accuracy
Dataset  genes  Accuracy  Relative accuracy  Coverage  Earliness  F _{1} 

H3N2  Top 11 genes  80.00  87.50  88.89  64.29  0.4938 
HRV  RSAD2  71.43  75.00  100  38.89  0.6587 
Baranzini3A  Caspase 10  75.00  76.00  100  45.45  0.6316 
Baranzini3B  Caspase 2 , Caspase 3  75.00  76.19  100  44.05  0.6409 
Baranzini6  Caspase 10 , IL4Ra  75.00  76.00  100  43.45  0.6448 
Lin9  Caspase 2, Caspase 3, Jak2  81.82  82.61  100  43.43  0.6689 
Table 3 Evaluation of the univariate method on all datasets
Dataset  gene  Accuracy  Relative accuracy  Coverage  Earliness  F _{1} 

H3N2  LOC26010  77.78  85.71  100  38.34  0.6879 
HRV  RSAD2  42.86  80.00  55.56  52.50  0.4506 
Baranzini3A  Caspase 10  12.00  100.00  12.25  42.86  0.1983 
Baranzini3B  Caspase 3  26.09  80.00  31.38  40.26  0.3632 
Baranzini6  Caspase 10  12.00  100.00  12.25  42.86  0.1983 
Baranzini12  Caspase 3  26.09  80.00  31.38  40.26  0.3632 
Lin9  Caspase 3  26.09  80.00  31.38  40.26  0.3632 
Costa17  Caspase 3  26.09  80.00  31.38  40.26  0.3632 
Baseline classifier for early classification
Table 4 Evaluation of the random classifier on all datasets
Dataset  Accuracy 

H3N2  55.2833 
HRV  52.1869 
Baranzini3A  49.7893 
Baranzini3B  49.6808 
Baranzini6  50.8227 
Baranzini12  53.9255 
Lin9  50.7689 
Costa17  51.5093 
In addition, we compared MSD to a baseline classical classifier that uses shorter time series. Recent research strongly suggests that the 1-nearest neighbor (1NN) method with Dynamic Time Warping (DTW) is exceptionally difficult to beat [17]. Therefore, we compared MSD to the 1NN classifier using DTW. We also compared (data not shown) 1NN using Euclidean distance to 1NN using DTW, and found that 1NN with DTW is more accurate than 1NN with Euclidean distance.
On the HRV dataset (right group), the accuracy of 1NN using 50% of the time series’ length (gray bar) is worse than that of our early classification method MSD (yellow bar), and MSD used a smaller fraction of the time series on average. For instance, 1NN achieved 55% accuracy in the 1NN(50) setting (F_{1} ≈ 0.46), while MSD was more accurate using, on average, 40% of the time series’ length (F_{1} ≈ 0.64). The results were consistent on the H3N2 dataset.
Therefore, for the early classification task, using conventional classification methods on shorter time series is not as accurate as using methods specialized for early classification, such as our proposed method.
Runtime analysis
Table 5 Runtime analysis of MSD on the viral infection and drug response datasets
Dataset  Number of genes  Number of examples  TS length  Time in seconds

H3N2  23  17  16  295.1 
HRV  26  20  10  77.7 
Baranzini3A  3  52  7  49.3 
Baranzini3B  3  52  7  36.1 
Baranzini6  6  52  7  41.1 
Baranzini12  12  52  7  64.3 
Lin9  9  52  7  48.8 
Costa17  17  52  7  131.9 
Conclusion
For the early classification task, we proposed a method called Multivariate Shapelets Detection (MSD). It extracts patterns from all dimensions of the time series. In addition, we proposed using an information gain-based distance threshold and a weighted information gain-based utility score of a shapelet. The weighted information gain incorporates earliness and assigns a higher utility score to the shapelet that appears earlier. In order to adhere to the limitations of clinical settings (in which only a small pre-specified number of genes is provided in shorter time series), datasets comprised of fairly short time series were used in the reported experiments; however, our method is applicable to any domain. We showed that MSD can classify the time series early, using as little as 40%–64% of the time series’ length. We compared MSD to a baseline classifier and showed that using the method proposed for early classification is more accurate than using conventional methods.
The run time of the MSD method grows exponentially with the number of examples and the length of the time series, which limits the applicability of the proposed approach to datasets with a smaller number of data instances and/or temporal observations. In practice, this is not a limitation for early classification in many health informatics applications (e.g., sepsis), since decisions typically have to be made very early by learning from a small number of patients. However, in future work, we will speed up the method by incorporating parallelism into the algorithm.
We are working to improve MSD by allowing the components of the multivariate time series shapelet to have different starting positions. Since the number of candidate shapelets grows exponentially, the concepts of closed shapelets and maximal closed shapelets can be introduced to prune redundant shapelets that are supersets of smaller shapelets. Another extension of our work is to let the horizon between the time stamps vary across subjects.
Authors’ contributions
MG designed the algorithms, implemented software, carried out the analysis, and drafted the manuscript. ZO inspired the overall work, provided advice, and revised the final manuscript. Both authors read and approved the final manuscript.
Declarations
Acknowledgements
We thank everyone in Prof. Obradovic’s laboratory for valuable discussions. Special thanks to the reviewers for their valuable suggestions, which helped improve the presentation and characterization of the proposed method, and to Dušan Ramljak for reviewing the initial draft of the paper.
This work was funded, in part, by DARPA grant [DARPA-N66001-11-1-4183] negotiated by SSC Pacific; the US National Science Foundation [NSF-CNS-0958854]; and the Egyptian Ministry of Higher Education.
References
 Box GEP, Jenkins GM, Reinsel GC: Time Sereis Analysis: Forecasting and Control. Wiley, Chichester; 2008.View ArticleGoogle Scholar
 Bracewell RN: The Fourier Transform and Its Applications. 3edition. McGrawHill Science/Engineering/Math; 1999.Google Scholar
 Goodwin GC, Ramadge PJ, Caines PE: Discrete time multivariable adaptive control. 18th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes 1979, 335–340.Google Scholar
 Batal I, Hauskrecht M: Constructing Classication Features Using Minimal Predictive Patterns. ACM Conference on Information and Knowledge Management 2010.Google Scholar
 Dua S, Saini S, Singh H: Temporal Pattern Mining for Multivariate Time Series Classification. J Med Imaging and Health Inf 2011, 1(2):164–169. 10.1166/jmihi.2011.1019View ArticleGoogle Scholar
 Kadous MW, Sammut C: Classification of Multivariate Time Series and Structured Data Using Constructive Induction. Machine Learning 2005, 58: 179–216. 10.1007/s1099400558265View ArticleGoogle Scholar
 Xing Z, Pei J, Yu PS: Early Prediction on Time Series: A Nearest Neighbor Approach. Proceedings 21st International Joint Conference on Artifical Intelligence 2009, 1297–1302.Google Scholar
 Xing Z, Pei J, Yu PS, Wang K: Extracting Interpretable Features for Early Classification on Time Series. Proceedings of 11th SIAM International Conference on Data Mining 2011, 439–451.Google Scholar
 Allen AO: Probability, Statistics, and Queuing Theory with Computer Science Applications. Academic Press; 1990.Google Scholar
 Mueen A, Keogh E, Young N: LogicalShapelets: An Expressive Primitive for Time Series Classification. Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2011, 1154–1162.Google Scholar
 Zaas AK, Chen M, Varkey J, Veldman T, III AOH, Lucas J, Huang Y, Turner R, Gilbert A, LambkinWilliams R, Øien NC, Nicholson B, Kingsmore S, Carin L, Woods CW, Ginsburg GS: Gene Expression Signatures Diagnose Influenza and Other Symptomatic Respiratory Viral Infections in Humans. Cell Host and Microbe 2009, 6(3):207–217. 10.1016/j.chom.2009.07.006PubMed CentralView ArticlePubMedGoogle Scholar
 Baranzini SE, Mousavi P, Rio J, Caillier SJ, Stillman A, Villoslada P, Wyatt MM, Comabella M, Greller LD, Somogyi R, Montalban X, Oksenberg JR: TranscriptionBased Prediction of Response to IFNSS Using Supervised Computational Methods. PLoS Biol 2005, 3(1):166–176.Google Scholar
 Lin T, Kaminski N, BarJoseph Z: Alignment and classification of time series gene expression in clinical studies. Bioinformatics 2008, 24(13):i147i155. 10.1093/bioinformatics/btn152PubMed CentralView ArticlePubMedGoogle Scholar
 Costa IG, Schönhuth A, Hafemeister C, Schliep A: Constrained mixture estimation for analysis and robust classification of clinical time series. Bioinformatics 2009, 25(12):i6i14. 10.1093/bioinformatics/btp222PubMed CentralView ArticlePubMedGoogle Scholar
 Lendasse A, Wertz V, Verleysen M: Model Selection with CrossValidations and Bootstraps  Application to Time Series Prediction with RBFN Models. In Artificial Neural Networks and Neural Information Processing ICANN/ICONIP 2003. SpringerVerlag; 2003:573–580.View ArticleGoogle Scholar
 Jain AK, Dubes RC, Chen CC: Bootstrap Techniques for Error Estimation. IEEE Trans Pattern Anal Machine Intelligence 1987, PAMI9(5):628–633.View ArticleGoogle Scholar
 Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E: Querying and mining of time series data experimental comparison of representations and distance measures. Proc VLDB Endowment 2008, 1(2):1542–1552.View ArticleGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.