Samples
mRNAs were prepared from mouse embryonic stem (ES) cells at 0, 12, 24, 48, and 96 h after removal of Leukemia Inhibitory Factor (LIF) from the culture medium. The samples subjected to HiCEP reaction were duplicated. We designated each sample as 0 h1, 0 h2, 12 h1, 12 h2, 24 h1, 24 h2, 48 h1, 48 h2, 96 h1, and 96 h2.
HiCEP analysis
mRNAs prepared from each sample were digested with two 4bpcutting endonucleases (Msp I combined with Mse I) and ligated with the corresponding adaptors. The resulting HiCEP templates, MspIMseIpoly(A) mRNAs, were amplified by fluorescently labelled primers; for labelling, FAM, HEX, and NED were used. In total, 256 primer combinations (16 MspINN primers combined with 16 NNMseI primers; N = {A, C, G, T}) were used in the HiCEP analysis. For example, a primer combination of MspITA and GCMseI is capable of amplifying particular transcriptderived fragments (TDFs) corresponding to that combination. The details of the protocol of the HiCEP reaction are described elsewhere [11]. An animation of the principle is provided at the following URL http://133.63.22.11/english/research/serch03.html.
Electrophoresis and image analysis
The PCR products were denatured and loaded on an ABI Prism 310 (Applied Biosystems) for capillary gel electrophoresis. The digitized images were analyzed by the GeneScan software (Applied Biotech). The size of the fragments was calculated by the software, according to internal molecular size markers of 35, 50, 75, 100, 139, 150, 160, 200, 300, 340, 350, 400, 490, 500, 600, and 700 bp, on each gel. The fragment sizing and baseline subtraction were performed by the software. The software quantifies each peak by the fragment length L (in bp), peak height H, and area A (in arbitrary units). Accordingly, the subsequent normalization procedure accepts these threetuples as input for detected TDFs between 35 bp and 700 bp. TDFs smaller than 35 bp or larger than 700 bp were omitted from the analysis because the range was outside the size calibration range.
Delineation of quality scores for lanes
The starting point of normalization is a set of lanes (10 timecourse measurements; 0, 12, 24, 48, and 96 h, each experiment duplicated) in each of 256 primer combinations. We explain the procedure using data from the primer combination of 'Msp ICT combined with ttMse I (designated as CTtt)' because the ten electropherograms have some ranges for which fragment sizing is obviously inappropriate (we therefore designated such ranges as "dissimilar ranges").
The first step starts from the Gaussian approximation of each lane. The use of the approximating lane is the same as described in Aittokallio et al. [25–27]. Briefly, a fragment F_{
i
}in lane P is originally characterized by the threetuples (L_{
i
}, H_{
i
}, A_{
i
}). If lane P consists of n fragments , the approximation of the lane at length t is given by:
where σ_{
i
}is obtained from the following equation:
The approximation is performed independently for each lane. The ten approximate profiles of timecourse data in the primer combination of CTtt are shown in Fig. 1a.
For the automated identification of 'dissimilar ranges' from the expression profiles of ten lanes
, we next assign quality scores to each of the fragments , where the fragments are originally numbered with respect to their lengths. By using the ten approximate profiles, relative similarity scores for intervals from fragment i to fragment (i+T1) (i = 1, 2,..., n  T + 1) in lane P^{target}(target = {1, ..., 10}) are calculated from the following equation:
where
is the Pearson correlation coefficient between the target lane P^{target}and one of the other lanes P^{k}in the interval (startend bp) which always includes T fragments from fragment i to fragment (i+T1) (i = 1, 2, ..., nT+1). The interval is defined as: start = L_{
i
} 2.5σ_{
i
}and end = L_{i+T1}+ 2.5σ_{i+T1}. In this analysis, the number of fragments T is held constant at T = 5; other numbers are of course possible. By applying a moving window of T fragments, most of the fragments (nT+2 fragments in this case, with the exception of F_{1}, F_{2}, F_{3}, F_{4}, F_{n3}, F_{n2}, F_{n1}, and F_{
n
}) have T relative similarity scores. Finally, the relative quality value Q(L_{
i
}) for fragment F_{
i
}is defined as the average of the similarity scores which satisfy start ≤ L_{
i
}≤ end. An example of the calculation is given in Table 1. Quality scores at arbitrary lengths t, Q(t), are interpolated by the use of cubic splines to . The procedure is applied to each of the ten lanes and then the quality profiles corresponding to the expression profiles are created (Fig. 1b).
The quality profiles delineated from Q(t) have a clear interpretation. The high (or low) score for Q^{k}(t) in lane k indicates a high (or low) level of relative similarity between the lane and the others around the length t.
Detection of dissimilar ranges
Now we have information (quality profiles) for the automated detection of dissimilar ranges. Here we adopt a simple method for detecting the range. Briefly,

1)
Seek 'seed' ranges (C_{
seed_s
} C_{
seed_e
}bp) which satisfy two conditions: a) Q(t) ≤ thres_{
seed
}and b) they contain at least two peaks.

2)
Seek C_{
tmp_s
}which satisfies both and C_{
tmp_s
}<C_{
seed_s
}; similarly, C_{
tmp_e
}, and C_{
tmp_e
}<C_{
seed_e
}

3)
Substitute the nearest marker length
(in this case, M_{1} = 35, M_{2} = 50, ..., = 700) to C_{
tmp_s
}(resp. C_{
tmp_e
}) for C_{
s
}(resp. C_{
e
}); accordingly, both C_{
s
}and C_{
e
}= and C_{
s
}<C_{
e
}
Aparameter thres_{
seed
}is set to 0.3 empirically. Forexample, P^{9} has the following parameters in Fig. 1b: C_{
seed_s
}= 57.04, C_{
seed_e
}= 89.98, C_{
tmp_s
}= 52.60, C_{
tmp_e
}= 104.60, C_{
s
}= M_{2}, and C_{
e
}= M_{4}. Although fine tuning might be necessary, the procedure enables us to display dissimilar ranges.
Selection of the reference lane
When we want to correct a dissimilar range (C_{
s
} C_{
e
}bp), we have to select the "reference" (a kind of mean or typical profile in the corresponding range). One method is to choose lane P^{reference}satisfying max {}, where is the average of Q^{k}in the range (C_{
s
} C_{
e
}bp) in lane . For example, the algorithm selects P^{10} (i.e., 96 h2) as a reference in a particular range (M_{1}M_{2} bp) and also in range (M_{2}M_{4} bp).
Two models for the normalization of dissimilar ranges
The meaning of the word "normalization" here is to correct the fragment lengths (L) and the areas (A) of peaks in a dissimilar range so that the similarity between the normalized electrophoretic pattern and the reference pattern in the corresponding range can be maximized. To normalize a particular lane P^{target}against the reference P^{reference}, we now consider the following two models. Model 1 is the case of an incorrect fragment sizing at the shortest (or longest) marker peak, i.e, C_{
s
}= M_{1} = 35 (or C_{
e
}= = 700). The peak lengths deviate more and more from the reference length moving from C_{
e
}to C_{
s
}(or from C_{
s
}to C_{
e
}). Model 2 is the case of an incorrect fragment sizing near marker length M_{
j
}(C_{
s
}<M_{
j
}<C_{
e
}, j = {2, 3,..., n_{
M
} 1}; the inside of dissimilar range (C_{
s
} C_{
e
}bp)). Roughly, the deviation of peak lengths from the reference length gradually increases starting from C_{
s
}; the maximum deviation is reached at M_{
j
}(C_{
s
}<M_{
j
}<C_{
e
}); the deviation decreases gradually; and finally disappears at C_{
e
}bp.
Normalization is performed by either expanding or compressing. Consider, for example in Model 1, normalization for the expression profile of P^{3} (12 h1) in range (M_{1}M_{2} bp) against the reference P^{10} (96 h2). Undoubtedly, the profile displays a systematic deviation from the reference. The degree of the deviation gradually increases starting from M_{2} bp to M_{1} bp probably because an intense peak generated near the shortest marker peak for the correction of M_{1} bp is used mistakenly. We expect the normalization will be achieved by a linear expansion of the short side (M_{1}) of the range (M_{1}M_{2} bp) by anchoring the long side. The best approximating profile is found by considering various combinations of normalization candidates starting from D × 100% expansion to D × 100% compression of the short side at intervals of d bp. We set D = 0.4, as a maximal realistic displacement and d = 0.2. Accordingly, in practice, the number of combinations is 2 × (C_{
e
} C_{
s
}) × D/d + 1 (for example, there are 61 combinations of normalization candidates in the range (M_{1}M_{2} bp)) in Model 1.
For each combination x (x = {0, 1, ..., 2 × (C_{
e
} C_{
s
}) × D/d)}, we make a candidate profile P_{
x
}by changing three parameters (L_{
i
}, A_{
i
}, and σ_{
i
}) accompanied by fragments (F_{
i
}) in the dissimilar range (C_{
s
} C_{
e
}bp), according to the level of correction (expansion or compression). Those parameters are calculated as follows:
Candidates are made by substituting these transformed threetuples
in a given range (C_{
s
} C_{
e
}bp) into eq. (1). The best approximate profile is the one that achieves the highest correlation coefficient between P^{reference}and P_{
x
}(x = {0, 1, ..., 2 × (C_{
e
} C_{
s
}) × D / d}) in the range (C_{
s
} C_{
e
}bp). In the normalization for the expression profile P^{3} in the range (M_{1}M_{2} bp) against the reference P^{10}, the best normalized profile by our method matches well with the reference (Fig. 2).
A good example of Model 2 is the expression profile P^{2} (0 h2) in range (M_{2}M_{4} bp) with the reference P^{10} (96 h2); there is no possibility of Model 1 (C_{
s
}≠ M_{1}) and the number of incorrect marker lengths is only one (M_{3}). Model 2 is a mixture of Model 1. The normalization is also done by one of the {2 × (C_{
e
} C_{
s
}) × D / d + 1} combinations starting from D × 100% compression of the long side in (M_{2}M_{3} bp) and D × 100% expansion of the short side in (M_{3}M_{4} bp) to D × 100% expansion of the long side in (M_{2}M_{3} bp) and D × 100% compression of the short side in (M_{3}M_{4} bp) at intervals of d (= 0.2) bp. Unlike Model 1, we set D = 0.1 as a maximal realistic displacement. In the normalization for the expression profile P^{2} in the range (M_{2}M_{4} bp) against the reference, the best normalized profile by our method is matches well with the reference (Fig. 3).
It should be noted that when a dissimilar range (M_{
j
}M_{j+l}bp) is very wide (j = 1, 2,..., n_{
M
} l; l ≥ 3), there are two or more possibilities for incorrect marker lengths in Model 2. Of these cases, we only consider cases with j = 1 in Model 1 because such cases are the only realistic ones. For the remaining cases (j = 2,..., n_{
M
} l; l ≥ 3), the experiment should be redone rather than trying to normalize them by considering numerous possibilities. It should also be noted that there is a case of a dissimilar range (M_{1}M_{3} bp) to which both Models 1 and 2 are applicable. In this case, the best approximate profile is decided by comparing the two best possible profiles determined using Models 1 and 2.