Frequency distribution of single nucleotides (A, T, G, C) on six data sets. Figure 1 shows the frequency of the four single nucleotides (A, T, G, C) on six data sets: (a) S1000, (b) S2000, (c) Shk1000, (d) Shk2000, (e) Sts1000, and (f) Sts2000. In each panel, the x axis is the bin number and the y axis is the number of occurrences of a single nucleotide in that bin. In all panels, G/C content is much higher than A/T content close to the TSSs. There is, however, a small increase in A/T content in the bin where the TATA box resides (the second closest bin to the TSSs), which helps explain how a TATA box can occur in a region where the majority of bases are G and C. At the 5' end of the promoters, far from the TSSs, A/T content is higher than G/C content. Little difference in single-nucleotide frequency is observed between housekeeping genes and tissue-specific genes, as seen by comparing Figure 1(c) with Figure 1(e) and Figure 1(d) with Figure 1(f).
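The per-bin counts plotted in Figure 1 could be computed along the following lines. This is only a minimal sketch, not the procedure used to produce the figure: the function name `bin_nucleotide_counts`, the `bin_width` parameter, and the assumption that all promoter sequences have equal length and are aligned 5' to 3' (so the last bin is the one closest to the TSS) are illustrative choices that the text does not specify.

```python
NUCLEOTIDES = "ATGC"


def bin_nucleotide_counts(sequences, bin_width):
    """Count A, T, G, C occurrences per positional bin across aligned sequences.

    Assumptions (hypothetical, not stated in the text): every sequence has
    the same length, and position 0 is the 5' end farthest from the TSS.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if not sequences:
        return {n: [] for n in NUCLEOTIDES}

    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")

    # Ceiling division so a shorter final bin is still counted.
    n_bins = (length + bin_width - 1) // bin_width
    counts = {n: [0] * n_bins for n in NUCLEOTIDES}

    for seq in sequences:
        for pos, base in enumerate(seq.upper()):
            # Skip ambiguity codes such as N.
            if base in counts:
                counts[base][pos // bin_width] += 1
    return counts


# Toy input (made-up strings, not taken from any of the six data sets).
toy = ["AATTGGCC", "ATATGCGC"]
toy_counts = bin_nucleotide_counts(toy, bin_width=4)
```

Each list in the returned dictionary corresponds to one curve in a panel of Figure 1: the list index is the bin number (x axis) and the value is the number of occurrences (y axis). Summing the A and T lists, and the G and C lists, bin by bin gives the A/T and G/C content compared in the text.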