The algorithm is based on digitization of the bases followed by calculation of an information coefficient. The information coefficient used in this research is a formula based on Shannon's information theory. Shannon's information theory has been used in other research on primer design; for example, Purohit et al. [10] used an entropy measure to identify conserved regions in aligned sequences for primer design.

The first step of the algorithm is to find appropriate annealing sites in the template sequences. The upstream and downstream primer sequences are each slid along the template sequences, one base per step. At each position where the 3'-end base of a primer matches the corresponding template base, a DNA fragment of the same length as the primer is taken from the template; we call these fragments Candidates of Annealing Sites (CAS). The primer sequences and CAS fragments are all converted into numeric vectors. We name the numeric vectors of the two primer sequences *PU* and *PD*, and those of the template fragments *TU* and *TD*. They are defined as below.

*PU* = [*p*_{1}, *p*_{2}, ..., *p*_{m}], *PD* = [*p*_{1}, *p*_{2}, ..., *p*_{n}] (1)

*TU* = [*t*_{1}, *t*_{2}, ..., *t*_{m}], *TD* = [*t*_{1}, *t*_{2}, ..., *t*_{n}] (2)

Variables *m* and *n* are the lengths of the upstream and downstream primers, respectively. The elements *p*_{i} and *t*_{i} are defined as below.

Using equations (1), (2), (3) and (4), we can transform the DNA sequences of the primers and CASs into numeric vectors. The next step is to compute the information coefficient (*I*) for each primer-CAS pair. The formula for the information coefficient (*I*) is given in equation (5).

A site was selected as an annealing site only if its similarity to the primer was higher than a preset threshold and the last digit in the vector *T* was not 0 (a requirement for a perfect match at the 3'-end).
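The candidate search described in the first step can be sketched in a few lines of Python. This is a minimal illustration, not the SPCR implementation: the function name `candidate_annealing_sites` is hypothetical, the primer and template are assumed to be written 5'→3' on the same strand, and strand handling for the downstream primer, which the text does not specify, is omitted.

```python
from typing import Iterator, Tuple


def candidate_annealing_sites(primer: str, template: str) -> Iterator[Tuple[int, str]]:
    """Yield (start, fragment) for each window whose last base equals
    the primer's 3'-end base.

    Assumption: both strings read 5'->3' on the same strand, so the
    3'-end of the primer is its last character.
    """
    primer = primer.upper()
    template = template.upper()
    m = len(primer)
    if m == 0:
        return
    # Slide one base per step over every full-length window.
    for start in range(len(template) - m + 1):
        fragment = template[start:start + m]
        if fragment[-1] == primer[-1]:  # 3'-end bases must match
            yield start, fragment


# Example with a made-up primer and template.
sites = list(candidate_annealing_sites("ACGT", "TTACGTACCTAGGT"))
```

Each yielded fragment has the same length as the primer and would then be converted to a numeric vector and scored against the primer.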

The information coefficient (*I*) is calculated as follows:

The range of the information coefficient (*I*) is (0, 1]. When the primer sequence matches the template completely, i.e. *P* = *T*, *I* = 1; the higher the affinity between primer and template, the greater the value of the information coefficient. The information coefficients formed by the upstream primer and the CASs of the template are represented as a set *I*_{up}; accordingly, those of the downstream primer and the CASs are represented as *I*_{dn}.

For each probable product, successful amplification was determined by five parameters: the upstream information coefficient *I*_{up}, the downstream information coefficient *I*_{dn}, the estimated limits for maximum and minimum product length, and the product amplification coefficient (*P*_{a}), which equals an average of *I*_{up} and *I*_{dn}. Two averaging methods are provided in the SPCR program: the arithmetic average, (*I*_{up} + *I*_{dn})/2, and the geometric average, √(*I*_{up} × *I*_{dn}). We discuss only the geometric average here. Although the two averaging methods differ in computation, their values are close and do not change the prediction result significantly. SPCR generates a product between the upstream and downstream primer annealing sites if, and only if, the values of *I*_{up}, *I*_{dn}, and *P*_{a} are all greater than the preset thresholds and the length of the predicted product lies within the preset length limits.
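The decision rule above can be illustrated with a short Python sketch. The names `Thresholds`, `amplification_coefficient` and `is_product`, as well as all default threshold values, are assumptions made for illustration and are not taken from SPCR. The geometric average is computed in its standard form, the square root of the product, and the length limits are assumed to be inclusive.

```python
import math
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Field names and default values are illustrative assumptions.
    i_up_min: float = 0.8
    i_dn_min: float = 0.8
    pa_min: float = 0.8
    min_len: int = 50
    max_len: int = 3000


def amplification_coefficient(i_up: float, i_dn: float,
                              method: str = "geometric") -> float:
    """Average the two information coefficients into P_a."""
    if method == "arithmetic":
        return (i_up + i_dn) / 2
    if method == "geometric":
        return math.sqrt(i_up * i_dn)
    raise ValueError(f"unknown averaging method: {method}")


def is_product(i_up: float, i_dn: float, length: int,
               th: Thresholds, method: str = "geometric") -> bool:
    """True only if all three coefficients exceed their thresholds
    and the product length lies within the (inclusive) limits."""
    pa = amplification_coefficient(i_up, i_dn, method)
    return (i_up > th.i_up_min
            and i_dn > th.i_dn_min
            and pa > th.pa_min
            and th.min_len <= length <= th.max_len)


th = Thresholds()
accepted = is_product(0.90, 0.85, 500, th)   # all criteria satisfied
too_short = is_product(0.90, 0.85, 10, th)   # fails the length limit
weak_site = is_product(0.95, 0.70, 500, th)  # I_dn below its threshold
```

For coefficients close to each other the two averages differ only slightly: with *I*_{up} = 0.9 and *I*_{dn} = 0.8 the arithmetic average is 0.85 and the geometric average is about 0.8485.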

SPCR was implemented in C++ as a Win32 application. It comprises an executable program that can be run directly without installation. The user inputs the primer sequences, sets the thresholds, provides one or more local template sequence files, and pushes the "Start PCR" button to begin the prediction. SPCR recognizes degenerate primers encoded with the IUPAC nucleotide codes, so degenerate bases are allowed in primer sequences. Template sequences can be any available genomic sequence of the target organism and must be in single- or multi-FASTA format. The output of SPCR is a text file listing all predicted PCR products.
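As an illustration of what accepting degenerate primers involves, the sketch below tests a degenerate primer base against a template base using the standard IUPAC nucleotide codes. How SPCR itself scores degenerate positions is not described here, so the helpers `base_matches` and `count_matches` are hypothetical and show only the set-membership idea.

```python
# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def base_matches(primer_code: str, template_base: str) -> bool:
    """True if the template base is one of the bases the primer code allows."""
    return template_base.upper() in IUPAC[primer_code.upper()]


def count_matches(primer: str, fragment: str) -> int:
    """Count positions where a (possibly degenerate) primer matches a fragment."""
    if len(primer) != len(fragment):
        raise ValueError("primer and fragment must have the same length")
    return sum(base_matches(p, t) for p, t in zip(primer, fragment))


full = count_matches("ACRT", "ACGT")     # R allows A or G, so all 4 match
partial = count_matches("ACYT", "ACGT")  # Y allows C or T, so G mismatches
```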

SPCR saves all data produced, including the predicted products and all parameters, into a user-specified result file in plain text format. The output file consists of four parts. The first part is a table listing all predicted products, including their *P*_{a} value, product length, the template each comes from, direction of amplification, start and end positions, *I*_{up} and *I*_{dn} values, and the upstream and downstream primer sequences. The second part is a number indicating how many products were predicted. The third part is the detailed nucleotide sequence data of all predicted products in FASTA format, in the same order as in the table. The last part includes all parameters set before SPCR was run. The time complexity of this algorithm is *O(n)*, where *n* is the aggregate length of all the template sequences. In addition, SPCR provides a function to simulate agarose gel patterns from the output data.