Figure 1 | BMC Bioinformatics

Figure 1

From: Latent Semantic Indexing of PubMed abstracts for identification of transcription factor candidates from microarray derived gene sets

Overview of the LSI-based procedure for calculating association values between genes. A gene document was created for each of the 21,027 genes in the mouse genome by concatenating the titles and abstracts associated with that gene. The documents were parsed to produce a term-by-gene matrix whose entries a_ij are weighted term frequencies, calculated in two ways. The matrix was first normalized, and its dimensionality was then reduced using singular value decomposition (SVD). The association between any two genes was calculated as the cosine between their gene-document vectors in the reduced 500-dimensional space.
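
The pipeline described in the caption (weight, normalize, reduce with SVD, compare by cosine) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `lsi_gene_similarity` is hypothetical, the caption does not specify the two weighting schemes, so a log term frequency times inverse document frequency weighting is assumed here, and scaling the gene vectors by the singular values is likewise an assumption. The paper retains 500 dimensions; a toy matrix needs far fewer.

```python
import numpy as np

def lsi_gene_similarity(counts, k):
    """Return a genes-by-genes cosine matrix from a terms-by-genes count matrix.

    counts : 2-D array, rows are terms, columns are gene documents.
    k      : number of SVD dimensions to keep (500 in the figure caption).
    """
    counts = np.asarray(counts, dtype=float)
    n_terms, n_genes = counts.shape

    # ASSUMPTION: log-TF * IDF weighting; the caption only says the
    # weighted term frequencies were "calculated in two ways".
    doc_freq = np.count_nonzero(counts, axis=1)
    idf = np.log(n_genes / np.maximum(doc_freq, 1))
    weighted = np.log1p(counts) * idf[:, None]

    # Normalize each gene document (column) to unit Euclidean length.
    col_norms = np.linalg.norm(weighted, axis=0)
    col_norms[col_norms == 0] = 1.0
    weighted = weighted / col_norms

    # Reduce dimensionality with a truncated SVD.
    _, s, vt = np.linalg.svd(weighted, full_matrices=False)
    k = min(k, len(s))
    # ASSUMPTION: gene vectors are the right singular vectors scaled
    # by their singular values.
    gene_vecs = vt[:k].T * s[:k]  # shape: genes x k

    # Cosine between every pair of gene vectors in the reduced space.
    row_norms = np.linalg.norm(gene_vecs, axis=1, keepdims=True)
    row_norms[row_norms == 0] = 1.0
    unit = gene_vecs / row_norms
    return unit @ unit.T

# Toy example: 4 terms x 3 genes; genes 0 and 1 share vocabulary.
toy_counts = [
    [2, 1, 0],
    [1, 2, 0],
    [0, 0, 3],
    [1, 1, 1],
]
similarity = lsi_gene_similarity(toy_counts, k=2)
```

In this toy matrix, genes 0 and 1 use the same terms and so receive a higher association value with each other than either does with gene 2. For a matrix with 21,027 gene columns, a sparse truncated solver would be a more practical choice than the dense `np.linalg.svd` used in this sketch.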
