From: Evading the annotation bottleneck: using sequence similarity to search non-sequence gene data
| frog species | current symbol | current full name | mRNA accession or other identifier | image sets retrieved by sequence (full-length mRNA) | image sets retrieved by text (current gene symbol) | image sets retrieved by text (trial-and-error terms) | notes |
|---|---|---|---|---|---|---|---|
| X. laevis | chrd | chordin | NM_001088309 | 3 | 0 | 3 | |
| X. laevis | hes1 | hairy and enhancer of split 1 | NM_001085917 | 1 | 1 | 1 | |
| X. laevis | nog-A | noggin | NM_001085644 | 1 | 0 | 1 | |
| X. laevis | Six1 | homeobox protein SIX1 | NM_001088558 | 1 | 1 | 1 | |
| X. tropicalis | bambi | BMP and activin membrane-bound inhibitor | NM_001008193 | 2 | 2 | 2 | |
| X. tropicalis | bmp4 | bone morphogenetic protein 4 | Xt7.1-XZT65619.5.5 | 3 | 1 | 3 | the mRNA from Entrez Gene appears to be truncated; the EST-based contig sequence was used instead |
| X. tropicalis | fgf8 | fibroblast growth factor 8 | NM_001008162 | 1 | 0 | 1 | |
| X. tropicalis | lhx1 | LIM homeobox 1 | NM_001100228 | 2 | 0 | 2 | |
| X. tropicalis | smarcd1 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily d, member 1 | NM_001004862 | 1 | 0 | 0 | the probe design sequences were in the 3' UTR, so there were no BLAST hits for text identification |
| X. tropicalis | sox2 | SRY (sex determining region Y)-box 2 | NM_213704 | 4 | 0 | 4 | the alias gene symbol 'sox-2' worked better than 'sox2' |
| X. tropicalis | t | T, brachyury homolog | NM_001008138 | 6 | !! | 6 | a large number of protein descriptions contain the letter 't' |
| X. tropicalis | tp53 | tumor protein p53 | NM_001001903 | 2 | 0 | 2 | the older alias gene symbol 'p53' retrieved both image sets |