Architecture
We maintain a server (Figure 3) that integrates genomic annotations (Probeset Data in Tabular Format) from the NetAffx™ Analysis Center [11], single-probe mappings on NCBI RefSeqs and EMBL cDNAs generated at ISREC with the tagger software [12], and the exon-intron genomic coordinates of each known human transcript, downloaded with the UCSC Table Browser [13]. The tables mapping single probe pairs to RefSeqs and cDNAs are part of the CleanEx database [14]. The integrated annotations are stored in a MySQL [15] database; the set of tables related to a GeneChip® platform can be updated automatically with the Perl script add_chip.pl. Two object-oriented Perl modules form the core of the system: Splicy::AffyDB and Splicy::Probeset.
The first module parses raw data in tabular and comma-separated formats and inserts them into a set of MySQL tables. The second module stores probeset data in a Perl object at runtime and manipulates these data to generate the graphical displays. Splicy currently runs on a public web server at the IFOM-IEO research institute [1].
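As a rough illustration of the loading step performed by Splicy::AffyDB and add_chip.pl, the Python sketch below parses a comma- or tab-separated annotation table and inserts it into a relational table. It is not the Perl implementation: sqlite3 stands in for MySQL, and the table name probeset_annotation, its columns and the assumed header names ('Probe Set ID', 'Gene Symbol', 'Representative Public ID') are hypothetical choices made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical schema: one row per probeset, keyed by (chip, probeset_id).
# sqlite3 stands in here for the MySQL server used by Splicy.
SCHEMA = """
CREATE TABLE IF NOT EXISTS probeset_annotation (
    chip          TEXT NOT NULL,
    probeset_id   TEXT NOT NULL,
    gene_symbol   TEXT,
    rep_public_id TEXT,
    PRIMARY KEY (chip, probeset_id)
)
"""


def load_annotations(conn, text, chip, delimiter=","):
    """Parse comma- or tab-separated annotation text and insert it.

    Assumes a header row containing 'Probe Set ID', 'Gene Symbol' and
    'Representative Public ID'. Returns the number of rows loaded.
    """
    conn.execute(SCHEMA)
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = [
        (chip, r["Probe Set ID"], r["Gene Symbol"], r["Representative Public ID"])
        for r in reader
    ]
    # INSERT OR REPLACE lets a re-run refresh a chip's rows in place.
    conn.executemany(
        "INSERT OR REPLACE INTO probeset_annotation VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)
```

Keying rows on (chip, probeset_id) is what makes re-running the loader for one platform an update rather than a duplication.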
The program accepts as input a GeneChip® platform, a class of putative targets (RefSeqs, cDNAs or both) and a list of objects to be queried. A query object can be a Probeset ID, a RefSeq accession, a Gene Symbol or an Affymetrix® Representative Public ID. A Representative Public ID is a sequence (chosen during chip design) that is optimally associated with the transcribed region interrogated by the probeset [11]. Once the given object (transcript or probeset) is identified, Splicy organizes the information by Probeset ID and produces a series of graphical displays showing the association between the probeset and the transcripts targeted by its probes.
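The query step can be sketched as a lookup that resolves each object to one or more Probeset IDs and then groups the results by probeset. In this Python sketch, the syntactic rules used to guess the kind of object (regular expressions for Affymetrix probeset IDs and for RefSeq and GenBank-style accessions) and the structure of the lookup index are assumptions, not Splicy's actual logic; anything that matches no pattern is treated as a Gene Symbol.

```python
import re
from collections import defaultdict

# Heuristic patterns (assumptions for this example only).
PROBESET_RE = re.compile(r"^(AFFX-\S+|\d+(_[a-z]+)*_at)$")   # e.g. 1007_s_at
REFSEQ_RE = re.compile(r"^[NX][MR]_\d+(\.\d+)?$")            # e.g. NM_001954
GENBANK_RE = re.compile(r"^[A-Z]{1,2}\d{5,6}(\.\d+)?$")      # e.g. U48705


def classify(query):
    """Guess the kind of query object from its syntax."""
    if PROBESET_RE.match(query):
        return "probeset"
    if REFSEQ_RE.match(query):
        return "refseq"
    if GENBANK_RE.match(query):
        return "public_id"
    return "symbol"


def group_by_probeset(queries, index):
    """Map each Probeset ID to the list of queries that resolved to it.

    `index` is a hypothetical lookup of the form {kind: {key: [probeset IDs]}}.
    A Probeset ID query resolves to itself; unknown keys resolve to nothing.
    """
    groups = defaultdict(list)
    for q in queries:
        kind = classify(q)
        hits = [q] if kind == "probeset" else index.get(kind, {}).get(q, [])
        for ps in hits:
            groups[ps].append(q)
    return dict(groups)
```

Grouping by probeset means a gene symbol that hits several probesets yields one report per probeset, matching the per-probeset organization described above.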
For each probeset retrieved, Splicy can display a graphical report (Probe Maps) or generate two tab-separated (TSV) files: one containing information related to the probeset and another focused on the probe pair data.
We currently maintain mappings for the most popular human and mouse Affymetrix® GeneChips®, and Splicy can be queried for matches with human and mouse RefSeqs and EMBL cDNAs.
Probe Maps and 'splice diagnostic probes'
Each graphical display is generated from numeric data using the GD graphics library [16] and the Perl module Bio::Graphics, part of the BioPerl distribution [17]. Splicy stores static coordinate data for the alignments between probes and transcripts (start and end of each probe alignment and length of the transcript). At runtime, the module Splicy::Probeset.pm uses the exon-intron genomic coordinates to convert transcript-relative coordinates into genomic coordinates (Figure 4).
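The conversion from transcript-relative to genomic coordinates can be sketched as a walk along the exons in transcript order. This Python version is an illustration, not the Splicy::Probeset code: it assumes exon coordinates that are 1-based and inclusive, sorted by genomic start (UCSC tables store 0-based starts, so they would need converting first), handles both strands, and uses hypothetical function names. A probe alignment that crosses an exon boundary maps to more than one genomic block.

```python
def tx_to_genome(pos, exons, strand):
    """Convert a 1-based transcript position to a 1-based genomic position.

    `exons` holds (start, end) genomic intervals, 1-based and inclusive,
    sorted by ascending genomic start; `strand` is "+" or "-".
    """
    # On the minus strand the transcript starts at the highest-coordinate exon.
    ordered = exons if strand == "+" else list(reversed(exons))
    offset = pos
    for start, end in ordered:
        length = end - start + 1
        if offset <= length:
            return start + offset - 1 if strand == "+" else end - offset + 1
        offset -= length
    raise ValueError("position beyond the end of the transcript")


def probe_blocks(tstart, tend, exons, strand):
    """Map a transcript interval [tstart, tend] to genomic blocks, one per exon.

    More than one block means the probe crosses an exon-exon junction.
    """
    ordered = exons if strand == "+" else list(reversed(exons))
    blocks = []
    exon_tx_start = 1  # transcript coordinate of the current exon's first base
    for start, end in ordered:
        exon_tx_end = exon_tx_start + (end - start)
        lo, hi = max(tstart, exon_tx_start), min(tend, exon_tx_end)
        if lo <= hi:
            g1 = tx_to_genome(lo, exons, strand)
            g2 = tx_to_genome(hi, exons, strand)
            blocks.append((min(g1, g2), max(g1, g2)))
        exon_tx_start = exon_tx_end + 1
    return sorted(blocks)
```

For example, with exons [(100, 199), (300, 349), (500, 599)] on the plus strand, a 25-mer aligned at transcript positions 90-114 maps to the two genomic blocks (189, 199) and (300, 313).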
Each image (Figure 5) shows a line representing the position of the transcript on the chromosome (5.1) and the genomic exon-intron structure of a transcript associated with the probes (5.2). Below the transcript structure, boxes highlight the exons containing one or more matching probes; each box is labeled with the number of probe pairs matching that exon (5.3). A second line of red boxes connected by segments (5.4) marks "Junction Probes", which span the boundary between two exons. These probes are particularly interesting because, if one of the two exons involved in the hybridization is skipped in an alternative isoform, they can produce different hybridization patterns. The last lines of the display show the positions of single probes over the transcript as triangular glyphs (5.5). If a given probe belongs to an exon that is skipped in a different transcript (isoform) of the same gene, it is tagged as a potential 'splice diagnostic probe' and marked in red (5.6). The rationale is that a probeset containing 'splice diagnostic probes' will behave differently during hybridization depending on which transcript variant is present in the hybridization mixture.
The images are linked to HTML client-side image maps that associate each triangular glyph (the position of a single probe) with a pop-up generated by the JavaScript library Overlib [18]: when the user moves the mouse over a glyph, Splicy displays a small pop-up showing the probeset ID, the oligonucleotide sequence, the X and Y coordinates on the array, and the position of the alignment on the given target transcript ([start stop]length_of_the_transcript).
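The probe tagging described above can be approximated as follows. The Python sketch counts probes per exon of a reference transcript (the labels of the boxes), flags a probe as a junction probe when its alignment maps to more than one genomic block, and flags it as potentially splice diagnostic when one of its exons is skipped in another isoform of the same gene. The rule used for "skipped" (the exon lies within the other isoform's genomic span but overlaps none of its exons) is an assumed rule for this example, not necessarily the criterion Splicy applies; the function names are hypothetical.

```python
def overlaps(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_skipped(exon, isoform_exons):
    """Assumed rule: exon lies inside the isoform's span but overlaps none of its exons.

    `isoform_exons` must be a non-empty list sorted by genomic start.
    """
    span_start, span_end = isoform_exons[0][0], isoform_exons[-1][1]
    inside = span_start <= exon[0] and exon[1] <= span_end
    return inside and not any(overlaps(exon, e) for e in isoform_exons)


def annotate_probes(probes, ref_exons, other_isoforms):
    """Count probes per reference exon and tag junction / diagnostic probes.

    probes: {probe_id: [genomic blocks]}; all exon lists hold sorted,
    1-based inclusive (start, end) tuples. Returns (counts, tags).
    """
    counts = [0] * len(ref_exons)
    tags = {}
    for pid, blocks in probes.items():
        # Indices of the reference exons touched by this probe.
        hit = [i for i, ex in enumerate(ref_exons)
               if any(overlaps(ex, b) for b in blocks)]
        for i in hit:
            counts[i] += 1
        diagnostic = any(is_skipped(ref_exons[i], iso)
                         for i in hit for iso in other_isoforms)
        tags[pid] = {"junction": len(blocks) > 1, "diagnostic": diagnostic}
    return counts, tags
```

Note that a junction probe is counted once for each exon it touches, so the per-exon counts can add up to more than the number of probes.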
All the annotation data stored statically on the server can be retrieved from the graphical interface using a set of buttons at the top of the graphical report: Design (description of the Representative Public ID), Targets (all the transcripts matching the selected probeset), Probe Pairs (nucleotide sequence and position on the array of the single probes), Alignments (coordinates of the genome alignments), Notes and Links (further notes and links related to the target Representative Public ID) and Function (GO functional classification). Splicy also provides a direct link to the Entrez Gene [19] entry corresponding to the target gene and, at the bottom of the record, direct links to the UCSC Genome Browser [20].
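The external links can be built from the stored identifiers. This sketch assumes that a numeric Entrez Gene ID and the genomic coordinates of the target are available; the URL patterns (the NCBI Gene record page and the UCSC hgTracks viewer with its db and position parameters) are the public ones, but the assembly name passed as db is an example value and the function names are hypothetical.

```python
from urllib.parse import urlencode


def entrez_gene_link(gene_id):
    """URL of the NCBI Gene (formerly Entrez Gene) record for a numeric gene ID."""
    return f"https://www.ncbi.nlm.nih.gov/gene/{int(gene_id)}"


def ucsc_link(db, chrom, start, end):
    """URL opening the UCSC Genome Browser on chrom:start-end of assembly `db`."""
    # Keep ':' unescaped so the position parameter stays human-readable.
    query = urlencode({"db": db, "position": f"{chrom}:{start}-{end}"}, safe=":")
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + query
```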
Tab-separated files
The tab-separated files contain annotations and the mapping data related to the transcripts and/or chromosomes, starting from a given list of objects (Probeset ID, Gene Symbols, Representative Public ID, RefSeq).
Probeset and probe pair information is available for download, and the user can interactively select which kinds of data to include in the output. Two files are generated: one containing information related to the Probeset ID (file prefix PS_) and another containing information related to single probe pairs (file prefix PP_). Once the user has selected the data to include, the files are generated in a temporary directory accessible to the user. If a user submits more than 30 query objects, an e-mail address is requested, and the server notifies the user by e-mail once the requested files are complete.
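The generation of the two output files can be sketched as below. Only the user-selected columns are written, one TSV file per table. The 30-object threshold comes from the text; treating PS_ and PP_ as tags prepended to the file name, the job-ID naming scheme, the ".tsv" extension and the function names are assumptions for this example.

```python
import csv
import os
import tempfile

EMAIL_THRESHOLD = 30  # from the text: more than 30 objects requires an e-mail address


def needs_email(queries):
    """True when the request is large enough to be processed offline."""
    return len(queries) > EMAIL_THRESHOLD


def write_tab_files(job_id, ps_rows, pp_rows, ps_fields, pp_fields, outdir=None):
    """Write the probeset (PS_) and probe pair (PP_) tables as TSV files.

    Rows are dicts; only the selected columns (ps_fields / pp_fields) are
    written and any other keys are ignored. Returns the two file paths.
    """
    outdir = outdir or tempfile.mkdtemp(prefix="splicy_")
    paths = []
    for tag, rows, fields in (("PS_", ps_rows, ps_fields),
                              ("PP_", pp_rows, pp_fields)):
        path = os.path.join(outdir, f"{tag}{job_id}.tsv")
        with open(path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fields, delimiter="\t",
                                    extrasaction="ignore", lineterminator="\n")
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return paths
```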
Graphical User Interface
The Splicy interface is flexible and user-friendly. The first page contains links to the following sections: Probe Maps, TAB files, statistics, help, source code and documentation.
The Probe Maps form allows the user to select a GeneChip® platform and a set of target transcripts (human and mouse RefSeqs and EMBL cDNAs) and to enter a list of query objects (Probeset IDs, RefSeq accessions, Gene Symbols, Representative Public IDs).
The TAB Files form is composed of three windows: General Info allows the user to select a GeneChip® platform, a set of target transcripts, a list of query objects and an e-mail address; the Probeset window allows the selection of data related to the probeset (GeneChip® information, sequence design information, RefSeq targets, alignments, functional GO annotation); the Probe Pairs window enables the selection of data related to the single probes (position on the array, sequence, probe mapping on the genome and on the target transcripts). The statistics page contains general information about the platforms available on the server (number of GeneChips®, number of probesets, and number of splice diagnostic probesets for RefSeq targets and for EMBL cDNA targets). The Help and Documentation pages describe the use of the web interface and of the Perl modules.
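The per-platform figures on the statistics page could be computed with a single aggregate query. This sketch runs it through Python's sqlite3 module; the table probeset_target, its columns (chip, probeset_id, target_class with the values 'RefSeq' or 'EMBL', and a 0/1 diagnostic flag) are a hypothetical schema, and the same SQL is also valid MySQL. The number of GeneChips® is simply the number of rows returned.

```python
import sqlite3

# Hypothetical schema: probeset_target(chip, probeset_id, target_class, diagnostic).
# COUNT(DISTINCT CASE ...) ignores the NULLs produced for non-matching rows.
STATS_SQL = """
SELECT chip,
       COUNT(DISTINCT probeset_id) AS n_probesets,
       COUNT(DISTINCT CASE WHEN diagnostic = 1 AND target_class = 'RefSeq'
                           THEN probeset_id END) AS n_diag_refseq,
       COUNT(DISTINCT CASE WHEN diagnostic = 1 AND target_class = 'EMBL'
                           THEN probeset_id END) AS n_diag_embl
FROM probeset_target
GROUP BY chip
ORDER BY chip
"""


def platform_stats(conn):
    """Return (chip, n_probesets, n_diag_refseq, n_diag_embl) tuples, one per chip."""
    return conn.execute(STATS_SQL).fetchall()
```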