The abundance of data in the post-genomics era is a major boon for life science researchers. However, data from disparate sources arguably have the most value when considered in context with each other. For example, manually curated experimental evidence may be more reliable than computational predictions, but the latter may offer greater coverage. Whilst drawing conclusions from the results of multiple experiments is by no means a new concept in biology, the scale of omics data and in silico analyses makes traditional ad hoc methods of publishing and sharing data impractical. With the growth in data set to continue and major projects such as ENCODE taking highly collaborative approaches, integration is likely to become an increasingly important focus of bioinformatics.
Efforts to integrate data sources may be broadly categorised by their motivation:
aggregating and presenting data in an accessible format
computational analysis of combined data sets
federation of disparate resources
Each of these goals, although not necessarily mutually exclusive, has its own requirements. For example, whilst user interfaces must be responsive and accessible, computational analysis requires robust semantics.
The Distributed Annotation System (DAS) was originally conceived as a mechanism to aggregate and display genome sequence annotations such as transcript predictions. It is built upon the principle that data should remain spread across multiple sites, rather than aggregated into centralised databases. Thus data providers retain control over data access, releases can be more dynamic and changes to file formats or database structures are transparent. DAS has a "dumb server, clever client" architecture, which holds a number of advantages. For example, because data providers need only minimal resources and time to expose their data, more sources can be integrated, and more readily. Conversely, one of the main reasons for this ease of implementation is a lack of enforced semantics, which limits applications primarily to visual display. In addition, DAS has lacked a central registry of available data sources.
DAS was developed by WormBase for sharing genome annotations, and was adopted by the Ensembl project to facilitate the display of such distributed data in its genome browser. The applicability of DAS was extended to protein sequence and structure data by the efforts of the eFamily project to integrate five of the major protein databases [5, 6]. It was subsequently adopted by the BioSapiens Network of Excellence as the mechanism for sharing proteomics data among member institutions [7, 8], and also by the ENCODE project to dynamically share the latest data between collaborators. Many other individual projects across the world also expose their data and/or operate integration services via DAS.
As a standard for the sharing of biological information, the DAS protocol defines how data should be represented and communicated. It takes the form of a web service based upon the open standards of Hyper-Text Transfer Protocol (HTTP) for data transmission and Extensible Markup Language (XML) for data format. A DAS server may host a number of sources, each differing in the services it provides and the type of underlying data it is based on.
DAS may be used to annotate different types of data. In order to distinguish these, coordinate systems describe the various reference data types DAS supports. Each coordinate system may be thought of as a model that bioinformaticians commonly use to denote biological entities and locations of features within them. A coordinate system has four parts:
1. The category or type of annotatable entity. For example a chromosome, gene, protein sequence or protein structure.
2. The authority or project responsible for defining the coordinate system. For example NCBI, UniProt or Ensembl.
3. The version, used where entities themselves are not versioned (as in genomic assemblies).
4. The species, for coordinate systems containing only entities from a single organism.
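The four parts listed above map naturally onto a small record type. The following is a minimal sketch only: the class name, field names and example values are illustrative assumptions and are not taken from the DAS specification or the DAS Registry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CoordinateSystem:
    """Hypothetical model of a DAS coordinate system (names are illustrative)."""
    category: str                   # type of annotatable entity
    authority: str                  # project defining the coordinate system
    version: Optional[str] = None   # only where entities are not themselves versioned
    species: Optional[str] = None   # only for single-organism coordinate systems

    def describe(self) -> str:
        # Join the parts that are present into a human-readable label.
        fields = [self.authority, self.version, self.category, self.species]
        return ", ".join(f for f in fields if f)


# Example values chosen for illustration only.
genome = CoordinateSystem("Chromosome", "NCBI", "36", "Homo sapiens")
protein = CoordinateSystem("Protein Sequence", "UniProt")

print(genome.describe())
print(protein.describe())
# Two sources can sensibly be overlaid only if their coordinate systems match.
print(genome == protein)
```

Making the record immutable and comparable by value means a client could use it directly as a dictionary key when grouping sources that annotate the same reference data.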
Though coordinate systems are normally used to describe the location of a feature within a reference entity (for example residue 26 of UniProt sequence P15056), some annotations are associated not with a sequence location but with the entity itself (for example database cross-references). Such features are commonly called non-positional features and are used most often when annotating genes, which are themselves often thought of as abstract entities. The difference between annotating an entity and annotating a region of an entity's sequence is conceptual and requires no special implementation for a data source, but it does have implications for a client's display.
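As a sketch of what this distinction might mean for a client, the example below separates positional from non-positional features in a features response. It assumes a simplified response using the DASGFF element names (SEGMENT, FEATURE, START, END) and assumes the convention that a non-positional feature reports a start and end of 0; the sample document, the source URL and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified features response (values are invented).
SAMPLE = """<DASGFF>
  <GFF version="1.0" href="http://example.org/das/demo/features">
    <SEGMENT id="P15056" start="1" stop="766">
      <FEATURE id="f1" label="example site">
        <TYPE id="site">site</TYPE>
        <START>26</START>
        <END>26</END>
      </FEATURE>
      <FEATURE id="f2" label="example cross-reference">
        <TYPE id="xref">xref</TYPE>
        <START>0</START>
        <END>0</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def split_features(xml_text):
    """Return (positional, non_positional) lists of (id, start, end) tuples."""
    root = ET.fromstring(xml_text)
    positional, non_positional = [], []
    for feature in root.iter("FEATURE"):
        start = int(feature.findtext("START", default="0"))
        end = int(feature.findtext("END", default="0"))
        record = (feature.get("id"), start, end)
        # Assumed convention: start == end == 0 marks a non-positional feature.
        if start == 0 and end == 0:
            non_positional.append(record)
        else:
            positional.append(record)
    return positional, non_positional


positional, non_positional = split_features(SAMPLE)
print(positional)
print(non_positional)
```

A client could then draw the first list along the sequence and present the second as a table attached to the entity as a whole.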
A DAS source may offer one or more different services to clients, determined by the commands it implements. A DAS command is a request issued by a client for a certain class of data, such as a sequence or annotations of a sequence. The server responds with an XML document representing the requested data. DAS defines a model for constructing the query (a specific URL format), a model for representing the data (an XML document type) and its means of transport (HTTP). Each command has similar but distinct query and data models. Version 1.53 of the DAS specification has five main commands:
entry points – fetches a list of entities a source can annotate
sequence – fetches the sequence of a segment of DNA, protein et cetera
features – the most commonly implemented command; fetches annotations located within a segment
types – fetches a list of the types of feature a source or segment has
stylesheet – fetches instructions for displaying features
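To make the query model concrete, the sketch below builds request URLs for the five commands. It assumes the URL layout understood to be used by the 1.53 specification, namely a server prefix followed by the source name, the command and an optional segment argument of the form `ID:start,stop`; the server address, source name and function name are hypothetical.

```python
from urllib.parse import quote

# The five main commands as they are assumed to appear in request URLs.
COMMANDS = {"entry_points", "sequence", "features", "types", "stylesheet"}


def das_url(server, source, command, segment=None, start=None, stop=None):
    """Build a DAS request URL (illustrative helper, not an official API)."""
    if command not in COMMANDS:
        raise ValueError(f"unknown DAS command: {command}")
    url = f"{server.rstrip('/')}/{quote(source)}/{command}"
    if segment is not None:
        ref = quote(segment)
        # A range is optional; without one the whole segment is requested.
        if start is not None and stop is not None:
            ref = f"{ref}:{start},{stop}"
        url += f"?segment={ref}"
    return url


# Hypothetical server and source names.
print(das_url("http://example.org/das", "demo", "features", "P15056", 1, 100))
print(das_url("http://example.org/das/", "demo", "entry_points"))
```

Because every command shares this shape, a client needs only an HTTP library and an XML parser to talk to any source, which is consistent with the low barrier to entry described earlier.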
DAS sources that offer sequences are often referred to as reference sources because they provide the reference entry points for other commands on the same or different servers. Sources implementing the features command are by contrast referred to as annotation sources because they provide annotations based on a reference sequence. This distinction is largely historical since some DAS sources are conceptually both reference and annotation sources, and DAS has since expanded to cover non-sequence data.
The DAS specification has also been extended with several other commands, such as those offering 3D structures and alignments. These are discussed in the Results section.
The steady growth in both the number and diversity of publicly available DAS sources necessitated a method for discovering DAS services. Such a mechanism, the DAS Registry, has been reported previously [6, 10]. This service allows data providers to publish their DAS sources so that compatible clients can discover them automatically. This discovery feature has been incorporated into most client implementations and libraries. The registry also validates registered sources to check that they are both functioning and conforming to the DAS specification. The number of registered sources has increased steadily since the DAS Registry was created and currently totals 383.