Table 2 Methods, input and output parameters in the SEED Web services API

From: Accessing the SEED Genome Databases via Web Services API: Tools for Programmers

Method Name

Parameters & Order

Description

abstract_coupled_to

peg

Get the pegs that may be coupled to this peg through abstract coupling. Input is a peg; output is a list of [protein, score] pairs for the pegs that are coupled to it.

Adjacent

pegs

Retrieve the set of pegs in order along the chromosome. Input is a comma-separated list of pegs; output is the same pegs in order along the genome.

alias2fig

alias

Get the FIG ID(s) (peg) for a given external identifier. Input is an identifier used by another database, output is a list of our identifiers. Note that an alias can refer to more than one protein since the mapping is done via protein sequence.

aliases_of

peg

Get the aliases of a peg. These are the identifiers that other databases use. Input is a peg, output is an array of aliases

ali_to_seq

alias

Retrieve the protein sequence for a given identifier. Input is an alias, output is a sequence

all_families

 

Get all the FIG protein families (FIGfams). No input needed, it just returns a list of all families

all_families_with_funcs

 

Get all the FIG protein families (FIGfams) with their assigned functions. No input needed, it just returns a list of all the families and their functions.

all_genomes

complete, restrictions, domain

Get a set of genomes. The inputs are a series of constraints - whether the sequence is complete, other restrictions, and a domain of life (Bacteria, Archaea, Eukarya, Viral, Environmental Genome). Output is a list of genome ids. An example use is with the parameters ("complete", undef, "Bacteria") that will return all complete bacterial genomes.

all_subsystem_classifications

 

Get a list of all the subsystems and their classifications. No input needed, it just returns a list of all the subsystems and their classifications

boundaries_of

locations

Get the boundaries of a feature location. A feature can have multiple locations on a contig (e.g. split locations, introns, etc). This just returns an array of [contig, beginning, end]. You can pass it the output from feature_location directly

CDS_data

families

Get all the pegs in some FIGfams, along with their functions and aliases. Input is a tab-separated list of pegs; output is a 3-column, comma-separated table of [peg, function, aliases].

CDS_sequences

families

Get the protein sequences for a list of proteins. Input is a tab-separated list of pegs; output is a 2-column, comma-separated table of [peg, sequence].

cluster_by_bbhs

peg

Get the clusters for a peg by bidirectional best hits. Input is a peg; output is a two-column table of [peg, cluster].

cluster_by_sim

peg

Get the clusters for a peg by similarity. Input is a peg; output is a two-column table of [peg, cluster].

contigs_of

genomeid

Get a comma-separated list of all the contigs in a genome

contig_ln

genomeid, contig

Get the length of the DNA sequence in a contig in a genome. Input is a genome id and a contig name, return is the length of the contig

coupled_to

peg

Get the pegs that are coupled to a given peg. Input is a peg; output is a list of [protein, score] pairs for the pegs that are coupled to it.

dna_seq

genomeid, location1

Get the DNA sequence for a region in a genome. Input is a genome ID and a location in the form contig_start_stop, output is the DNA sequence in fasta format.

ec_name

EC_number

Get the name for a given E.C. number. Input is an EC number, output is the name

external_calls

peg

Get the annotations for a peg from all other known sources. Input is a peg; output is a two-column table of [peg, other function].

feature_location

peg

Get the location of a peg on its contig. Input is a peg; output is a list of locations on contigs. Usually this will be a single location, but a peg can occupy more than one region on a contig, or even span multiple contigs. For convenience the result is a comma-joined list, which you will often want to pass to boundaries_of.

fid2dna

peg

Get the DNA sequence for a given protein identifier. Input is a peg, output is the DNA sequence in fasta format.

fids2dna

peg

Get the DNA sequence for a set of protein identifiers. Input is a comma-joined list of pegs, output is the DNA sequence in fasta format.

function_of

peg

Get the functional annotation of a given protein identifier. Input is a peg, output is a function

Genomes

complete, restrictions, domain

Get a set of genomes. The inputs are a series of constraints - whether the sequence is complete, other restrictions, and a domain of life (Bacteria, Archaea, Eukarya, Viral, Environmental Genome). Output is a list of genome ids with the genus species appended. An example use is with the parameters ("complete", undef, "Bacteria") that will return all complete bacterial genomes.

genomes_of

peg

Get the genome(s) that a given protein identifier refers to. Input is a peg, output is a single column table of genomes

genus_species

genomeid

Get the genus and species of a genome identifier. Input is a genome ID, output is the genus and species of the genome

get_corresponding_ids

peg

Get the corresponding ids of a peg. These are the identifiers that other databases use. Input is a peg, output is an array of aliases

get_dna_seq

featureid

Retrieve the DNA sequence for a particular feature. Note that this will take a feature id (peg, rna, etc), and return the DNA sequence for that id. There is also a separate method to get the DNA sequence for an arbitrary location on a genome

get_translation

peg

Get the translation (protein sequence) of a peg. Input is a peg, output is the protein sequence. (Note that this is a synonym of translation_of).

is_archaeal

genomeid

Test whether an organism is Archaeal. Input is a genome identifier, and output is true or false (or 1 or 0)

is_bacterial

genomeid

Test whether an organism is Bacterial. Input is a genome identifier, and output is true or false (or 1 or 0)

is_eukaryotic

genomeid

Test whether an organism is Eukaryotic. Input is a genome identifier, and output is true or false (or 1 or 0)

is_member_of

sequences

Tries to put a protein sequence in a family. Input is a tab-separated id and sequence, delimited by new lines. The output is a comma-separated 2-column table [your sequence id, FamilyID] if the sequence is placed in a family.

is_prokaryotic

genomeid

Test whether an organism is a Prokaryote. Input is a genome identifier, and output is true or false (or 1 or 0)

list_members

families

Get all the pegs in some FIGfams. The input is a tab-separated list of family IDs, and the output is a two column table of [family id, peg]

pegs_of

genomeid

Get all the protein identifiers associated with a genome. Input is a genome id, output is a list of pegs in that genome

pegs_with_md5

md5

Get the FIG IDs associated with the MD5 sum of a protein sequence. Input is the md5 checksum, output is an array of strings of FIG ids. This should be faster, and more complete, than using aliases or other ways to match protein sequences.

pegs_with_md5_string

md5

Get the FIG IDs associated with the MD5 sum of a protein sequence. Input is the md5 checksum, output is a comma separated list of FIG ids as a single string. This should be faster, and more complete, than using aliases or other ways to match protein sequences.

pinned_region_data

peg_id, n_pch_pins, n_sims, sim_cutoff, color_sim_cutoff, sort_by

Input is a FIG (peg) ID and ..., output is the pinned regions data

reaction_to_role

Reaction_number, genomeid

Get a tab-separated list of [subsystem name, functional role, peg, subsystem variant code for that genome] for any given reaction id and genome id. Maps the reaction id to peg, peg to genome, and genome to variant code

replaces

genomeid

If this genome replaces another one (i.e., it is a more up-to-date version), get the ID of the older genome.

Rnas_of

genomeid

Get all the RNA identifiers associated with a genome. Input is a genome ID, and output is a list (an array) of the RNAs in that genome

search_and_grep

pattern1, pattern2

Search and grep through the database. Input is two patterns, first one is used in search_index, second used to grep the results to restrict to a smaller set. Output is an array of hashes with keys id, organism, otherIds, functionalAssignment, and annotator.

Simple_search

pattern

Search the database. Input is a pattern to search for, output is list of pegs and roles

Sims

peg, maxN, maxP

Retrieve the sims (precomputed BLAST hits) for a given protein sequence. Input is a peg, an optional maximum number of hits (default = 50), and an optional maximum E value (default = 1e-5). The output is a list of sims in a modified tab-separated (-m 8) format. The additional columns include the lengths of the query and database sequences and the method used.

taxonomy_of

genomeid

Returns the taxonomy of a given genomeid

translation_of

peg

Get the translation (protein sequence) of a peg. Input is a peg, output is the protein sequence. (Note that this is a synonym of get_translation).
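
Several methods in this table exchange plain delimited strings rather than structured data: for example, the contig_start_stop location passed to dna_seq, the comma-joined list returned by feature_location, and the tab-separated id and sequence lines expected by is_member_of. The Python sketch below illustrates how a client might build and parse such strings. It makes no calls to the SEED servers. The helper names (parse_location, parse_feature_location, format_member_query, parse_two_column) and all sample identifiers are hypothetical, and it assumes that each element of the feature_location list uses the same contig_start_stop form as dna_seq and that rows of the comma-separated tables are delimited by newlines, neither of which is stated in the table.

```python
from typing import Dict, List, NamedTuple, Tuple


class Location(NamedTuple):
    """One region on a contig, as in the contig_start_stop form."""
    contig: str
    start: int
    stop: int


def parse_location(location: str) -> Location:
    """Split a 'contig_start_stop' string into its three parts.

    Splitting from the right keeps contig names that themselves
    contain underscores intact.
    """
    contig, start, stop = location.rsplit("_", 2)
    return Location(contig, int(start), int(stop))


def parse_feature_location(joined: str) -> List[Location]:
    """Parse a comma-joined list of locations (assumed element format)."""
    return [parse_location(part) for part in joined.split(",") if part]


def format_member_query(sequences: Dict[str, str]) -> str:
    """Build newline-delimited 'id<TAB>sequence' lines for is_member_of."""
    return "\n".join(f"{seq_id}\t{seq}" for seq_id, seq in sequences.items())


def parse_two_column(table: str) -> List[Tuple[str, str]]:
    """Parse a comma-separated 2-column table, one row per line (assumed)."""
    rows = []
    for line in table.splitlines():
        if line.strip():
            first, second = line.split(",", 1)
            rows.append((first, second))
    return rows


if __name__ == "__main__":
    # Hypothetical sample values, not real SEED output.
    print(parse_feature_location("contig_A_100_400,contig_A_500_900"))
    print(repr(format_member_query({"seq1": "MKT", "seq2": "MSL"})))
    print(parse_two_column("seq1,FAM_X\nseq2,FAM_Y\n"))
```

Splitting each row on the first comma only means that a second column containing commas, such as a list of aliases, stays in one piece.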