Accessing the SEED Genome Databases via Web Services API: Tools for Programmers

Disz, Terry; Akhter, Sajia; Cuevas, Daniel; Olson, Robert; Overbeek, Ross; Vonstein, Veronika; Stevens, Rick; Edwards, Robert A

doi:10.1186/1471-2105-11-319

BMC Bioinformatics

Table 2 Methods, input and output parameters in the SEED Web services API

*Method Name*	*Parameters & Order*	*Description*
abstract_coupled_to	peg	Get the pegs that may be coupled to this peg through abstract coupling. Input is a peg, output is a list of [protein, score] pairs for the proteins coupled to this peg.
adjacent	pegs	Retrieve a set of pegs in order along the chromosome. Input is a comma-separated list of pegs, output is the same pegs in their order along the genome.
alias2fig	alias	Get the FIG ID(s) (pegs) for a given external identifier. Input is an identifier used by another database, output is a list of FIG IDs. Note that an alias can refer to more than one protein, because the mapping is done via the protein sequence.
aliases_of	peg	Get the aliases of a peg, i.e. the identifiers that other databases use for it. Input is a peg, output is an array of aliases.
ali_to_seq	alias	Retrieve the protein sequence for a given identifier. Input is an alias, output is the protein sequence.
all_families		Get all the FIG protein families (FIGfams). No input is needed; it returns a list of all families.
all_families_with_funcs		Get all the FIG protein families (FIGfams) with their assigned functions. No input is needed; it returns a list of all the families and their functions.
all_genomes	complete, restrictions, domain	Get a set of genomes. The inputs are a series of constraints: whether the sequence is complete, other restrictions, and a domain of life (Bacteria, Archaea, Eukarya, Viral, Environmental Genome). Output is a list of genome IDs. For example, the parameters ("complete", undef, "Bacteria") return all complete bacterial genomes.
all_subsystem_classifications		Get a list of all the subsystems and their classifications. No input is needed; it returns a list of all the subsystems and their classifications.
boundaries_of	locations	Get the boundaries of a feature location. A feature can have multiple locations on a contig (e.g. split locations or introns); this method returns an array of [contig, beginning, end]. You can pass it the output of feature_location directly.
CDS_data	families	Get all the pegs in some FIGfams, with their functions and aliases. Input is a tab-separated list of FIGfam IDs, output is a 3-column comma-separated table of [peg, function, aliases].
CDS_sequences	families	Get the protein sequences for a list of proteins. Input is a tab-separated list of pegs, output is a 2-column comma-separated table of [peg, sequence].
cluster_by_bbhs	peg	Get the clusters for a peg by bidirectional best hits. Input is a peg, output is a two-column table of [peg, cluster].
cluster_by_sim	peg	Get the clusters for a peg by similarity. Input is a peg, output is a two-column table of [peg, cluster].
contigs_of	genomeid	Get a comma-separated list of all the contigs in a genome. Input is a genome ID.
contig_ln	genomeid, contig	Get the length of the DNA sequence of a contig in a genome. Input is a genome ID and a contig name, output is the length of the contig.
coupled_to	peg	Get the pegs that are coupled to a given peg. Input is a peg, output is a list of [protein, score] pairs for the proteins coupled to this peg.
dna_seq	genomeid, location1	Get the DNA sequence for a region of a genome. Input is a genome ID and a location in the form contig_start_stop, output is the DNA sequence in FASTA format.
ec_name	EC_number	Get the name of a given EC number. Input is an EC number, output is its name.
external_calls	peg	Get the annotations for a peg from all other known sources. Input is a peg, output is a two-column table of [peg, other function].
feature_location	peg	Get the location of a peg on its contig. Input is a peg, output is a list of locations on contigs. Usually this is a single location, but a feature can occupy more than one region on a contig, or even regions on multiple contigs. For convenience the output is a comma-joined list, which you will often want to pass to boundaries_of.
fid2dna	peg	Get the DNA sequence for a given protein identifier. Input is a peg, output is the DNA sequence in FASTA format.
fids2dna	peg	Get the DNA sequences for a set of protein identifiers. Input is a comma-joined list of pegs, output is the DNA sequences in FASTA format.
function_of	peg	Get the functional annotation of a given protein identifier. Input is a peg, output is its function.
genomes	complete, restrictions, domain	Get a set of genomes. The inputs are a series of constraints: whether the sequence is complete, other restrictions, and a domain of life (Bacteria, Archaea, Eukarya, Viral, Environmental Genome). Output is a list of genome IDs with the genus and species appended. For example, the parameters ("complete", undef, "Bacteria") return all complete bacterial genomes.
genomes_of	peg	Get the genome(s) that a given protein identifier belongs to. Input is a peg, output is a single-column table of genomes.
genus_species	genomeid	Get the genus and species of a genome identifier. Input is a genome ID, output is the genus and species of the genome.
get_corresponding_ids	peg	Get the corresponding IDs of a peg, i.e. the identifiers that other databases use for it. Input is a peg, output is an array of aliases.
get_dna_seq	featureid	Retrieve the DNA sequence for a particular feature. This method takes a feature ID (peg, RNA, etc.) and returns the DNA sequence for that ID. A separate method, dna_seq, gets the DNA sequence for an arbitrary location in a genome.
get_translation	peg	Get the translation (protein sequence) of a peg. Input is a peg, output is the translation. (Note that this is a synonym of translation_of.)
is_archaeal	genomeid	Test whether an organism is archaeal. Input is a genome identifier, output is true or false (or 1 or 0).
is_bacterial	genomeid	Test whether an organism is bacterial. Input is a genome identifier, output is true or false (or 1 or 0).
is_eukaryotic	genomeid	Test whether an organism is eukaryotic. Input is a genome identifier, output is true or false (or 1 or 0).
is_member_of	sequences	Try to place protein sequences in a family. Input is one tab-separated id and sequence per line. If a sequence is placed in a family, the output is a comma-separated 2-column table of [your sequence id, family ID].
is_prokaryotic	genomeid	Test whether an organism is a prokaryote. Input is a genome identifier, output is true or false (or 1 or 0).
list_members	families	Get all the pegs in some FIGfams. Input is a tab-separated list of family IDs, output is a two-column table of [family ID, peg].
pegs_of	genomeid	Get all the protein identifiers associated with a genome. Input is a genome ID, output is a list of the pegs in that genome.
pegs_with_md5	md5	Get the FIG IDs associated with the MD5 checksum of a protein sequence. Input is the MD5 checksum, output is an array of FIG IDs as strings. This should be faster, and more complete, than matching protein sequences via aliases or other means.
pegs_with_md5_string	md5	Get the FIG IDs associated with the MD5 checksum of a protein sequence. Input is the MD5 checksum, output is a comma-separated list of FIG IDs as a single string. This should be faster, and more complete, than matching protein sequences via aliases or other means.
pinned_region_data	peg_id, n_pch_pins, n_sims, sim_cutoff, color_sim_cutoff, sort_by	Input is a FIG (peg) ID and ..., output is the pinned regions data.
reaction_to_role	Reaction_number, genomeid	Get a tab-separated list of [subsystem name, functional role, peg, subsystem variant code for that genome] for a given reaction ID and genome ID. It maps the reaction ID to a peg, the peg to its genome, and the genome to a variant code.
replaces	genomeid	If this genome replaces another one (i.e. it is a more up-to-date version), return the ID of the older genome. Input is a genome ID.
rnas_of	genomeid	Get all the RNA identifiers associated with a genome. Input is a genome ID, output is a list (an array) of the RNAs in that genome.
search_and_grep	pattern1, pattern2	Search and grep through the database. Input is two patterns: the first is passed to search_index, and the second is used to grep the results down to a smaller set. Output is an array of hashes with the keys id, organism, otherIds, functionalAssignment, and annotator.
simple_search	pattern	Search the database. Input is a pattern to search for, output is a list of pegs and roles.
sims	peg, maxN, maxP	Retrieve the sims (precomputed BLAST hits) for a given protein. Input is a peg, an optional maximum number of hits (default = 50), and an optional maximum E value (default = 1e-5). Output is a list of sims in a modified tab-separated BLAST (-m 8) format; additional columns give the lengths of the query and database sequences and the method used.
taxonomy_of	genomeid	Get the taxonomy of a genome. Input is a genome ID, output is its taxonomy.
translation_of	peg	Get the translation (protein sequence) of a peg. Input is a peg, output is the protein sequence. (Note that this is a synonym of get_translation).
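
The location strings passed between dna_seq, feature_location and boundaries_of can also be handled on the client side. The following Python sketch is hypothetical: parse_location and local_boundaries are illustrative helpers, not SEED methods. It assumes the contig_start_stop form described for dna_seq, that contig names may contain underscores (so each string is split from the right), and that a start greater than the stop marks a reverse-strand region; how boundaries_of itself orders coordinates is not documented in the table.

```python
from typing import Tuple

# (contig, start, stop)
Location = Tuple[str, int, int]


def parse_location(loc: str) -> Location:
    """Split 'contig_start_stop' into (contig, start, stop).

    rsplit keeps any underscores inside the contig name intact.
    """
    contig, start, stop = loc.strip().rsplit("_", 2)
    return contig, int(start), int(stop)


def local_boundaries(feature_location: str) -> Location:
    """Hypothetical client-side analogue of boundaries_of.

    Takes the comma-joined output of feature_location and returns
    (contig, beginning, end) spanning every region. Assumes all regions
    lie on one contig and raises ValueError otherwise.
    """
    regions = [parse_location(p) for p in feature_location.split(",") if p.strip()]
    if not regions:
        raise ValueError("no locations given")
    contigs = {contig for contig, _, _ in regions}
    if len(contigs) != 1:
        raise ValueError(f"regions span several contigs: {sorted(contigs)}")
    coords = [c for _, start, stop in regions for c in (start, stop)]
    low, high = min(coords), max(coords)
    contig, first_start, first_stop = regions[0]
    # Assumed convention: start > stop marks the reverse strand,
    # so report the span in that same orientation.
    if first_start > first_stop:
        return contig, high, low
    return contig, low, high
```

For a split feature such as "NC_000913_100_250,NC_000913_400_600", this returns ("NC_000913", 100, 600).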
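
dna_seq, fid2dna and fids2dna return DNA in FASTA format. A minimal parser sketch, assuming standard FASTA (a '>' header line followed by one or more sequence lines) and keying each record by its full header line, since the exact header contents are not specified in the table; parse_fasta is an illustrative name.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    Wrapped sequence lines are joined; a repeated header overwrites
    the earlier record.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)  # flush the final record
    return records
```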
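
pegs_with_md5 and pegs_with_md5_string take the MD5 checksum of a protein sequence rather than the sequence itself. A sketch of computing that checksum with Python's hashlib; the normalization step (removing whitespace and a trailing '*' stop symbol, and upper-casing) is an assumption about how the SEED computes its checksums and should be checked against a known peg before relying on it. protein_md5 and split_id_string are hypothetical helpers, the latter for the comma-separated string that pegs_with_md5_string returns.

```python
import hashlib
from typing import List


def protein_md5(sequence: str) -> str:
    """Hex MD5 digest of a protein sequence, for use with pegs_with_md5.

    ASSUMPTION: the SEED hashes the upper-case sequence with no
    whitespace and no trailing '*' stop symbol.
    """
    cleaned = "".join(sequence.split()).upper().rstrip("*")
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()


def split_id_string(text: str) -> List[str]:
    """Split the comma-separated string returned by pegs_with_md5_string."""
    return [fid.strip() for fid in text.split(",") if fid.strip()]
```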
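
The sims output extends the 12 standard columns of BLAST's tabular (-m 8) output (query, subject, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, E value, bit score) with extra columns. This sketch assumes the three extra columns follow in the order the table lists them (query length, database sequence length, method) and that each sim is one tab-separated line; the Sim class and parse_sims are illustrative, not part of the API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sim:
    query: str
    subject: str
    identity: float   # percent identity
    align_len: int
    mismatches: int
    gap_opens: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float
    q_len: int        # ASSUMED position: length of the query sequence
    s_len: int        # ASSUMED position: length of the database sequence
    method: str       # ASSUMED position: method used


def parse_sims(text: str) -> List[Sim]:
    """Parse one sim per tab-separated line into Sim records."""
    sims = []
    for line in text.splitlines():
        if not line.strip():
            continue
        f = line.split("\t")
        if len(f) < 15:
            raise ValueError(f"expected 15 columns, got {len(f)}: {line!r}")
        sims.append(Sim(
            f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
            int(f[6]), int(f[7]), int(f[8]), int(f[9]),
            float(f[10]), float(f[11]), int(f[12]), int(f[13]), f[14],
        ))
    return sims
```

For example, [s for s in parse_sims(text) if s.evalue <= 1e-10] keeps only hits stronger than the service's default cutoff of 1e-5.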
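
The is_member_of input and output formats can be built and parsed as in the sketch below. It assumes that each output row is one line and that neither your sequence IDs nor the family IDs contain commas; build_member_query and parse_family_hits are hypothetical helper names.

```python
from typing import Dict, Iterable, Tuple


def build_member_query(seqs: Iterable[Tuple[str, str]]) -> str:
    """One 'id<TAB>sequence' pair per line, the input is_member_of takes."""
    return "\n".join(f"{seq_id}\t{seq}" for seq_id, seq in seqs)


def parse_family_hits(text: str) -> Dict[str, str]:
    """Map your sequence IDs to family IDs from the 2-column output.

    Sequences that were not placed in a family are simply absent.
    """
    hits: Dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            seq_id, family = line.split(",", 1)
            hits[seq_id.strip()] = family.strip()
    return hits
```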
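
is_archaeal, is_bacterial, is_eukaryotic and is_prokaryotic may answer either true/false or 1/0, so a client may want to normalize the answer. In this sketch, as_bool is a hypothetical helper; treating an empty string as false is an assumption (a Perl false value can be serialized that way), and any other unrecognized answer raises an error rather than being guessed.

```python
def as_bool(value: object) -> bool:
    """Normalize a true/false or 1/0 answer from the is_* methods."""
    text = str(value).strip().lower()
    if text in ("1", "true"):
        return True
    # ASSUMPTION: an empty string is a serialized Perl false value.
    if text in ("0", "false", ""):
        return False
    raise ValueError(f"unexpected answer: {value!r}")
```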
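
reaction_to_role returns [subsystem name, functional role, peg, subsystem variant code] records. A parsing sketch, assuming one record per line with the four fields separated by tabs; ReactionRole and parse_reaction_roles are illustrative names.

```python
from typing import List, NamedTuple


class ReactionRole(NamedTuple):
    subsystem: str
    role: str
    peg: str
    variant_code: str


def parse_reaction_roles(text: str) -> List[ReactionRole]:
    """Parse tab-separated reaction_to_role output into named records."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
        rows.append(ReactionRole(*fields))
    return rows
```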

ISSN: 1471-2105
