
Table 1 Elastic Map Reduce commands

From: Cloud computing for comparative genomics

| Argument | Description | Input |
|---|---|---|
| --stream | Activates the "streaming" module | N/A |
| --input | File(s) to be processed by EMR | (1) hdfs:///home/hadoop/blast_runner; (2) hdfs:///home/hadoop/ortho_runner |
| --mapper | Name of mapper file | (1) s3n://rsd_bucket/blast_mapper.py; (2) s3n://rsd_bucket/ortho_mapper.py |
| --reducer | None required; reduction is done within the RSD algorithm | N/A |
| --cache-archive | Individual symlinks to the executables, genomes, … | s3n://rsd_bucket/executables.tar.gz#executables, #genomes, #RSD_standalone, #blastinput, #results |
| --output | | hdfs:///home/hadoop/outl |
| --jobconf mapred.map.tasks | Number of BLAST and ortholog calculation processes | = N |
| --jobconf mapred.tasktracker.map.tasks.maximum | Maximum number of map tasks run simultaneously by each task tracker | = 8 |
| --jobconf mapred.task.timeout | Time after which an unresponsive process is considered a failure and restarted | = 86400000 ms (24 h) |
| --jobconf mapred.tasktracker.expiry.interval | Time after which an instance is declared dead | = 3600000 ms (1 h); set large to avoid instances being shut down during long-running jobs |
| --jobconf mapred.map.tasks.speculative.execution | If true, EMR speculates that a task is running slowly and runs a duplicate of the same task in parallel | False (because the time for each genome-vs-genome run varied widely, we elected to set this argument to False to ensure maximal availability of the cluster) |

  1. Specific commands passed through the Ruby command-line client to the Elastic MapReduce program (EMR) from Amazon Web Services. The inputs specified correspond to (1) the BLAST step and (2) the ortholog computation step of the RSD cloud algorithm. These configuration settings apply to both the EMR and Hadoop frameworks, with two exceptions: in EMR, a --j parameter can be used to provide an identifier for the entire cluster, which is useful only when more than one cloud cluster is needed simultaneously; in Hadoop, these commands are passed directly to the streaming.jar program, obviating the need for the --stream argument.
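As a rough illustration of how the Table 1 settings combine into one streaming step, the Python sketch below assembles the argument list for the BLAST step or the ortholog step. It is a hypothetical helper, not the authors' script: the function names, the KEY=VALUE form of each --jobconf value, and the task count of 100 in the usage line are assumptions. It only builds the arguments and does not invoke the EMR client.

```python
"""Hypothetical sketch: assemble the EMR streaming arguments from Table 1.

Assumed (not stated in the source): the helper names below, the KEY=VALUE
form of each --jobconf value, and the task count used in the usage example.
"""
import shlex
from typing import Dict, List, Tuple

# Per-step values from Table 1: (1) BLAST step, (2) ortholog step.
STEPS: Dict[str, Dict[str, str]] = {
    "blast": {
        "input": "hdfs:///home/hadoop/blast_runner",
        "mapper": "s3n://rsd_bucket/blast_mapper.py",
    },
    "ortho": {
        "input": "hdfs:///home/hadoop/ortho_runner",
        "mapper": "s3n://rsd_bucket/ortho_mapper.py",
    },
}

# Shared archive of symlinked resources, copied from Table 1.
CACHE_ARCHIVE = ("s3n://rsd_bucket/executables.tar.gz#executables,#genomes,"
                 "#RSD_standalone,#blastinput,#results")

MS_PER_HOUR = 3_600_000


def jobconf_settings(n_map_tasks: int) -> List[Tuple[str, str]]:
    """Hadoop properties from Table 1; only the map-task count varies."""
    return [
        ("mapred.map.tasks", str(n_map_tasks)),
        ("mapred.tasktracker.map.tasks.maximum", "8"),
        # 24 h (86400000 ms) before an unresponsive task is failed and restarted.
        ("mapred.task.timeout", str(24 * MS_PER_HOUR)),
        # 1 h (3600000 ms) before a silent instance is declared dead.
        ("mapred.tasktracker.expiry.interval", str(1 * MS_PER_HOUR)),
        # Do not launch duplicate copies of apparently slow tasks.
        ("mapred.map.tasks.speculative.execution", "false"),
    ]


def build_step_args(step: str, output: str, n_map_tasks: int) -> List[str]:
    """Return the streaming arguments for one step ("blast" or "ortho")."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step!r}")
    if n_map_tasks < 1:
        raise ValueError("n_map_tasks must be positive")
    cfg = STEPS[step]
    args = [
        "--stream",
        "--input", cfg["input"],
        "--mapper", cfg["mapper"],
        # No --reducer: reduction happens inside the RSD algorithm itself.
        "--cache-archive", CACHE_ARCHIVE,
        "--output", output,
    ]
    for key, value in jobconf_settings(n_map_tasks):
        args += ["--jobconf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # N = 100 is an arbitrary illustrative value, not one from the paper.
    print(shlex.join(build_step_args("blast", "hdfs:///home/hadoop/outl", 100)))
```

Only --input and --mapper differ between the two steps, so the sketch keeps them in a per-step table and shares everything else. shlex.join (Python 3.8+) quotes each argument so the printed line can be pasted into a shell as-is; per the footnote, a plain Hadoop run would drop --stream and pass the options to streaming.jar instead, whose own option spellings this sketch does not model.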