
Running Trinity in multiple steps

Trinity (trinityrnaseq.sourceforge.net) is a software package combining three independent software modules (Inchworm, Chrysalis, Butterfly) to process large volumes of RNA-seq reads. Running Trinity from beginning to end on large data sets may exceed the walltime limit for a single job. Trinity provides a mechanism to run the workflow in four separate steps. Each step may be run as its own job, providing a workaround for the single job walltime limit. This page describes how to run Trinity in this manner under the SLURM scheduler and provides example submit scripts.

Generally, the same Trinity command is run for each step, aside from one option that determines how far Trinity will progress before stopping. On the last step, the Trinity command is run as normal. For example,

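# Step 1: stop after Inchworm, before Chrysalis
Trinity.pl <options> --no_run_chrysalis
# Step 2: stop before QuantifyGraph
Trinity.pl <options> --no_run_quantifygraph
# Step 3: stop before Butterfly
Trinity.pl <options> --no_run_butterfly
# Step 4: run to completion
Trinity.pl <options>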

Example SLURM submit scripts follow. Each one requests 16 CPUs, 200 GB of RAM, and a 168-hour (7-day) walltime for its step.

trinity_step1.submit

#!/bin/sh
#SBATCH --job-name=trinity_step1
#SBATCH --time=168:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem=200gb
#SBATCH --output=trinity_step1.stdout
#SBATCH --error=trinity_step1.stderr

# Load the Trinity and Bowtie modules (two separate module names)
module load trinity/r2013-02-25 bowtie/1.0.0

# Step 1: stop before Chrysalis
Trinity.pl --output trinity_out --seqType fq --JM 200G --left leftreads.fastq \
  --right rightreads.fastq --CPU $SLURM_NTASKS_PER_NODE --inchworm_cpu $SLURM_NTASKS_PER_NODE \
  --bflyCPU $SLURM_NTASKS_PER_NODE --no_run_chrysalis

trinity_step2.submit

#!/bin/sh
#SBATCH --job-name=trinity_step2
#SBATCH --time=168:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem=200gb
#SBATCH --output=trinity_step2.stdout
#SBATCH --error=trinity_step2.stderr

# Load the Trinity and Bowtie modules (two separate module names)
module load trinity/r2013-02-25 bowtie/1.0.0

# Step 2: stop before QuantifyGraph
Trinity.pl --output trinity_out --seqType fq --JM 200G --left leftreads.fastq \
  --right rightreads.fastq --CPU $SLURM_NTASKS_PER_NODE --inchworm_cpu $SLURM_NTASKS_PER_NODE \
  --bflyCPU $SLURM_NTASKS_PER_NODE --no_run_quantifygraph

trinity_step3.submit

#!/bin/sh
#SBATCH --job-name=trinity_step3
#SBATCH --time=168:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem=200gb
#SBATCH --output=trinity_step3.stdout
#SBATCH --error=trinity_step3.stderr

# Load the Trinity and Bowtie modules (two separate module names)
module load trinity/r2013-02-25 bowtie/1.0.0

# Step 3: stop before Butterfly
Trinity.pl --output trinity_out --seqType fq --JM 200G --left leftreads.fastq \
  --right rightreads.fastq --CPU $SLURM_NTASKS_PER_NODE --inchworm_cpu $SLURM_NTASKS_PER_NODE \
  --bflyCPU $SLURM_NTASKS_PER_NODE --no_run_butterfly

trinity_step4.submit

#!/bin/sh
#SBATCH --job-name=trinity_step4
#SBATCH --time=168:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem=200gb
#SBATCH --output=trinity_step4.stdout
#SBATCH --error=trinity_step4.stderr

# Load the Trinity and Bowtie modules (two separate module names)
module load trinity/r2013-02-25 bowtie/1.0.0

# Step 4: no stop option, so Trinity runs to completion
Trinity.pl --output trinity_out --seqType fq --JM 200G --left leftreads.fastq \
  --right rightreads.fastq --CPU $SLURM_NTASKS_PER_NODE --inchworm_cpu $SLURM_NTASKS_PER_NODE \
  --bflyCPU $SLURM_NTASKS_PER_NODE
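The four scripts above are identical apart from three things: the job name, the log file names, and the final stop option. They can therefore also be written from a single template. The following is a minimal bash sketch of that idea. The function name `write_trinity_scripts` is hypothetical. The resources, module names, and read file names are copied from the examples above, so adjust them for your site and data.

```shell
#!/bin/bash
# Hypothetical generator (not part of Trinity or SLURM): write
# trinity_step1.submit ... trinity_step4.submit from one template.
# The scripts differ only in job name, log names, and the stop option.
write_trinity_scripts() {
    # Index 0..3 -> step 1..4; step 4 has no stop option (run to completion)
    local stop_flags=("--no_run_chrysalis" "--no_run_quantifygraph" "--no_run_butterfly" "")
    local i step
    for i in 0 1 2 3; do
        step=$((i + 1))
        # Unquoted heredoc: ${step} and ${stop_flags[$i]} are expanded here,
        # while \$SLURM_NTASKS_PER_NODE stays literal for the job to expand.
        cat > "trinity_step${step}.submit" <<EOF
#!/bin/sh
#SBATCH --job-name=trinity_step${step}
#SBATCH --time=168:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem=200gb
#SBATCH --output=trinity_step${step}.stdout
#SBATCH --error=trinity_step${step}.stderr
module load trinity/r2013-02-25 bowtie/1.0.0
Trinity.pl --output trinity_out --seqType fq --JM 200G --left leftreads.fastq \\
  --right rightreads.fastq --CPU \$SLURM_NTASKS_PER_NODE --inchworm_cpu \$SLURM_NTASKS_PER_NODE \\
  --bflyCPU \$SLURM_NTASKS_PER_NODE ${stop_flags[$i]}
EOF
    done
}
```

Run `write_trinity_scripts` in the directory that holds the reads. It overwrites any existing `trinity_stepN.submit` files there.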

SLURM's job dependency feature can run the steps sequentially, starting each one when the previous step completes. All four jobs can be submitted at once, and they will run in the proper order without further interaction from the user. This works by passing the job ID reported for each step to the sbatch command for the next step. Assuming the four scripts above are saved in the working directory along with the input dataset, submit them as follows:

Example Trinity submission

$ sbatch trinity_step1.submit
Submitted batch job 366910
$ sbatch -d afterok:366910 trinity_step2.submit
Submitted batch job 366911
$ sbatch -d afterok:366911 trinity_step3.submit
Submitted batch job 366912
$ sbatch -d afterok:366912 trinity_step4.submit
Submitted batch job 366913

The -d afterok:<jobid> option (long form --dependency=afterok:<jobid>) tells SLURM to run the submitted job only if the specified job completes successfully, that is, exits with code 0. If Trinity exits with an error code in one step, SLURM will not start the next step.
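The four sbatch commands above can be automated by reading each job ID from the `Submitted batch job <id>` line and passing it to the next submission. The following is a minimal bash sketch, assuming sbatch prints that message in the format shown above. `submit_chain` is a hypothetical helper, not part of SLURM or Trinity. Newer SLURM releases also offer `sbatch --parsable`, which prints only the job ID.

```shell
#!/bin/bash
# Hypothetical helper: submit job scripts in order so that each one starts
# only after the previous one exits with code 0 (afterok dependency).
submit_chain() {
    local prev_id="" script out job_id
    for script in "$@"; do
        if [ -z "$prev_id" ]; then
            # First step: no dependency
            out=$(sbatch "$script") || return 1
        else
            # Later steps: wait for the previous job to complete successfully
            out=$(sbatch -d "afterok:$prev_id" "$script") || return 1
        fi
        # sbatch prints "Submitted batch job <id>"; the ID is the 4th field
        job_id=$(echo "$out" | awk '{print $4}')
        if [ -z "$job_id" ]; then
            echo "could not parse job ID from: $out" >&2
            return 1
        fi
        echo "$script -> job $job_id"
        prev_id=$job_id
    done
}
```

For example, `submit_chain trinity_step1.submit trinity_step2.submit trinity_step3.submit trinity_step4.submit` submits all four steps with afterok dependencies. It stops at the first submission that fails.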

Tips: Commands for Checking Jobs

1. Check the status of your jobs:

Example: Check Your Job Status

$ squeue -u <username>

Output:

  JOBID PARTITION     NAME       USER  ST       TIME  NODES NODELIST(REASON)
 426290     batch trinity_ <username>  PD       0:00      1 (Dependency)
 426291     batch trinity_ <username>  PD       0:00      1 (Dependency)
 426289     batch trinity_ <username>   R   10:33:59      1 c2417
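In the output above, squeue's default NAME column truncates every job to `trinity_`, so the steps cannot be told apart. The following is a minimal sketch, assuming a standard squeue, that uses the `-o` format option to widen the job-name column to 30 characters. `my_trinity_jobs` is a hypothetical wrapper name.

```shell
#!/bin/bash
# Hypothetical wrapper: list a user's jobs with a 30-character name column
# so trinity_step1 ... trinity_step4 are distinguishable.
#   %i job ID, %j job name, %T job state, %M time used,
#   %R reason (pending jobs) or node list (running jobs)
my_trinity_jobs() {
    squeue -u "${1:-$USER}" -o "%.10i %.30j %.10T %.12M %R"
}
```

Run it as `my_trinity_jobs <username>`, or simply `my_trinity_jobs` to use `$USER`. Pending steps still show `(Dependency)` in the last column until the previous step finishes.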

2. Check the details of a specific job, for example job ID 426289:

Example: Check Job 426289

$ scontrol show job 426289

Output:

JobId=426289 Name=trinity_step2
   UserId=<username>(3557) GroupId=<groupname>(11156)
   Priority=30208 Account=<groupname> QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
   RunTime=10:38:38 TimeLimit=7-00:00:00 TimeMin=N/A
   SubmitTime=2013-08-19T15:12:44 EligibleTime=2013-08-21T00:36:51
   StartTime=2013-08-21T00:37:09 EndTime=2013-08-28T00:37:09
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=batch AllocNode:Sid=login:62036
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=c2417
   BatchHost=c2417
   NumNodes=1 NumCPUs=16 CPUs/Task=1 ReqS:C:T=*:*:*
   MinCPUsNode=16 MinMemoryNode=250G MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/lustre/work/entomology/hwang4/WCR_RNAseq_2013/Fallarmyworm/trinity_step2.submit
   WorkDir=/lustre/work/entomology/hwang4/WCR_RNAseq_2013/Fallarmyworm
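The `scontrol show job` output is a list of space-separated `Key=Value` pairs, so a single field can be extracted with standard text tools. The following is a minimal sketch. `job_field` is a hypothetical helper. It assumes field values contain no spaces, which holds for the fields shown above but not necessarily for every path or reason string.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one field (e.g. JobState,
# RunTime, TimeLimit) from "scontrol show job" output.
# Assumes values contain no spaces.
job_field() {
    local job_id=$1 key=$2
    # One Key=Value token per line, then print the value of the first
    # token whose name is exactly $key
    scontrol show job "$job_id" \
        | tr ' ' '\n' \
        | awk -v k="$key" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }'
}
```

For the job above, `job_field 426289 JobState` would print `RUNNING`, and `job_field 426289 TimeLimit` would print `7-00:00:00`.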

3. Check your job history since a specific date, for example all jobs run since 08-14-2013:

Example: Check Your Job History After A Specific Date

$ sacct -u <username> -S 081413 -o JobId,JobName%30,State,ExitCode,Start,End,Elapsed

Output:

JobID                        JobName      State ExitCode               Start                 End    Elapsed
------------ ------------------------------ ---------- -------- ------------------- ------------------- ----------
382339                        trinity_step1  COMPLETED      0:0 2013-08-13T09:47:18 2013-08-13T22:03:39   12:16:21
382339.batc+                          batch  COMPLETED      0:0 2013-08-13T09:47:18 2013-08-13T22:03:39   12:16:21
382846                        trinity_step2 CANCELLED+      0:0 2013-08-13T22:03:39 2013-08-14T15:40:45   17:37:06
426288                        trinity_step1    RUNNING      0:0 2013-08-20T15:24:23             Unknown   00:14:21
426289                        trinity_step2    PENDING      0:0             Unknown             Unknown   00:00:00
426290                        trinity_step3    PENDING      0:0             Unknown             Unknown   00:00:00
426291                        trinity_step4    PENDING      0:0             Unknown             Unknown   00:00:00
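When several attempts have accumulated in the history, as in the output above, it can help to list only the Trinity steps that did not finish normally. The following is a minimal sketch, assuming these sacct options:

- `-X`: job allocations only, which hides the `.batch` lines.
- `-n`: no header.
- `-P`: pipe-delimited output.

`trinity_problems` is a hypothetical helper name. The `trinity_step` name prefix matches the job names used in the scripts above.

```shell
#!/bin/bash
# Hypothetical helper: list trinity_step* jobs whose state is not
# COMPLETED, RUNNING or PENDING (e.g. FAILED, TIMEOUT, CANCELLED by <uid>).
trinity_problems() {
    if [ $# -ne 2 ]; then
        echo "usage: trinity_problems <user> <start-date>" >&2
        return 1
    fi
    local user=$1 start=$2
    # -X: allocations only, -n: no header, -P: '|'-delimited fields
    sacct -u "$user" -S "$start" -X -n -P -o JobID,JobName,State \
        | awk -F'|' '$2 ~ /^trinity_step/ && $3 !~ /^(COMPLETED|RUNNING|PENDING)$/ { print $1, $2, $3 }'
}
```

For the history shown above, `trinity_problems <username> 081413` would report only job 382846, the cancelled trinity_step2.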

Reposted from: https://my.oschina.net/u/727594/blog/191124