ShotGun and DesignPlanner Tutorial


What You Need


Installation

    The current beta release of ShotGun is a static executable and is ready to use without installation. You only need to extract the files with the following command:
         tar -zxvf ShotGun.*.tgz

How to Run

ShotGun Only

Available Options:



Command Line Example:

Under csh:
   set d = ./2012-03-29_ShotGun
   set hf = $d/data/canHap/chrs44880.1.canHap.gz
   set pf = $d/data/simp_pos/EU.1.pos
   $d/bin/ShotGun -h $hf -p $pf -n 60 -d 6 -o n60.depth6

Under bash:
   d=./2012-03-29_ShotGun
   hf=$d/data/canHap/chrs44880.1.canHap.gz
   pf=$d/data/simp_pos/EU.1.pos
   $d/bin/ShotGun -h $hf -p $pf -n 60 -d 6 -o n60.depth6

The command line above generates sequencing data for 60 individuals at an average depth of 6X.
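
To compare several designs, the same call can be repeated over a range of depths. The following is a minimal bash sketch, assuming the directory layout of the example above. The depth values and the output-prefix naming scheme are illustrative assumptions, and the helper `shotgun_cmd` is a hypothetical name. The commands are only printed (a dry run); pipe the output to `bash` to execute them once the paths have been checked.

```bash
#!/usr/bin/env bash
# Sketch: print one ShotGun command per sequencing depth (dry run only).
# Paths follow the example above; the depth list is an illustrative assumption.
d=./2012-03-29_ShotGun
hf=$d/data/canHap/chrs44880.1.canHap.gz
pf=$d/data/simp_pos/EU.1.pos

# shotgun_cmd N DEPTH: print the command for N individuals at average depth DEPTH.
shotgun_cmd() {
    local n=$1 depth=$2
    echo "$d/bin/ShotGun -h $hf -p $pf -n $n -d $depth -o n${n}.depth${depth}"
}

for depth in 2 4 6; do
    shotgun_cmd 60 "$depth"
done
```

Each printed line uses only the options shown in the example (-h, -p, -n, -d, -o), so the output files are distinguished by their prefixes alone.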


Full Documentation for the options:

  • --hapfile[-h] : Required input haplotype file
           example w/ 20 chromosomes at 50 markers
  • --posfile[-p] : Required input position file
           example for 50 markers
  • --errorfile : Optional input cycle-specific error file
           example for 35-base reads
  • --seed : Random seed, integer, default = 12345
  • --readlength[-l] : Read Length, positive integer, default = 100
  • --minwts : Recommend using --alpha (below) instead.
  • --BeginningPosition[-b] : Start Position, positive number, default = 1.0
  • --EndingPosition[-e] : End Position, positive number, default = 100000.0
  • --sequencableProportion : Sequencable Proportion, default = 1.0
  • --alpha[-a] : False Positive Rate, default = 1e-3
           1e-3 corresponds to 100 false positives in a 100Kb region
  • --shape[-s] : Shape Parameter for Gamma Mixture.
           Default = 4.0, also see --negativeBinomial.
           Smaller value => more severe over-dispersion compared with Poisson.
  • --numIndividuals[-n] : Number of individuals to be sequenced
           positive integer, default = 100
  • --depth[-d] : Average Sequencing Depth per Individual.
  • --erate : Average Sequencing Error Rate, ignored if --errorfile provided.
  • --allbps : Output Information for ALL Base Pairs,
           regardless of evidence for being polymorphic.
  • --negativeBinomial : Depths follow Negative Binomial distribution.
           Implemented as a Gamma mixture of Poissons.
           Shape parameter of Gamma specified by --shape.
  • --trueped : Output true genotype information at detected markers.
  • --numReplications : #Replicates to determine polymorphism score threshold.
  • --prefix : Output prefix.
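
The --alpha entry above implies a simple conversion between the false positive rate and the expected number of false positive sites in a region. The sketch below reads alpha as a per-base-pair rate, which is what the documented example (1e-3 giving 100 false positives in 100Kb) suggests; that reading is an assumption, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: expected false positives = alpha * region length (in base pairs).
# Assumes alpha is a per-base-pair rate, as the 1e-3 / 100Kb example implies.
expected_false_positives() {
    awk -v alpha="$1" -v len="$2" 'BEGIN { printf "%.1f\n", alpha * len }'
}

expected_false_positives 1e-3 100000   # prints 100.0
expected_false_positives 1e-4 100000   # prints 10.0
```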

ShotGun+DesignPlanner

Command Line Example:

Under csh:
   set d = ./2012-03-29_ShotGun
   cd $d
   ./DesignPlanner.csh 0

Under bash:
   d=./2012-03-29_ShotGun
   cd $d
   ./DesignPlanner.csh 0

     The command line above generates an effective sample size estimate for each MAF category under the default setting: 30 individuals sequenced at depth 5X (to change these parameters, edit lines 26 and 28 of the shell script), with a ShotGun false positive rate of ~1/1,000 (that is, ~100 false positive SNPs in each 100Kb region).
     Note that the 0 at the end of the command line sets the warning option. By default warning = 1, which means the program will STOP on any warning.
     See the example screen output from a similar command line (60 individuals at depth 2X).
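
Before editing the script, it can help to confirm what lines 26 and 28 currently contain. The following bash sketch prints those two lines with their line numbers; the function name is hypothetical, and the line numbers are the ones quoted above, which may differ in other releases of DesignPlanner.csh.

```bash
#!/usr/bin/env bash
# Sketch: show lines 26 and 28 of a script, prefixed with their line numbers.
# The line numbers come from the tutorial text and may not match other releases.
show_param_lines() {
    awk 'NR == 26 || NR == 28 { print NR ": " $0 }' "$1"
}

# Only inspect the script if it is present in the current directory.
if [ -f ./DesignPlanner.csh ]; then
    show_param_lines ./DesignPlanner.csh
fi
```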

Options:

Options are specified in DesignPlanner.csh. To change them, edit the "SET PARAMETERS" sections at the beginning of the shell script. The following options are available:
  • n : #individuals
  • t : depth
  • alpha : false positive rate (see the ShotGun documentation)
  • erate : Average Sequencing Error Rate (see the ShotGun documentation)
  • st : --states for thunder
           larger is better, at the cost of quadratically increased computing time
  • rd : --rounds for thunder
           larger is better, at the cost of linearly increased computing time

Example Output:



Example Output Interpretation:

    Selected interpretation of the output: with a design of 60 individuals sequenced at an average depth of 2X, only a small percentage of rare markers can be detected (e.g., 19 / 897 = ~2% of markers with MAF 0.001-0.002), whereas almost all common markers can be detected (e.g., 1210 / 1241 = ~97.5% of markers with MAF > 0.05). In addition, the quality of genotype calls (measured by the information-content metric dosage r2 or by effective sample size) also tends to improve with increasing MAF. For example, the effective sample size is only ~2/3 of the number sequenced for markers with MAF 0.001-0.002 (41.4 compared with 60), and ~5/6 for markers with MAF > 0.05 (50.5 compared with 60).
    Note that fluctuations in this pattern arise by chance, because some MAF categories contain only a small number of markers.
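
The percentages quoted in this interpretation can be re-derived from the counts in the output. A minimal bash sketch follows, using only the numbers quoted above; the helper name `pct` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: recompute the detection rates and effective-sample-size fractions
# quoted in the interpretation (60 individuals at depth 2X).
# pct NUM DEN: print NUM/DEN as a percentage with one decimal place.
pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * a / b }'
}

pct 19 897      # detected rare markers, MAF 0.001-0.002: prints 2.1
pct 1210 1241   # detected common markers, MAF > 0.05:   prints 97.5
pct 41.4 60     # effectiveN / n, MAF 0.001-0.002:       prints 69.0
pct 50.5 60     # effectiveN / n, MAF > 0.05:            prints 84.2
```

The last two values correspond to the "~2/3" and "~5/6" fractions in the text.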

Output Fields:

  1. MAF-LB : Minor Allele Frequency (MAF) Lower Bound
  2. MAF-UB : MAF Upper Bound
  3. #SNPs : Number of SNPs with "Population" [a] MAF in the Category
  4. #Detected : Number of SNPs Detected by the Design [b]
  5. avgDoseR2 : Average Dosage [c] r2 [d] for the Detected SNPs
  6. effectiveN : Effective Sample Size = n * avgDoseR2

    [a] population = 45K chromosomes
    [b] by default, the design is 60 individuals sequenced at depth 2X
    [c] Dosage, ranging continuously from 0 to 2, is the fractional count of a designated allele.
    [d] Dosage r2, a marker-level quality metric for information content, is the squared Pearson correlation between the dosages and true genotypes.
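
The two definitions above (dosage r2 as a squared Pearson correlation, and effectiveN = n * avgDoseR2) can be illustrated with a short bash sketch. This is not the DesignPlanner implementation: the function names are hypothetical, the five dosage/genotype pairs are made-up toy values for a single marker, and the avgDoseR2 of 0.69 is simply the value implied by 41.4 / 60 in the interpretation section.

```bash
#!/usr/bin/env bash
# Sketch: dosage r2 for one marker, and effective sample size.
# dosage_r2 reads "dosage true_genotype" pairs from stdin (one individual per
# line) and prints the squared Pearson correlation, or NA if it is undefined.
dosage_r2() {
    awk '{ n++; sx += $1; sy += $2; sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2 }
         END {
             num = n * sxy - sx * sy
             den = (n * sxx - sx * sx) * (n * syy - sy * sy)
             if (den <= 0) { print "NA"; exit }
             printf "%.4f\n", num * num / den
         }'
}

# effective_n N AVG_R2: effective sample size = n * avgDoseR2.
effective_n() {
    awk -v n="$1" -v r2="$2" 'BEGIN { printf "%.1f\n", n * r2 }'
}

# Toy data (hypothetical): estimated dosage, then true genotype (0, 1 or 2).
printf '0.1 0\n0.9 1\n1.8 2\n1.2 1\n0.0 0\n' | dosage_r2   # prints 0.9705
effective_n 60 0.69                                         # prints 41.4
```

A marker whose dosages do not vary across individuals has no defined correlation, which is why the sketch prints NA in that case rather than dividing by zero.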