  
  ShotGun and DesignPlanner Tutorial
  
  What You Need
    
  Installation
    
    The current beta release of ShotGun is a static executable and is ready to use; no installation is needed.
    You only need to extract the files with the following command:
          tar -zxvf ShotGun.*.tgz
     
    
  How to Run
    ShotGun Only
    ShotGun+DesignPlanner
     
             
      
     ShotGun Only
        Options
        Command
        Documentation
             
         Available Options: 
             
        
             
      
         Command Line Example: 
            Under csh:
            
  
              set d = ./2012-03-29_ShotGun
  
              set hf = $d/data/canHap/chrs44880.1.canHap.gz
  
              set pf = $d/data/simp_pos/EU.1.pos
  
              $d/bin/ShotGun -h $hf -p $pf -n 60 -d 6 -o n60.depth6
            
            Under bash:
            
  
              d=./2012-03-29_ShotGun
  
              hf=$d/data/canHap/chrs44880.1.canHap.gz
  
              pf=$d/data/simp_pos/EU.1.pos
  
              $d/bin/ShotGun -h $hf -p $pf -n 60 -d 6 -o n60.depth6
            
            The command above generates sequencing data at an average depth of 6X for 60 individuals.
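            To compare designs, the same command can be repeated over several depths. The following is a minimal sketch, assuming the directory layout of the example above. The helper name shotgun_cmd and the choice of depths are illustrative and not part of the ShotGun distribution. The loop only prints the commands, so they can be checked before anything is run.

```shell
#!/usr/bin/env bash
# Sketch: print one ShotGun command per sequencing depth (dry run).
# Paths and flags are copied from the example above; shotgun_cmd is a
# hypothetical helper introduced here for illustration only.
d=./2012-03-29_ShotGun
hf=$d/data/canHap/chrs44880.1.canHap.gz
pf=$d/data/simp_pos/EU.1.pos

shotgun_cmd() {
  # $1 = number of individuals, $2 = average depth per individual
  echo "$d/bin/ShotGun -h $hf -p $pf -n $1 -d $2 -o n$1.depth$2"
}

# Depths 2, 4 and 6 are arbitrary example values.
for depth in 2 4 6; do
  shotgun_cmd 60 "$depth"
done
```

            Once the printed commands look right, they can be executed by piping the output of the loop to sh.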
             
      
         Full Documentation for the options: 
          
            - --hapfile[-h]           : Required input haplotype file
                                        example with 20 chromosomes at 50 markers
            - --posfile[-p]           : Required input position file
                                        example for 50 markers
            - --errorfile             : Optional input cycle-specific error file
                                        example for 35-base reads
            - --seed                  : Random seed, integer, default = 12345
            - --readlength[-l]        : Read length, positive integer, default = 100
            - --minwts                : Recommend using --alpha (below) instead.
            - --BeginningPosition[b]  : Start position, positive number, default = 1.0
            - --EndingPosition[e]     : End position, positive number, default = 100000.0
            - --sequencableProportion : Sequenceable proportion, default = 1.0
            - --alpha[a]              : False positive rate, default = 1e-3
                                        1e-3 corresponds to 100 false positives in a 100Kb region
            - --shape[s]              : Shape parameter for the Gamma mixture.
                                        Default = 4.0, also see --negativeBinomial.
                                        Smaller value => more severe over-dispersion compared with Poisson.
            - --numIndividuals[n]     : Number of individuals to be sequenced
                                        positive integer, default = 100
            - --depth[d]              : Average sequencing depth per individual.
            - --erate                 : Average sequencing error rate, ignored if --errorfile is provided.
            - --allbps                : Output information for ALL base pairs,
                                        regardless of evidence for being polymorphic.
            - --negativeBinomial      : Depths follow a Negative Binomial distribution.
                                        Implemented as a Gamma mixture of Poissons.
                                        Shape parameter of the Gamma specified by --shape.
            - --trueped               : Output true genotype information at detected markers.
            - --numReplications       : Number of replicates used to determine the polymorphism score threshold.
            - --prefix                : Output prefix.
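
            The --alpha entry above says that 1e-3 corresponds to 100 false positives in a 100Kb region, which reads as a per-base-pair rate multiplied by the region length. The sketch below reproduces that arithmetic under this reading. The function name expected_false_positives is an illustrative helper, not a ShotGun option.

```shell
#!/usr/bin/env bash
# Sketch: expected false positives = alpha * region length in base pairs.
# Assumes alpha is a per-base-pair rate, as the --alpha description implies.
expected_false_positives() {
  # $1 = alpha (false positive rate), $2 = region length in base pairs
  awk -v a="$1" -v len="$2" 'BEGIN { printf "%.0f\n", a * len }'
}

# Default alpha over the default 100Kb region.
expected_false_positives 1e-3 100000
```

            The same helper can be used to see how a stricter --alpha changes the expected count for a region of a given length.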
          
 
             
      
     ShotGun+DesignPlanner
        Command Line
        Options
        Output
        Interpretation
        Output Fields
             
         Command Line Example: 
 
            Under csh:
            
  
              set d = ./2012-03-29_ShotGun
  
              cd $d
  
              ./DesignPlanner.csh 0
            
            Under bash:
            
  
              d=./2012-03-29_ShotGun
  
              cd $d
  
              ./DesignPlanner.csh 0
            
            The command above generates an effective sample size estimate for each MAF category under the default setting:
            30 individuals sequenced at depth 5X (you can change these parameters by editing lines 26 and 28 of the shell script),
            with a ShotGun false positive rate of ~1/1,000 (that is, ~100 false positive
            SNPs in each 100Kb region).
            Note that the 0 at the end of the command line sets the warning option.
            By default warning = 1, which means the program will STOP on a warning.
            See the example screen output from running a similar command line (60 individuals at depth 2X).
             
      
         Options: 
          Options are specified in DesignPlanner.csh. To change them, edit the "SET PARAMETERS"
          sections at the beginning of the shell script. The following options are available:
          
            - n     : #individuals
            - t     : depth
            - alpha : false positive rate (see ShotGun Documentation)
            - erate : average sequencing error rate (see ShotGun Documentation)
            - st    : --states for thunder
                      larger is better, at the cost of quadratically increased computing time
            - rd    : --rounds for thunder
                      larger is better, at the cost of linearly increased computing time
           
             
      
         Example Output: 
             
          
             
      
         Example Output Interpretation: 
          
              Selected interpretation of the output: with a design of 60 individuals sequenced at an average depth of 2X, only a small percentage of rare markers
              can be detected (e.g., 19 / 897 = ~2% of markers with MAF 0.001-0.002), while almost all the common markers can be detected (e.g., 1210 / 1241 = ~97.5% of markers with MAF > 0.05).
              In addition, the quality of genotype calls (measured by the information content metric dosage r2, or by effective sample size) also tends to improve with increasing MAF. For
              example, the effective sample size is only ~2/3 of the number sequenced (41.4 compared with the 60 sequenced) for markers with MAF 0.001-0.002, and
              ~5/6 for markers with MAF > 0.05 (50.5 compared with 60).
              Note that fluctuations in this pattern are due to chance, because some MAF categories contain only a small number of markers.
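
              The percentages and fractions quoted above can be recomputed directly from the counts. This is a small sketch using only the numbers given in the interpretation; ratio_pct is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Sketch: recompute the ratios quoted in the interpretation, as percentages.
ratio_pct() {
  # $1 = numerator, $2 = denominator; prints 100 * $1 / $2 with one decimal
  awk -v x="$1" -v y="$2" 'BEGIN { printf "%.1f\n", 100 * x / y }'
}

ratio_pct 19 897      # detected markers with MAF 0.001-0.002
ratio_pct 1210 1241   # detected markers with MAF > 0.05
ratio_pct 41.4 60     # effective sample size vs. sequenced, MAF 0.001-0.002
ratio_pct 50.5 60     # effective sample size vs. sequenced, MAF > 0.05
```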
          
             
      
         Output Fields:
              
                - MAF-LB     : Minor Allele Frequency (MAF) Lower Bound
                - MAF-UB     : MAF Upper Bound
                - #SNPs      : Number of SNPs with "Population" (a) MAF in the Category
                - #Detected  : Number of SNPs Detected by the Design (b)
                - avgDoseR2  : Average Dosage (c) r2 (d) for the Detected SNPs
                - effectiveN : Effective Sample Size = n * avgDoseR2
              
 
             
              
              a: population = 45K chromosomes
              b: by default, the design is 60 individuals sequenced at depth 2X
              c: Dosage, ranging continuously from 0 to 2, is the fractional count of a designated allele.
              d: Dosage r2, a marker level quality metric for information content, is the squared Pearson correlation between the dosages and true genotypes.
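
              To make the last two definitions concrete, the sketch below computes a dosage r2 as a squared Pearson correlation and then applies effectiveN = n * avgDoseR2. It is an illustration only: the two-column input layout (dosage, true genotype), the toy values, and the helper names dose_r2 and effective_n are assumptions made here, not the format or code used by DesignPlanner.

```shell
#!/usr/bin/env bash
# Sketch: dosage r2 for one marker and the resulting effective sample size.
# Input layout (dosage, true genotype per line) is an assumption for
# illustration; it is not a DesignPlanner file format.

dose_r2() {
  # Reads "dosage genotype" pairs on stdin, prints the squared Pearson correlation.
  awk '{ n++; sx += $1; sy += $2; sxx += $1*$1; syy += $2*$2; sxy += $1*$2 }
       END {
         num = n*sxy - sx*sy
         den = (n*sxx - sx*sx) * (n*syy - sy*sy)
         printf "%.4f\n", (num*num) / den
       }'
}

effective_n() {
  # $1 = number of sequenced individuals, $2 = average dosage r2
  awk -v n="$1" -v r2="$2" 'BEGIN { printf "%.1f\n", n * r2 }'
}

# Toy data for four individuals (made-up values).
printf '0.1 0\n0.9 1\n1.8 2\n0.2 0\n' | dose_r2

# 60 sequenced individuals with an average dosage r2 of 0.69.
effective_n 60 0.69
```

              With an average dosage r2 of 0.69, the second call reproduces the effective sample size of 41.4 out of 60 quoted in the interpretation for markers with MAF 0.001-0.002.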