Simulation Details underlying AbCD

Results presented by AbCD were generated using the following simulation protocol:

(1) We first simulated ten random 1-Mb regions using the cosi bestfit models. Table 1 in the publication accompanying cosi (Schaffner et al. 2005, Genome Research) lists the parameters calibrated for the bestfit models, which mimic the level of sequence variation, pattern of linkage disequilibrium, recombination rates and demographic history of four major populations: AA (African American), AF (African), AN (Asian), and EU (European). For each region, we simulated 450,000 chromosomes.

(2) Within each region, we then randomly picked 2n chromosomes from the population of 450,000 to form n diploid individuals, where n is referred to as the sample size, or number of individuals sequenced, in the subsequent text.

(3) From the chromosomes picked in (2), we used ShotGun to generate short reads mimicking those from the Illumina Solexa technologies for 10 pre-specified sequencing depths (d = 0.5X, 2X, 4X, 6X, 8X, 10X, 15X, 20X, 25X and 30X).

(4) We then performed LD-based genotype calling using thunder on the short reads generated in (3).

(5) Finally, for each design (one combination of n, d, and ethnicity), we summarized several key statistics by averaging across the ten simulated regions within each of the following seven MAF categories: (i) 0-0.1%; (ii) 0.1-0.2%; (iii) 0.2-0.5%; (iv) 0.5-1%; (v) 1-2%; (vi) 2-5%; (vii) 5-50%. The MAF-specific statistics summarized are: (a) Number of polymorphisms in the population of 450,000 chromosomes; (b) Number of variants segregating in the sample of n sequenced individuals; (c) Percentage of all variants in (a) that are detected, which is bounded above by (b) divided by (a); (d) Average information content, measured by dosage r^2, the squared Pearson correlation between imputed dosages and their corresponding true genotypes; and (e) Effective sample size, which is the product of n and the average information content.
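
The statistics in (5) can be illustrated with a short Python sketch. This is not the code used to produce the AbCD results; the function names are hypothetical, and two details are assumptions because the text above does not specify them: a MAF exactly on a category boundary is placed in the lower category, and a variant with no variation in either the dosages or the true genotypes is given an r^2 of 0.

```python
from bisect import bisect_left
from math import fsum

# Upper edges of the seven MAF categories in step (5), as fractions.
MAF_UPPER_EDGES = [0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.5]
MAF_LABELS = ["0-0.1%", "0.1-0.2%", "0.2-0.5%", "0.5-1%", "1-2%", "2-5%", "5-50%"]


def maf_category(maf):
    """Return the label of the MAF category that contains maf.

    Assumption: a boundary value (e.g. exactly 0.1%) goes to the lower category.
    """
    if not 0.0 < maf <= 0.5:
        raise ValueError("MAF must lie in (0, 0.5]")
    return MAF_LABELS[bisect_left(MAF_UPPER_EDGES, maf)]


def dosage_r2(dosages, genotypes):
    """Squared Pearson correlation between imputed dosages and true genotypes."""
    n = len(dosages)
    if n != len(genotypes) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_d = fsum(dosages) / n
    mean_g = fsum(genotypes) / n
    cov = fsum((d - mean_d) * (g - mean_g) for d, g in zip(dosages, genotypes))
    var_d = fsum((d - mean_d) ** 2 for d in dosages)
    var_g = fsum((g - mean_g) ** 2 for g in genotypes)
    if var_d == 0 or var_g == 0:
        return 0.0  # assumption: undefined correlation is reported as 0
    return cov * cov / (var_d * var_g)


def effective_sample_size(n, average_r2):
    """Statistic (e): the product of n and the average information content."""
    return n * average_r2
```

For example, under these definitions a design with n = 100 individuals and an average dosage r^2 of 0.8 in a given MAF category has an effective sample size of 80 for that category.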

   (*) The same simulation protocol was adopted in our published thunder paper.
   (*) Steps (2)-(5) above are implemented in our DesignPlanner C-shell wrapper script, which can be downloaded via the ShotGun Download Page together with all the software used and ten regions, each 100 Kb long, simulated by cosi.
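
As an illustration of step (2), the following Python sketch draws 2n chromosomes from the simulated population and pairs them into n diploid individuals. It is not taken from DesignPlanner; the function name is hypothetical, and both sampling without replacement and pairing consecutive draws are assumptions, since the protocol only states that 2n chromosomes are picked at random.

```python
import random

POPULATION_SIZE = 450_000  # chromosomes simulated per region in step (1)


def sample_diploids(n, population_size=POPULATION_SIZE, seed=None):
    """Pick 2n distinct chromosome indices and pair them into n diploid individuals.

    Assumptions: chromosomes are drawn without replacement, and consecutive
    draws are paired to form one individual.
    """
    if n < 1 or 2 * n > population_size:
        raise ValueError("need 1 <= n and 2n <= population size")
    rng = random.Random(seed)
    picked = rng.sample(range(population_size), 2 * n)
    return [(picked[2 * i], picked[2 * i + 1]) for i in range(n)]


# The ten pre-specified sequencing depths from step (3), in units of X coverage.
DEPTHS = [0.5, 2, 4, 6, 8, 10, 15, 20, 25, 30]
```

Calling sample_diploids(50, seed=1) returns 50 pairs of chromosome indices; fixing the seed makes the draw reproducible, which is convenient when the same individuals are to be reused across all ten depths.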