DISSCO: Direct Imputation of Summary Statistics allowing COvariates

What You Need


How to Run

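DISSCO has 3 running modes: calculate, project and impute. The impute mode takes the outputs of the first 2 modes as input, so they need to be run first; calculate can be skipped if Z statistics at typed markers are already available.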

Calculate Mode (optional)

The mode "calculate" is used to calculate Z statistics at typed markers. If Z statistics are already calculated, this step is not necessary.

It has 3 required input files:
  • --sample, sample genotype file, in standard vcf format.
  • --phenotype, file containing phenotypic outcome information, in text file format with a single field/column. Missing values are coded as ".".
  • --covariates, file containing covariates information with S columns, each for one of the S covariates. Missing values are coded as ".".
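
Before running DISSCO it can help to check that the phenotype and covariates files line up. The following is a minimal Python sketch, not part of DISSCO. It assumes whitespace-separated columns, no header row, and one row per individual in the same order in both files; none of this is stated above. The helper names (read_columns, complete_cases) are hypothetical. How DISSCO itself treats individuals with missing values is not described here, so the complete-case list is only a diagnostic.

```python
from typing import List, Optional

MISSING = "."  # missing-value code given in the documentation


def parse_value(token: str) -> Optional[float]:
    """Convert one field to float, mapping the missing code to None."""
    return None if token == MISSING else float(token)


def read_columns(lines: List[str]) -> List[List[Optional[float]]]:
    """Parse one row per individual (assumed whitespace-separated, no header)."""
    return [[parse_value(tok) for tok in line.split()]
            for line in lines if line.strip()]


def complete_cases(phenotype, covariates) -> List[int]:
    """Row indices with no missing phenotype or covariate value."""
    if len(phenotype) != len(covariates):
        raise ValueError("phenotype and covariates differ in number of rows")
    return [i for i, (p, c) in enumerate(zip(phenotype, covariates))
            if None not in p and None not in c]


# Toy data (hypothetical): 3 individuals, 1 phenotype column, S = 2 covariates.
pheno = read_columns(["1.2", ".", "0.7"])
covs = read_columns(["34 1", "51 0", ". 1"])
print(complete_cases(pheno, covs))
```

With real files, the lists of lines would come from open("Phenotype.txt").read().splitlines() and the equivalent call for Covariates.txt.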

It has 2 optional input parameters:
  • --chr, specifying the chromosome to be calculated. Default is 20, which means only markers on chromosome 20 will be processed.
  • --output, specifying the output file name. Default is ZScores.txt.

The output file contains the following 8 columns:
  1. chr: chromosome.
  2. pos: physical coordinate (in bp).
  3. name: marker name.
  4. ref: reference allele.
  5. alt: alternative allele.
  6. betahat: estimated beta coefficient.
  7. se: standard error for betahat.
  8. Zscore: Z statistic.

Command line: java -jar DISSCO.jar calculate --sample Sample.vcf --phenotype Phenotype.txt --chr 20 --covariates Covariates.txt --output ZScores.txt

Output excerpt:
$ head ZScores.txt
chr pos name ref alt betahat se Zscore
20 371972 rs6051637 T G 0.012073790113601324 0.0890077780396112 0.13564870823118533
20 384898 rs6084337 A C 0.14329767377235336 0.09780588548181954 1.4651232189804158
20 397928 rs6115906 C A 0.08168299747274155 0.11910046119132928 0.6858327554376272
20 403653 rs6051908 T G 0.09173111764662312 0.09876909891275981 0.9287430851996213
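
The values in the excerpt are consistent with Zscore = betahat / se. The short Python sketch below parses the excerpt lines and reports the largest discrepancy from that ratio; it is a sanity check on the output format under that reading, not a description of DISSCO's internals. The function names are hypothetical.

```python
def read_zscores(lines):
    """Parse whitespace-separated ZScores.txt lines into dicts keyed by header."""
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:] if line.strip()]


def max_ratio_discrepancy(records):
    """Largest absolute difference between Zscore and betahat / se."""
    return max(abs(float(r["betahat"]) / float(r["se"]) - float(r["Zscore"]))
               for r in records)


# Lines copied from the output excerpt above.
excerpt = [
    "chr pos name ref alt betahat se Zscore",
    "20 371972 rs6051637 T G 0.012073790113601324 0.0890077780396112 0.13564870823118533",
    "20 384898 rs6084337 A C 0.14329767377235336 0.09780588548181954 1.4651232189804158",
    "20 397928 rs6115906 C A 0.08168299747274155 0.11910046119132928 0.6858327554376272",
    "20 403653 rs6051908 T G 0.09173111764662312 0.09876909891275981 0.9287430851996213",
]
records = read_zscores(excerpt)
print(max_ratio_discrepancy(records))
```

The same check can be applied to Z statistics computed elsewhere before passing them to the impute mode, provided they follow the 8-column layout listed above.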

Project Mode

The mode "project" is used to project the sample covariate(s) into the reference, based on genotype and covariate data in the sample and genotype data in the reference.

It has 3 required input files:
  • --sample, sample genotype file in standard vcf format.
  • --reference, reference genotype file in standard vcf format.
  • --covariates, file containing covariates information with S columns, each for one of the S covariates. Missing values are coded as ".".

It has 4 optional input parameters:
  • --chr, specifying the chromosome to be calculated. Default is 20, which means only markers on chromosome 20 will be processed.
  • --WindowLength, specifying the window length (in unit of kb) used to divide the whole chromosome into nonoverlapping blocks. Default is 100.
  • --FlankingLength, specifying the flanking length (in unit of kb) used to expand a block. Default is 10.
  • --output, specifying the output file name. Default is PseudoCovariate.txt.
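
To make the window and flanking parameters concrete, here is a small Python sketch of the block arithmetic. The block numbering (block k covers base pairs k*W to (k+1)*W - 1 for a window of W bp) is inferred from the project-mode output excerpt, where block 3 spans [300000,399999] at the default window of 100 kb. The assumption that the flanking length is added on both sides of a block, clipped at 0, is not stated in this document and is labeled as such in the code. The function names are hypothetical.

```python
def block_index(pos_bp: int, window_kb: int = 100) -> int:
    """Block number containing a physical position (0-based, inferred from the excerpt)."""
    return pos_bp // (window_kb * 1000)


def block_range(block_no: int, window_kb: int = 100):
    """Inclusive bp range of a block, e.g. block 3 -> (300000, 399999)."""
    size = window_kb * 1000
    start = block_no * size
    return start, start + size - 1


def expanded_range(block_no: int, window_kb: int = 100, flank_kb: int = 10):
    """Block range widened by the flanking length.

    ASSUMPTION: the flank is applied on both sides and clipped at 0.
    """
    start, end = block_range(block_no, window_kb)
    flank = flank_kb * 1000
    return max(0, start - flank), end + flank


print(block_index(371972))   # position of rs6051637 from the calculate excerpt
print(block_range(3))
print(expanded_range(3))
```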

Output format: The first line gives the window length and the second line the flanking length. The remaining lines are arranged in blocks of 1+S lines each: one line of block information, followed by S lines, one per pseudo-covariate, in the same order as in the covariates file.

Command line: java -jar DISSCO.jar project --sample Sample.vcf --reference Reference.vcf --covariates Covariates.txt --WindowLength 100 --FlankingLength 10 --chr 20 --output PseudoCovariates.txt

Output excerpt:
$head -10 PseudoCovariates.txt | cut -c1-200 | cat -n
1 WindowLength (kb):100
2 FlankingLength (kb):10
3 Block No:3:Bp range [300000,399999]
4 1.3081199837873319 1.108541910958523 -1.3254194760196731 -0.040617761048501455 1.406394325758478 1.3291745830988897 -0.3869165062199408 -1.1366930369135175 -0.3475114702110962 -0.784534306297316 -0.61
5 0.4715163769011299 -1.4662623630377143 -0.47981341053232474 1.0978121495576398 1.0621553014338767 -0.06277218445822044 0.40845960503117573 0.6833312241499856 -0.3325149878924578 -0.9032839671339202 0.
6 -0.1589772640543887 -0.9233321284073257 -0.4068839459030936 -1.5593583196247085 -0.7312542415445107 -1.4260339803681694 -0.36521406880581014 -0.11294295585397593 -0.8705035388758369 0.3681921079132952
7 Block No:4:Bp range [400000,499999]
8 0.3439415672688316 -1.1043482190793177 -0.7223790618973769 0.5251197630625875 0.595123962353505 1.5100248016052524 1.4943317861864456 -0.2929142267655638 -2.7684668006570865 -0.12317547294540193 -0.41
9 0.850054806528325 -1.2207874321411947 -0.36253962163591 -1.190214244410283 -1.5339450772992054 -0.23944436636364386 -4.0822355425666945 -0.9503831376716751 -0.5641949456180458 -1.1585183230599252 -0.8
10 1.5793167682021265 1.0936626278231811 -1.8912390589338492 -1.734152038915539 -3.4452017778359174 1.2090059924499907 -0.4512659564714896 -0.31089755055755797 1.0490113704687158 -0.214851421211262 0.532
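
The block layout shown above can be read back with a few lines of Python. This is a minimal sketch based only on the excerpt: it assumes the two header lines and the "Block No:k:Bp range [start,end]" lines look exactly as printed, and that each pseudo-covariate line holds whitespace-separated numbers. The leading numbers 1 to 10 in the excerpt come from cat -n and are not part of the file. The function name and the toy input are hypothetical.

```python
import re

BLOCK_RE = re.compile(r"^Block No:(\d+):Bp range \[(\d+),(\d+)\]$")


def parse_pseudo_covariates(lines):
    """Return (window_kb, flank_kb, blocks) from PseudoCovariates.txt lines."""
    window_kb = float(lines[0].split(":", 1)[1])
    flank_kb = float(lines[1].split(":", 1)[1])
    blocks, current = [], None
    for raw in lines[2:]:
        line = raw.strip()
        if not line:
            continue
        match = BLOCK_RE.match(line)
        if match:
            # Start of a new block: 1 information line, then S value lines.
            current = {"block": int(match.group(1)),
                       "start": int(match.group(2)),
                       "end": int(match.group(3)),
                       "covariates": []}
            blocks.append(current)
        elif current is None:
            raise ValueError("values found before the first block line")
        else:
            current["covariates"].append([float(x) for x in line.split()])
    return window_kb, flank_kb, blocks


# Toy input (hypothetical values) with S = 2 pseudo-covariates per block.
toy = [
    "WindowLength (kb):100",
    "FlankingLength (kb):10",
    "Block No:3:Bp range [300000,399999]",
    "1.31 1.11 -1.33",
    "0.47 -1.47 -0.48",
    "Block No:4:Bp range [400000,499999]",
    "0.34 -1.10 -0.72",
    "0.85 -1.22 -0.36",
]
window_kb, flank_kb, blocks = parse_pseudo_covariates(toy)
print(window_kb, flank_kb, [(b["block"], len(b["covariates"])) for b in blocks])
```

Checking that every block carries the same number of value lines is a quick way to confirm S matches the covariates file.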

Impute Mode

The mode "impute" is used to impute Z statistics at untyped markers, based on the reference, sample Z statistics and pseudo-covariate(s).

It has 3 required input files:
  • --reference, reference genotype file in standard vcf format.
  • --Zstatistics, output file from DISSCO's "calculate" mode.
  • --PseudoCovariates, output file from DISSCO's "project" mode.

It has 5 optional input parameters:
  • --chr, specifying the chromosome to be calculated. Default is 20, which means only markers on chromosome 20 will be processed.
  • --WindowLength, specifying the window length (in unit of kb) used to divide the whole chromosome into nonoverlapping blocks. Default is 100.
  • --FlankingLength, specifying the flanking length (in unit of kb) used to expand a block. Default is 10.
  • --lambda, for regularization of the inverse of reference correlation matrix. Default is 0.1.
  • --output, specifying the output file name. Default is CompleteZ.txt.
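
The --lambda option is described only as regularizing the inverse of the reference correlation matrix; the exact formula is not given in this document. One common ridge-style form in summary-statistic imputation adds lambda to the diagonal of the typed-marker correlation matrix before solving. The Python sketch below illustrates that generic form only. It is an assumption, not DISSCO's documented computation, it ignores the covariate adjustment entirely, and all names and toy numbers are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def impute_z(r_tt, r_ut, z_typed, lam=0.1):
    """Generic ridge-regularized imputation: r_ut . (R_tt + lam*I)^-1 . z_typed.

    ASSUMPTION: adding lam to the diagonal is one possible regularization;
    the form DISSCO actually uses is not stated in this document.
    """
    n = len(z_typed)
    reg = [[r_tt[i][j] + (lam if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    weights_times_z = solve(reg, z_typed)
    return sum(r * w for r, w in zip(r_ut, weights_times_z))


# Toy example (hypothetical): 2 typed markers, 1 untyped marker.
r_tt = [[1.0, 0.5], [0.5, 1.0]]   # correlation among typed markers
r_ut = [0.6, 0.4]                 # correlation of untyped with typed markers
print(impute_z(r_tt, r_ut, [2.0, 1.0], lam=0.1))
```

In this form, a larger lambda shrinks the imputed statistic toward 0 and stabilizes the solve when typed markers are nearly collinear.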

The output file contains the following 8 columns:
  1. chr: chromosome.
  2. pos: physical coordinate (in bp).
  3. name: marker name.
  4. ref: reference allele.
  5. alt: alternative allele.
  6. quality: imputation quality (1 for a typed marker, a value between 0 and 1 for an imputed untyped marker, and 0 for an untyped marker that was not imputed because there are no typed markers within the block).
  7. Zscore: Z statistic, the same as in the input for typed markers and "." for untyped markers not imputed.
  8. type: marker type, 0 for typed markers, 1 for untyped markers imputed, and 2 for untyped markers not imputed, due to no typed markers within the same block.

Command line: java -jar DISSCO.jar impute --Zstatistics ZScores.txt --reference Reference.vcf --PseudoCovariates PseudoCovariates.txt --WindowLength 100 --FlankingLength 10 --chr 20 --lambda 0.1 --output CompleteZScores.txt

Output excerpt:
$ head CompleteZScores.txt
chr pos name ref alt quality zscore type
20 371972 rs6051637 T G 1.0 0.13564870823118533 0
20 378242 rs2295492 G A 0.5381657419118223 0.22365125089212082 1
20 384898 rs6084337 A C 1.0 1.4651232189804158 0
20 385052 rs4815580 A C 0.4406648933155706 1.0048875986218282 1
20 391025 rs6139109 T G 0.38797294709273467 -0.5925764399114839 1
20 397928 rs6115906 C A 1.0 0.6858327554376272 0
20 403653 rs6051908 T G 1.0 0.9287430851996213 0
20 404407 rs6107303 T G 0.9065047897721022 0.7610758354012989 1
20 415205 rs6037639 A C 0.9254864618333011 0.06314083762666539 1
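
Downstream analyses often need to separate typed from imputed markers or to drop poorly imputed ones. The Python sketch below reads the 8-column output and does both. It relies only on the column descriptions and the excerpt above; header names are lower-cased because the excerpt prints "zscore" while the column list says "Zscore". The quality cut-off of 0.5 is an arbitrary illustration, not a recommendation from this document, and the function names are hypothetical.

```python
from collections import Counter


def read_complete_z(lines):
    """Parse CompleteZScores.txt lines; '.' in the Z column becomes None."""
    header = [name.lower() for name in lines[0].split()]
    records = []
    for line in lines[1:]:
        if not line.strip():
            continue
        rec = dict(zip(header, line.split()))
        rec["pos"] = int(rec["pos"])
        rec["quality"] = float(rec["quality"])
        rec["type"] = int(rec["type"])  # 0 typed, 1 imputed, 2 not imputed
        rec["zscore"] = None if rec["zscore"] == "." else float(rec["zscore"])
        records.append(rec)
    return records


def keep_reliable(records, min_quality=0.5):
    """Typed markers plus imputed markers at or above an illustrative cut-off."""
    return [r for r in records
            if r["type"] == 0 or (r["type"] == 1 and r["quality"] >= min_quality)]


# Lines copied from the output excerpt above.
excerpt = [
    "chr pos name ref alt quality zscore type",
    "20 371972 rs6051637 T G 1.0 0.13564870823118533 0",
    "20 378242 rs2295492 G A 0.5381657419118223 0.22365125089212082 1",
    "20 384898 rs6084337 A C 1.0 1.4651232189804158 0",
    "20 385052 rs4815580 A C 0.4406648933155706 1.0048875986218282 1",
    "20 391025 rs6139109 T G 0.38797294709273467 -0.5925764399114839 1",
]
records = read_complete_z(excerpt)
print(Counter(r["type"] for r in records))
print([r["name"] for r in keep_reliable(records)])
```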

Example

  • For help, type:
    java -jar DISSCO.jar
  • To calculate Z statistics at typed markers:
    java -jar DISSCO.jar calculate --sample Sample.vcf --phenotype Phenotype.txt --chr 20 --covariates Covariates.txt --output ZScores.txt
  • To project covariates:
    java -jar DISSCO.jar project --sample Sample.vcf --reference Reference.vcf --covariates Covariates.txt --WindowLength 100 --FlankingLength 10 --chr 20 --output PseudoCovariates.txt
  • To impute Z statistics at untyped markers:
    java -jar DISSCO.jar impute --Zstatistics ZScores.txt --reference Reference.vcf --PseudoCovariates PseudoCovariates.txt --WindowLength 100 --FlankingLength 10 --chr 20 --lambda 0.1 --output CompleteZScores.txt
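
The three steps can also be chained from a script. The Python sketch below builds the same command lines as above from one set of parameters, so that the project and impute steps share the same window, flanking and chromosome settings, then runs them in order. It uses only the flags listed in this document; the function names and the assumption that DISSCO.jar sits in the working directory are hypothetical.

```python
import shlex
import subprocess

JAR = "DISSCO.jar"  # ASSUMPTION: the jar is in the current working directory


def build_commands(sample, reference, phenotype, covariates, chrom=20,
                   window_kb=100, flank_kb=10, lam=0.1,
                   z_out="ZScores.txt", pc_out="PseudoCovariates.txt",
                   final_out="CompleteZScores.txt"):
    """Return the calculate, project and impute command lines as argument lists."""
    base = ["java", "-jar", JAR]
    blocks = ["--WindowLength", str(window_kb),
              "--FlankingLength", str(flank_kb), "--chr", str(chrom)]
    calculate = base + ["calculate", "--sample", sample,
                        "--phenotype", phenotype, "--chr", str(chrom),
                        "--covariates", covariates, "--output", z_out]
    project = base + ["project", "--sample", sample, "--reference", reference,
                      "--covariates", covariates] + blocks + ["--output", pc_out]
    impute = base + ["impute", "--Zstatistics", z_out, "--reference", reference,
                     "--PseudoCovariates", pc_out] + blocks + [
                         "--lambda", str(lam), "--output", final_out]
    return [calculate, project, impute]


def run_pipeline(commands):
    """Run each step in order, stopping at the first non-zero exit status."""
    for cmd in commands:
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)


commands = build_commands("Sample.vcf", "Reference.vcf",
                          "Phenotype.txt", "Covariates.txt")
for cmd in commands:
    print(shlex.join(cmd))
```

Calling run_pipeline(commands) executes the steps; if Z statistics are already available, pass commands[1:] to skip the calculate step.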