TOPMed5b Based Imputation: Post Imputation Filtering and Quality Evaluation
Background Information
The examples below assume the imputation output file is in VCF format. You can find an example output file ("imp_out.vcf.gz") inside the tarball file from
our download page.
Post Imputation Filtering
Post-imputation quality filtering is usually based on an imputation quality metric, the estimated r2. Sometimes people also filter on minor allele frequency (MAF). In VCF files, the estimated r2 and MAF are usually both reported in the INFO column, which is the 8th column (two examples: "MAF=0.13402;R2=0.98195" and "MAF=0.00005;R2=0.00193"). Please check out the script
01.a.post-imputation-filter.sh inside the tarball file from
our download page.
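To make the filtering step concrete, here is a minimal Python sketch of the idea. It is not the contents of 01.a.post-imputation-filter.sh. It assumes a tab-delimited VCF whose INFO column carries "R2" and "MAF" keys as in the examples above. The function names and the default cutoffs (min_r2=0.3, min_maf=0.0) are illustrative assumptions, not recommendations from this page.

```python
def parse_info(info):
    """Parse an INFO string such as 'MAF=0.13402;R2=0.98195' into a dict.

    Only key=value items are kept; value-less flags are ignored.
    """
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def passes_filter(vcf_line, min_r2=0.3, min_maf=0.0):
    """Return True if a VCF data line meets both cutoffs.

    The cutoffs are hypothetical defaults chosen for illustration only.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    if len(cols) < 8:
        return False  # malformed line: no INFO column
    info = parse_info(cols[7])  # INFO is the 8th column (index 7)
    try:
        r2 = float(info["R2"])
        maf = float(info["MAF"])
    except (KeyError, ValueError):
        return False  # missing or non-numeric R2/MAF
    return r2 >= min_r2 and maf >= min_maf


def filter_vcf_lines(lines, min_r2=0.3, min_maf=0.0):
    """Yield header lines unchanged and data lines that pass the filter."""
    for line in lines:
        if line.startswith("#") or passes_filter(line, min_r2, min_maf):
            yield line
```

To apply this to a compressed file such as "imp_out.vcf.gz", one could open it with gzip.open(path, "rt") from the Python standard library and pass the file handle to filter_vcf_lines.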
Imputation Quality Evaluation
There are multiple metrics to assess imputation quality. We group them into two categories: estimated and true quality metrics. Estimated quality metrics can be obtained from running imputation software alone, without the need to have true genotypes. In contrast, true quality metrics can only be obtained in the presence of true genotypes.
Among the estimated quality metrics, we and others have been recommending the estimated r2, which is used in the above section for post-imputation quality filtering. Estimated r2 ranges from 0 to 1. The larger the estimated r2, the better the imputation quality is believed to be. For common variants, estimated r2 should skew heavily towards the well-imputed end (i.e., close to 1). Run the script
02.estimated-r2.dist.sh inside the tarball file from
our download page to see results from a real data example.
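One simple way to look at the distribution of estimated r2 is to bin the values for common variants. The Python sketch below is not the 02.estimated-r2.dist.sh script. It assumes the (MAF, r2) pairs have already been extracted from the INFO column, and the MAF cutoff of 0.05 for "common" is a hypothetical choice made only for this illustration.

```python
def common_r2_values(maf_r2_pairs, maf_threshold=0.05):
    """Keep the estimated r2 of variants with MAF >= maf_threshold.

    maf_threshold=0.05 is an assumed definition of 'common', not one
    stated in the text.
    """
    return [r2 for maf, r2 in maf_r2_pairs if maf >= maf_threshold]


def r2_histogram(r2_values, n_bins=10):
    """Count r2 values in n_bins equal-width bins covering [0, 1].

    Values equal to 1.0 are placed in the last bin; values outside
    [0, 1] raise ValueError.
    """
    counts = [0] * n_bins
    for r2 in r2_values:
        if not 0.0 <= r2 <= 1.0:
            raise ValueError(f"estimated r2 out of range: {r2}")
        index = min(int(r2 * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


def print_histogram(counts):
    """Print one line per bin with its range, count, and fraction."""
    total = sum(counts)
    n_bins = len(counts)
    for i, count in enumerate(counts):
        low, high = i / n_bins, (i + 1) / n_bins
        fraction = count / total if total else 0.0
        print(f"[{low:.1f}, {high:.1f}) {count:8d} {fraction:6.3f}")
```

If common variants are well imputed, most of the counts should fall in the last bin or two.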
Among the true imputation quality metrics, we have been recommending true r2, the squared Pearson correlation between imputed dosages and true genotypes. One can use our
doseR2 program for this purpose.
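As an illustration of the definition only, the true r2 for a single variant can be sketched in Python as below. This is not the doseR2 program and says nothing about its input format or options. The function name is hypothetical, and the inputs are assumed to be two equal-length lists for one variant: imputed dosages and true genotypes coded as allele counts.

```python
import math


def true_r2(dosages, genotypes):
    """Squared Pearson correlation between imputed dosages and true genotypes.

    Returns NaN when either input has zero variance (for example, a
    variant that is monomorphic among the genotyped samples), because
    the correlation is undefined in that case.
    """
    if len(dosages) != len(genotypes):
        raise ValueError("dosages and genotypes must have the same length")
    n = len(dosages)
    if n == 0:
        return math.nan
    mean_d = sum(dosages) / n
    mean_g = sum(genotypes) / n
    # Sums of squares and cross-products around the means
    sdd = sum((d - mean_d) ** 2 for d in dosages)
    sgg = sum((g - mean_g) ** 2 for g in genotypes)
    sdg = sum((d - mean_d) * (g - mean_g) for d, g in zip(dosages, genotypes))
    if sdd == 0 or sgg == 0:
        return math.nan
    return (sdg * sdg) / (sdd * sgg)
```

Unlike the estimated r2, this quantity needs true genotypes for the same samples, for example from a subset of individuals that were also directly genotyped or sequenced.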
For more information, please refer to
MaCH wiki page.