LAAA: Local Ancestry and Allelic Association
What You Need
How to Run
LAAA, the pipeline for Local Ancestry and Allelic Association, consists of the following four stages:
- Format conversion
- Local ancestry inference
- Post-inference processing
- Two-step testing
Note: Examples used in this tutorial as well as a more detailed README can be found in the downloadable package.
STAGE 1
Stage 1 converts the study-sample genotypes (plink format) and the reference haplotypes into the input format required by HAPMIX, and generates a HAPMIX parameter file.
The following input files are required at stage 1:
- A ped file and a map file containing the study-sample genotypes in plink format.
- A haps file containing phased reference haplotypes (e.g., HapMapII or HapMapIII phased haplotype files, or 1000 Genomes phased haplotype files)
- A snps file containing all markers present in the above haps file (e.g., the snps files that come with HapMapII phased haplotype files)
- A rate file consisting of the following 5 columns (e.g., the recombination rate files that come with HapMapII phased haplotype files)
- 1st column: marker ID
- 2nd column: chromosome
- 3rd column: physical position
- 4th column: recombination rate (cM/Mb)
- 5th column: genetic distance (cM)
Note:
- Genetic distance is required for input genotypes.
- All markers in the reference files must be included in the rate file.
- All markers in the genotype files must be included in the reference and rate files.
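The marker-inclusion requirement above can be checked before running stage 1. The following is a minimal sketch, not part of the LAAA package: it assumes both files are whitespace-delimited, that the marker ID is the 2nd column of the plink map file and the 1st column of the rate file, and the function names and toy files are hypothetical.

```python
import os
import tempfile


def read_column(path, index):
    """Return the set of values in one whitespace-delimited column (0-based index)."""
    values = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) > index:
                values.add(fields[index])
    return values


def missing_from_rate(map_path, rate_path):
    """Return marker IDs present in the plink map file but absent from the rate file."""
    map_ids = read_column(map_path, 1)    # plink map: 2nd column is the marker ID
    rate_ids = read_column(rate_path, 0)  # rate file: 1st column is the marker ID
    return sorted(map_ids - rate_ids)


# Tiny made-up files, for illustration only.
workdir = tempfile.mkdtemp()
map_path = os.path.join(workdir, "toy.map")
rate_path = os.path.join(workdir, "toy_rate.txt")
with open(map_path, "w") as handle:
    handle.write("22 rs1 0.10 1000\n22 rs2 0.20 2000\n22 rs3 0.30 3000\n")
with open(rate_path, "w") as handle:
    handle.write("rs1 22 1000 1.5 0.10\nrs2 22 2000 1.2 0.20\n")

print(missing_from_rate(map_path, rate_path))
```

An empty list means every genotyped marker has an entry in the rate file. The same comparison can be repeated against the 1st column of each snps file if its marker ID is stored there.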
The input files are specified using the following command line options:
- chromosome number: -chr
- ped file: -ped
- map file: -map
- haps file from population 1: -hap1
- snps file from population 1: -snps1
- haps file from population 2: -hap2
- snps file from population 2: -snps2
- rate file: -rate
- output directory: -outdir
Example command line (details on this example can be found in the README included in the downloadable package):
./convert_format.pl \
-chr 22 \
-ped example.ped \
-map example.map \
-hap1 CEU.hap \
-snps1 CEU.snps \
-hap2 YRI.hap \
-snps2 YRI.snps \
-rate rate.txt \
-outdir out_hapmix
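If stage 1 has to be run for several chromosomes, the command can be assembled programmatically. The sketch below only builds the argument list from the options listed above. The helper name is hypothetical, and reusing the same input file names for every chromosome is an assumption made purely for illustration.

```python
import shlex


def build_convert_command(chrom, ped, map_file, hap1, snps1, hap2, snps2,
                          rate, outdir, script="./convert_format.pl"):
    """Return the stage 1 command line as a list of arguments."""
    return [
        script,
        "-chr", str(chrom),
        "-ped", ped,
        "-map", map_file,
        "-hap1", hap1,
        "-snps1", snps1,
        "-hap2", hap2,
        "-snps2", snps2,
        "-rate", rate,
        "-outdir", outdir,
    ]


cmd = build_convert_command(
    22, "example.ped", "example.map",
    "CEU.hap", "CEU.snps", "YRI.hap", "YRI.snps",
    "rate.txt", "out_hapmix",
)
# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

To execute the command rather than print it, the list can be passed to `subprocess.run(cmd, check=True)`.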
Output:
The input files required by HAPMIX and a parameter file. A folder called “RUN” is also created to hold the HAPMIX results. For example:
- out_hapmix/AAgenofile.22
- out_hapmix/AAind.22
- out_hapmix/AAsnpfile.22
- out_hapmix/pop1genofile.22
- out_hapmix/pop1snpfile.22
- out_hapmix/pop2genofile.22
- out_hapmix/pop2snpfile.22
- out_hapmix/rates.22
- out_hapmix/hapmix.par.22
- out_hapmix/RUN
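Before moving on to stage 2, it can help to confirm that all of these files were written. The following is a minimal sketch: the file name prefixes are taken from the example listing above (whether the "AA" prefix is fixed or configurable is not stated here), and the function name is hypothetical.

```python
import os
import tempfile

# File name prefixes taken from the example output listing.
STAGE1_PREFIXES = [
    "AAgenofile", "AAind", "AAsnpfile",
    "pop1genofile", "pop1snpfile",
    "pop2genofile", "pop2snpfile",
    "rates", "hapmix.par",
]


def missing_stage1_outputs(outdir, chrom):
    """Return the expected stage 1 outputs that are not present in outdir."""
    expected = [os.path.join(outdir, f"{prefix}.{chrom}") for prefix in STAGE1_PREFIXES]
    missing = [path for path in expected if not os.path.isfile(path)]
    run_dir = os.path.join(outdir, "RUN")
    if not os.path.isdir(run_dir):
        missing.append(run_dir)
    return missing


# Demonstration on a temporary directory in which rates.22 is deliberately left out.
outdir = tempfile.mkdtemp()
os.mkdir(os.path.join(outdir, "RUN"))
for prefix in STAGE1_PREFIXES:
    if prefix != "rates":
        open(os.path.join(outdir, f"{prefix}.22"), "w").close()

missing = missing_stage1_outputs(outdir, 22)
print(missing)
```

For a real run, call `missing_stage1_outputs("out_hapmix", 22)`; an empty list means everything listed above is in place.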
STAGE 2
Stage 2 performs local ancestry inference, using the output from stage 1 as input. You can run it with or without a task scheduler; by default, no scheduler is used. Inference is performed on one individual at a time.
Example command line:
./bin/runHapmix.pl out_hapmix/hapmix.par.22
If you are using a task scheduler, three sub-steps are needed:
- File formatting:
Example command line:
./bin/runHapmix_1_GenerateFiles.pl -parfile out_hapmix/hapmix.par.22
- Local ancestry inference using HAPMIX; the inference for each individual is submitted as a separate job. In this example, we use 'bsub' as our task scheduler:
Example command line:
./bin/runHapmix_2_admix2.sh AA 22 out_hapmix bsub
- Finalizing:
Example command line:
./bin/runHapmix_3_SummarizeFiles.pl -parfile out_hapmix/hapmix.par.22
Output is generated in out_hapmix/RUN as AA.DIPLOID.*.22
STAGE 3
Stage 3 performs post-inference processing to prepare input for the two-step testing procedure.
Input:
- path to HAPMIX output
- chromosome number
- sample size
- output path
Example command line:
# Assume sample size is 20 and output directory is out_testing
Rscript reformat.hapmix.output.R out_hapmix 22 20 out_testing
The output is three files, each containing a matrix: the number of minor alleles, the number of ancestry-specific alleles, and the number of ancestry-specific minor alleles. For example:
- AA.geno.22
- AFR.allele.22
- AFR.minor.22
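To make the three quantities concrete, here is a toy illustration for one individual at one SNP. It is a conceptual sketch only: it assumes hard ancestry calls per haplotype, whereas reformat.hapmix.output.R may derive these values differently (for example, from ancestry probabilities). The function name and the "EUR" label are hypothetical.

```python
def count_alleles(haplotypes, minor_allele, ancestry="AFR"):
    """Count the three quantities for one individual at one SNP.

    haplotypes: two (allele, ancestry_label) pairs, one per chromosome copy.
    Returns (minor alleles, alleles of the given ancestry,
             minor alleles carried on haplotypes of the given ancestry).
    """
    geno = sum(1 for allele, _ in haplotypes if allele == minor_allele)
    anc_allele = sum(1 for _, anc in haplotypes if anc == ancestry)
    anc_minor = sum(
        1 for allele, anc in haplotypes
        if allele == minor_allele and anc == ancestry
    )
    return geno, anc_allele, anc_minor


# One copy of minor allele "A" on an AFR haplotype, one on a EUR haplotype.
example = [("A", "AFR"), ("A", "EUR")]
print(count_alleles(example, minor_allele="A"))  # (2, 1, 1)
```

Under this reading, the three values would correspond to one entry in each of AA.geno.22, AFR.allele.22 and AFR.minor.22, respectively.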
STAGE 4
Stage 4 performs association analysis using the two-step testing procedure. Phenotype (trait) and covariate files are required as additional input.
Input:
- path to the three matrices generated in stage 3
- chromosome number
- traits: two columns; 1st column: IID; 2nd column: value of trait
- covariates: 1st column: IID; the following columns: values of covariates
- path to association output
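Since both the traits file and the covariates file are keyed by IID, a quick consistency check can catch mismatched samples before the association step. The sketch below assumes whitespace-delimited files with no header line and the IID in the 1st column. The function names and toy files are hypothetical.

```python
import os
import tempfile


def read_iids(path):
    """Return the set of IIDs found in the 1st column of a whitespace-delimited file."""
    iids = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                iids.add(fields[0])
    return iids


def compare_iids(traits_path, covariates_path):
    """Return (IIDs only in the traits file, IIDs only in the covariates file)."""
    trait_ids = read_iids(traits_path)
    cov_ids = read_iids(covariates_path)
    return sorted(trait_ids - cov_ids), sorted(cov_ids - trait_ids)


# Tiny made-up files, for illustration only.
workdir = tempfile.mkdtemp()
traits_path = os.path.join(workdir, "traits.txt")
covariates_path = os.path.join(workdir, "covariates.txt")
with open(traits_path, "w") as handle:
    handle.write("id1 1.2\nid2 0.7\nid3 2.1\n")
with open(covariates_path, "w") as handle:
    handle.write("id1 34 0\nid2 51 1\nid4 47 1\n")

only_traits, only_covariates = compare_iids(traits_path, covariates_path)
print(only_traits, only_covariates)
```

Two empty lists mean the files cover the same individuals. If the files do carry a header line, its first field will appear in the comparison and can be ignored.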
Example command line:
Rscript association_2wayAdmixed.R out_testing 22 traits.txt covariates.txt out_testing
The output file contains the association results from the two-step testing procedure, with the following columns:
- SNP ID
- P value from jointly testing allele effect, ancestry effect and effect heterogeneity
- Effect size of allele effect
- Standard error of allele effect
- Test statistic of allele effect
- P value of allele effect
- Effect size of ancestry effect
- Standard error of ancestry effect
- Test statistic of ancestry effect
- P value of ancestry effect
- Effect size of ancestry-specific minor allele
- Standard error of ancestry-specific minor allele
- Test statistic of ancestry-specific minor allele
- P value of ancestry-specific minor allele
- Sample size