LAAA: Local Ancestry and Allelic Association

What You Need

Download LAAA pipeline

How to Run

LAAA, the pipeline for Local Ancestry and Allelic Association, consists of the following four stages:

Format conversion
Local ancestry inference
Post-inference processing
Two-step testing

Note: Examples used in this tutorial as well as a more detailed README can be found in the downloadable package.

STAGE 1

Stage 1 converts the sample genotypes (in PLINK format) and the reference haplotypes into the input format required by HAPMIX, and generates a parameter file.
The following input files are required at stage 1:

A ped and a map file containing genotypes of the study sample in PLINK format.
A haps file containing phased reference haplotypes (e.g., HapMapII or HapMapIII phased haplotype files, or 1000 Genomes phased haplotype files).
A snps file containing all markers present in the above haps file (e.g., the snps files that come with HapMapII phased haplotype files).
A rate file consisting of the following 5 columns (e.g., the recombination rate files that come with HapMapII phased haplotype files):
- 1st column: marker ID
- 2nd column: chromosome
- 3rd column: physical position
- 4th column: recombination rate (cM/Mb)
- 5th column: genetic distance (cM)
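Before running stage 1, it can help to check that the rate file really has the five columns described above. A minimal Python sketch, assuming a whitespace-delimited file with no header line (the tutorial does not specify either); `parse_rate_lines` and `RateRow` are hypothetical names, not part of LAAA:

```python
from typing import Iterable, List, NamedTuple


class RateRow(NamedTuple):
    marker_id: str
    chrom: str
    position: int          # physical position (bp)
    rate_cm_per_mb: float  # recombination rate (cM/Mb)
    genetic_cm: float      # genetic distance (cM)


def parse_rate_lines(lines: Iterable[str]) -> List[RateRow]:
    """Parse a 5-column rate file (assumed whitespace-delimited, no header)."""
    rows: List[RateRow] = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 columns, found {len(fields)}")
        marker, chrom, pos, rate, dist = fields
        rows.append(RateRow(marker, chrom, int(pos), float(rate), float(dist)))
    # Within a chromosome, genetic distance should never decrease as
    # physical position increases.
    ordered = sorted(rows, key=lambda r: (r.chrom, r.position))
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.chrom == prev.chrom and cur.genetic_cm < prev.genetic_cm:
            raise ValueError(f"genetic distance decreases at marker {cur.marker_id}")
    return rows
```

Usage: `with open("rate.txt") as fh: rows = parse_rate_lines(fh)`. Taking an iterable of lines rather than a path keeps the function easy to test.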

Note:

Genetic distances are required for the input genotypes.
All markers in the reference files must be included in the rate file.
All markers in the genotype files must be included in the reference and rate files.
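The marker-inclusion requirements above can be checked before conversion. A minimal Python sketch, assuming the marker ID is in the 2nd column of the PLINK map file (the standard map layout), in the 1st column of each snps file (an assumption), and in the 1st column of the rate file (as described above). It also assumes "the reference files" means both populations. The function names are hypothetical:

```python
def marker_ids(lines, column):
    """Marker IDs from whitespace-delimited lines, taken from a 0-based column."""
    return {fields[column] for fields in (line.split() for line in lines) if fields}


def check_marker_inclusion(map_lines, snps1_lines, snps2_lines, rate_lines):
    """Return {problem: sorted marker IDs}; an empty dict means all checks pass."""
    genotyped = marker_ids(map_lines, 1)  # PLINK .map: chr, marker ID, cM, bp
    ref1 = marker_ids(snps1_lines, 0)     # assumption: marker ID in 1st column
    ref2 = marker_ids(snps2_lines, 0)
    rate = marker_ids(rate_lines, 0)      # rate file: 1st column is marker ID
    problems = {
        "reference markers missing from rate file": (ref1 | ref2) - rate,
        "genotyped markers missing from a reference panel": genotyped - (ref1 & ref2),
        "genotyped markers missing from rate file": genotyped - rate,
    }
    return {name: sorted(ids) for name, ids in problems.items() if ids}
```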

The input files are specified using the following command line options:

chromosome number: -chr
ped file: -ped
map file: -map
haps file from population 1: -hap1
snps file from population 1: -snps1
haps file from population 2: -hap2
snps file from population 2: -snps2
rate file: -rate
output directory: -outdir

Example command line (details on this example can be found in the README included in the downloadable package):
./convert_format.pl \
-chr 22 \
-ped example.ped \
-map example.map \
-hap1 CEU.hap \
-snps1 CEU.snps \
-hap2 YRI.hap \
-snps2 YRI.snps \
-rate rate.txt \
-outdir out_hapmix
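If you process several chromosomes, a small wrapper can assemble the same command for each one. A minimal Python sketch: the option names and their order come from the example above, while `stage1_command`, `hypothetical_files`, and the per-chromosome file-naming scheme are hypothetical and should be replaced with your own paths:

```python
import shlex

# Input options in the order used by the convert_format.pl example.
STAGE1_OPTIONS = ["ped", "map", "hap1", "snps1", "hap2", "snps2", "rate"]


def stage1_command(chrom, files, outdir, script="./convert_format.pl"):
    """Build the stage 1 argument list for one chromosome.

    files: mapping from option name (without the leading '-') to a path.
    """
    missing = [opt for opt in STAGE1_OPTIONS if opt not in files]
    if missing:
        raise ValueError(f"missing input files for: {', '.join(missing)}")
    cmd = [script, "-chr", str(chrom)]
    for opt in STAGE1_OPTIONS:
        cmd += [f"-{opt}", files[opt]]
    return cmd + ["-outdir", outdir]


def hypothetical_files(chrom):
    """Hypothetical per-chromosome naming scheme; adjust to your own files."""
    return {
        "ped": f"study.chr{chrom}.ped",
        "map": f"study.chr{chrom}.map",
        "hap1": f"CEU.chr{chrom}.hap",
        "snps1": f"CEU.chr{chrom}.snps",
        "hap2": f"YRI.chr{chrom}.hap",
        "snps2": f"YRI.chr{chrom}.snps",
        "rate": f"rate.chr{chrom}.txt",
    }


for chrom in (21, 22):
    # Print each command; pass the list to subprocess.run(cmd, check=True) to run it.
    print(shlex.join(stage1_command(chrom, hypothetical_files(chrom), "out_hapmix")))
```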

Output:
The input files required by HAPMIX, plus a parameter file. A folder named “RUN” is also created to hold the HAPMIX results. For example:

out_hapmix/AAgenofile.22
out_hapmix/AAind.22
out_hapmix/AAsnpfile.22
out_hapmix/pop1genofile.22
out_hapmix/pop1snpfile.22
out_hapmix/pop2genofile.22
out_hapmix/pop2snpfile.22
out_hapmix/rates.22
out_hapmix/hapmix.par.22
out_hapmix/RUN
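To confirm that stage 1 finished before starting stage 2, you can look for each file in the listing above. A minimal Python sketch, assuming the file-name stems (including the "AA" prefix) stay the same for other runs and that the suffix is the chromosome passed with -chr; the function names are hypothetical:

```python
from pathlib import Path

# File-name stems taken from the example listing above.
STAGE1_STEMS = [
    "AAgenofile", "AAind", "AAsnpfile",
    "pop1genofile", "pop1snpfile",
    "pop2genofile", "pop2snpfile",
    "rates", "hapmix.par",
]


def expected_stage1_outputs(outdir, chrom):
    """Paths that stage 1 is expected to write for one chromosome."""
    return [Path(outdir) / f"{stem}.{chrom}" for stem in STAGE1_STEMS]


def missing_stage1_outputs(outdir, chrom):
    """Names of expected files (and the RUN folder) that do not exist yet."""
    missing = [p.name for p in expected_stage1_outputs(outdir, chrom) if not p.is_file()]
    if not (Path(outdir) / "RUN").is_dir():
        missing.append("RUN")
    return missing
```

An empty list from `missing_stage1_outputs("out_hapmix", 22)` means every expected output is in place.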

STAGE 2

Stage 2 performs local ancestry inference, using the output of stage 1 as input. Inference is performed on each individual in turn. By default, no task scheduler is used; optionally, you can run the inference through a task scheduler instead.

Example command line:
./bin/runHapmix.pl out_hapmix/hapmix.par.22

If you are using a task scheduler, three sub-steps are needed:

File formatting:
Example command line:
./bin/runHapmix_1_GenerateFiles.pl -parfile out_hapmix/hapmix.par.22
Local ancestry inference using HAPMIX, with the inference for each individual submitted as a separate job. In this example, we use 'bsub' as our task scheduler:
Example command line:
./bin/runHapmix_2_admix2.sh AA 22 out_hapmix bsub
Finalizing:
Example command line:
./bin/runHapmix_3_SummarizeFiles.pl -parfile out_hapmix/hapmix.par.22

The output is written to out_hapmix/RUN as files named AA.DIPLOID.*.22.
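When jobs run through a scheduler, some may fail silently, so it can be worth counting the stage 2 output files. A minimal Python sketch, assuming (not confirmed by this tutorial) that HAPMIX writes one AA.DIPLOID file per individual; the function names are hypothetical:

```python
import glob
import os


def diploid_outputs(run_dir, chrom, prefix="AA"):
    """Stage 2 result files matching <prefix>.DIPLOID.*.<chrom>, sorted by name."""
    return sorted(glob.glob(os.path.join(run_dir, f"{prefix}.DIPLOID.*.{chrom}")))


def check_diploid_outputs(run_dir, chrom, n_individuals, prefix="AA"):
    """Assumes one DIPLOID file per individual; raises if the count differs."""
    found = diploid_outputs(run_dir, chrom, prefix)
    if len(found) != n_individuals:
        raise RuntimeError(
            f"expected {n_individuals} DIPLOID files for chromosome {chrom}, "
            f"found {len(found)}"
        )
    return found
```

For the example data, `check_diploid_outputs("out_hapmix/RUN", 22, 20)` would raise if any of the 20 individuals is missing.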

STAGE 3

Stage 3 performs post-inference processing to prepare input for the two-step testing procedure.

Input:

path to HAPMIX output
chromosome number
sample size
output path

Example command line:
# Assume sample size is 20 and output directory is out_testing
Rscript reformat.hapmix.output.R out_hapmix 22 20 out_testing
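The sample size argument can be read from the study's ped file rather than typed by hand. A minimal Python sketch, assuming the sample size is the number of individuals in the ped file (a PLINK ped file has one line per individual); the function names are hypothetical:

```python
def count_ped_samples(ped_lines):
    """Number of individuals in a PLINK .ped file (one non-blank line each)."""
    return sum(1 for line in ped_lines if line.strip())


def stage3_command(hapmix_dir, chrom, n_samples, out_dir,
                   script="reformat.hapmix.output.R"):
    """Argument list matching the stage 3 example above."""
    return ["Rscript", script, hapmix_dir, str(chrom), str(n_samples), out_dir]
```

Usage: `with open("example.ped") as fh: n = count_ped_samples(fh)`, then `stage3_command("out_hapmix", 22, n, "out_testing")`.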

The output consists of three files containing matrices of the number of minor alleles, the number of ancestry-specific alleles, and the number of ancestry-specific minor alleles. For example:

AA.geno.22
AFR.allele.22
AFR.minor.22
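To make the three quantities concrete, the sketch below computes them for one individual at one SNP from hard calls: each of the two haplotypes carries either the minor or the major allele and has a local ancestry. The "AFR" file prefix suggests the ancestry-specific counts refer to African ancestry, but that is an inference. The pipeline derives its matrices from HAPMIX output and may store expected (non-integer) counts, so this is only an illustration with hypothetical function names:

```python
def allele_counts(haplotypes, ancestry="AFR"):
    """Hard-call illustration for one individual at one SNP.

    haplotypes: two (carries_minor_allele, local_ancestry) pairs, one per
    haplotype, e.g. [(True, "AFR"), (False, "EUR")].
    Returns (minor alleles, ancestry-specific alleles,
             ancestry-specific minor alleles).
    """
    g = sum(1 for minor, _ in haplotypes if minor)
    a = sum(1 for _, anc in haplotypes if anc == ancestry)
    m = sum(1 for minor, anc in haplotypes if minor and anc == ancestry)
    return g, a, m


def is_consistent(g, a, m):
    """Bounds that counts from one diploid individual must satisfy."""
    # m cannot exceed g or a, and minor alleles on the other ancestry (g - m)
    # cannot exceed the number of alleles from that ancestry (2 - a).
    return 0 <= m <= min(g, a) and g - m <= 2 - a
```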

STAGE 4

Stage 4 performs association analysis using the two-step testing procedure. Trait (phenotype) and covariate files are required as additional input.
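The output of this stage reports an allele effect, an ancestry effect, an ancestry-specific minor allele effect, and a joint P value. These are consistent with a regression of the form below, written here as an assumption for illustration rather than a model taken from the LAAA documentation. For individual i at a given SNP, let G_i be the number of minor alleles (AA.geno), A_i the number of ancestry-specific alleles (AFR.allele), M_i the number of ancestry-specific minor alleles (AFR.minor), and z_i the covariates. For a quantitative trait y_i:

```latex
y_i = \beta_0 + \beta_G G_i + \beta_A A_i + \beta_M M_i
      + \boldsymbol{\gamma}^{\top}\mathbf{z}_i + \varepsilon_i,
\qquad
H_0^{\mathrm{joint}}:\ \beta_G = \beta_A = \beta_M = 0
```

Under this reading, swapping a major for a minor allele on the non-AFR background shifts the expected trait by β_G, while the same swap on the AFR background shifts it by β_G + β_M, so β_M captures effect heterogeneity between ancestries.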

Input:

path to the three matrices generated in stage 3
chromosome number
traits file: two columns; 1st column: IID; 2nd column: trait value
covariates file: 1st column: IID; remaining columns: covariate values
path to the association output
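The trait and covariate files can be generated from your own data. A minimal Python sketch, assuming tab-delimited files with no header line (the tutorial specifies neither the delimiter nor whether a header is expected) and assuming the IIDs should match the individual IDs in the 2nd column of the PLINK ped file. The function names are hypothetical:

```python
def trait_lines(traits):
    """traits: mapping IID -> trait value. Returns 'IID<TAB>value' lines."""
    return [f"{iid}\t{value}" for iid, value in traits.items()]


def covariate_lines(covariates):
    """covariates: mapping IID -> sequence of covariate values."""
    widths = {len(values) for values in covariates.values()}
    if len(widths) > 1:
        raise ValueError("every individual needs the same number of covariates")
    return ["\t".join([iid, *map(str, values)]) for iid, values in covariates.items()]


def ped_iids(ped_lines):
    """Individual IDs from a PLINK .ped file (2nd column)."""
    return [line.split()[1] for line in ped_lines if line.strip()]
```

Usage: write the result with `open("traits.txt", "w").write("\n".join(lines) + "\n")`, and compare `set(ped_iids(...))` with the IIDs in your trait mapping to catch individuals without a phenotype.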

Example command line:
Rscript association_2wayAdmixed.R out_testing 22 traits.txt covariates.txt out_testing

The output file contains the association results from the two-step testing procedure, in the following 15 columns:

SNP ID
P value from jointly testing the allele effect, ancestry effect and effect heterogeneity
Effect size of allele effect
Standard error of allele effect
Test statistic of allele effect
P value of allele effect
Effect size of ancestry effect
Standard error of ancestry effect
Test statistic of ancestry effect
P value of ancestry effect
Effect size of ancestry-specific minor allele
Standard error of ancestry-specific minor allele
Test statistic of ancestry-specific minor allele
P value of ancestry-specific minor allele
Sample size
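To pull out the strongest signals, you can parse the output file and rank SNPs by the joint P value. A minimal Python sketch, assuming whitespace-delimited rows with exactly one field per column listed above, an optional header line, and R-style "NA" for missing values. The short column names, the function names, and the example threshold are hypothetical:

```python
import math

# One short name per column, in the order listed above.
COLUMNS = [
    "snp", "p_joint",
    "beta_allele", "se_allele", "stat_allele", "p_allele",
    "beta_ancestry", "se_ancestry", "stat_ancestry", "p_ancestry",
    "beta_asm", "se_asm", "stat_asm", "p_asm",
    "n",
]


def _num(text):
    """Convert a field to float, treating R's 'NA' as NaN."""
    return math.nan if text == "NA" else float(text)


def parse_results(lines):
    """Parse result rows; a non-numeric 2nd field is taken to mark a header."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, found {len(fields)}")
        try:
            _num(fields[1])
        except ValueError:
            continue  # assumed header line
        values = [fields[0].strip('"')] + [_num(f) for f in fields[1:]]
        records.append(dict(zip(COLUMNS, values)))
    return records


def top_hits(records, alpha):
    """Rows with joint P value below alpha, smallest first (NaN rows dropped)."""
    hits = [r for r in records if r["p_joint"] < alpha]
    return sorted(hits, key=lambda r: r["p_joint"])
```

For example, `top_hits(parse_results(open(path)), 5e-8)` keeps SNPs whose joint P value is below 5e-8; choose the threshold that fits your study.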


The University of North Carolina at Chapel Hill
