LAAA: Local Ancestry and Allelic Association

How to Run

LAAA, the pipeline for Local Ancestry and Allelic Association, consists of the following four stages:

  1. Format conversion
  2. Local ancestry inference
  3. Post-inference processing
  4. Two-step testing
Note: Examples used in this tutorial as well as a more detailed README can be found in the downloadable package.

STAGE 1

Stage 1 converts the sample genotypes (in PLINK format) and the reference haplotypes into the input format required by HAPMIX, and generates a parameter file.
The following input files are required at stage 1:
  • A ped file and a map file containing the study sample's genotypes in PLINK format.
  • A haps file containing phased reference haplotypes (e.g., HapMapII or HapMapIII phased haplotype files, or 1000 Genomes phased haplotype files).
  • A snps file containing all markers present in the above haps file (e.g., the snps files that come with the HapMapII phased haplotype files).
  • A rate file consisting of the following 5 columns (e.g., the recombination rate files that come with the HapMapII phased haplotype files):
    • 1st column: marker ID
    • 2nd column: chromosome
    • 3rd column: physical position
    • 4th column: recombination rate (cM/Mb)
    • 5th column: genetic distance (cM)
Note:
  • Genetic distance is required for input genotypes.
  • All markers in the reference files must be included in the rate file.
  • All markers in the genotype files must be included in the reference and rate files.
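The rate-file layout and the marker-inclusion rules above can be checked before running stage 1. The following is a minimal sketch, not part of the LAAA package: the function names are hypothetical, and it assumes the rate file is whitespace-delimited with no header line. Marker IDs from a PLINK map file are taken from its second column.

```python
from typing import Iterable, List, Set


def read_rate_markers(lines: Iterable[str]) -> Set[str]:
    """Return marker IDs from rate-file lines, checking the 5-column layout.

    Assumption: whitespace-delimited, no header line.
    """
    markers = set()
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"rate file line {n}: expected 5 columns, got {len(fields)}")
        marker_id, _chrom, position, rate, genetic_distance = fields
        int(position)            # physical position must be an integer
        float(rate)              # recombination rate (cM/Mb)
        float(genetic_distance)  # genetic distance (cM)
        markers.add(marker_id)
    return markers


def read_map_markers(lines: Iterable[str]) -> List[str]:
    """Return marker IDs (2nd column) from PLINK map-file lines."""
    return [line.split()[1] for line in lines if line.strip()]


def missing_markers(required: Iterable[str], available: Set[str]) -> List[str]:
    """Return the markers in `required` that are absent from `available`."""
    return [m for m in required if m not in available]


# Toy values for illustration only, not taken from any real data set.
rate_ids = read_rate_markers(["rs1 22 15000000 1.20 0.0", "rs2 22 15001000 0.80 0.0008"])
map_ids = read_map_markers(["22 rs1 0.0 15000000", "22 rs3 0.002 15002000"])
print(missing_markers(map_ids, rate_ids))  # genotype markers not covered by the rate file
```

An empty list means every genotype marker is present in the rate file; the same `missing_markers` call can be repeated against the marker IDs of the reference snps files.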
The input files are specified using the following command line options:
  • chromosome number: -chr
  • ped file: -ped
  • map file: -map
  • haps file from population 1: -hap1
  • snps file from population 1: -snps1
  • haps file from population 2: -hap2
  • snps file from population 2: -snps2
  • rate file: -rate
  • output directory: -outdir
Example command line (details on this example can be found in the README included in the downloadable package):
./convert_format.pl \
-chr 22 \
-ped example.ped \
-map example.map \
-hap1 CEU.hap \
-snps1 CEU.snps \
-hap2 YRI.hap \
-snps2 YRI.snps \
-rate rate.txt \
-outdir out_hapmix
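When the conversion has to be repeated for several chromosomes, the argument list can be assembled programmatically. This is a small sketch under stated assumptions: `build_convert_command` is a hypothetical helper, not part of the package, and it only uses the options listed above. The file names are those of the example command.

```python
from typing import Dict, List


def build_convert_command(chrom: int, files: Dict[str, str], outdir: str) -> List[str]:
    """Assemble the convert_format.pl argument list for one chromosome."""
    cmd = ["./convert_format.pl", "-chr", str(chrom)]
    for option in ("ped", "map", "hap1", "snps1", "hap2", "snps2", "rate"):
        cmd += [f"-{option}", files[option]]  # KeyError if an input is not given
    cmd += ["-outdir", outdir]
    return cmd


example_files = {
    "ped": "example.ped", "map": "example.map",
    "hap1": "CEU.hap", "snps1": "CEU.snps",
    "hap2": "YRI.hap", "snps2": "YRI.snps",
    "rate": "rate.txt",
}
cmd = build_convert_command(22, example_files, "out_hapmix")
print(" ".join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems with unusual file names.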


Output:
Stage 1 writes the input files required by HAPMIX, together with a parameter file, to the output directory. It also creates a folder called “RUN” to hold the HAPMIX results. For example:
  • out_hapmix/AAgenofile.22
  • out_hapmix/AAind.22
  • out_hapmix/AAsnpfile.22
  • out_hapmix/pop1genofile.22
  • out_hapmix/pop1snpfile.22
  • out_hapmix/pop2genofile.22
  • out_hapmix/pop2snpfile.22
  • out_hapmix/rates.22
  • out_hapmix/hapmix.par.22
  • out_hapmix/RUN
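Before moving on to stage 2, it can help to confirm that all of the files listed above were written. The sketch below is illustrative only: the helper names are hypothetical, and the "AA" prefix is simply copied from the example file names (whether it can be configured is not covered here).

```python
import os
from typing import List


def expected_stage1_outputs(outdir: str, chrom: int, prefix: str = "AA") -> List[str]:
    """List the stage 1 output paths, following the example file names."""
    names = [
        f"{prefix}genofile", f"{prefix}ind", f"{prefix}snpfile",
        "pop1genofile", "pop1snpfile",
        "pop2genofile", "pop2snpfile",
        "rates", "hapmix.par",
    ]
    paths = [os.path.join(outdir, f"{name}.{chrom}") for name in names]
    paths.append(os.path.join(outdir, "RUN"))  # folder for HAPMIX results
    return paths


def missing_paths(paths: List[str]) -> List[str]:
    """Return the paths that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


for path in missing_paths(expected_stage1_outputs("out_hapmix", 22)):
    print("missing:", path)
```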

STAGE 2

Stage 2 performs local ancestry inference, using the output of stage 1 as input. Inference is run on one individual at a time. You can optionally use a task scheduler; by default, no scheduler is used.

Example command line:
./bin/runHapmix.pl out_hapmix/hapmix.par.22


If you are using a task scheduler, three sub-steps are needed:
  1. File formatting:
    Example command line:
    ./bin/runHapmix_1_GenerateFiles.pl -parfile out_hapmix/hapmix.par.22
  2. Local ancestry inference using HAPMIX; inference for each individual is submitted as a separate job. In this example, we use 'bsub' as the task scheduler:
    Example command line:
    ./bin/runHapmix_2_admix2.sh AA 22 out_hapmix bsub
  3. Finalizing:
    Example command line:
    ./bin/runHapmix_3_SummarizeFiles.pl -parfile out_hapmix/hapmix.par.22
Output is generated in out_hapmix/RUN as AA.DIPLOID.*.22
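To see which stage 2 result files have been produced, the wildcard pattern above can be expanded with Python's `glob` module. This is a minimal sketch: `stage2_outputs` is a hypothetical helper, and what the wildcard part of each file name stands for is not assumed here.

```python
import glob
import os
from typing import List


def stage2_outputs(outdir: str, chrom: int, prefix: str = "AA") -> List[str]:
    """Return the sorted files matching <outdir>/RUN/<prefix>.DIPLOID.*.<chrom>."""
    pattern = os.path.join(outdir, "RUN", f"{prefix}.DIPLOID.*.{chrom}")
    return sorted(glob.glob(pattern))


found = stage2_outputs("out_hapmix", 22)
print(len(found), "stage 2 output file(s) found")
```

With a task scheduler, running this check after the submitted jobs have finished shows whether any output is still missing.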

STAGE 3

Stage 3 performs post-inference processing to prepare input for the two-step testing procedure.

Input:
  • path to HAPMIX output
  • chromosome number
  • sample size
  • output path

Example command line:
# Assume sample size is 20 and output directory is out_testing
Rscript reformat.hapmix.output.R out_hapmix 22 20 out_testing
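Rather than typing the sample size by hand, it can be derived from the ped file, since a PLINK ped file holds one individual per line. The sketch below assumes that "sample size" means the number of individuals in the ped file used at stage 1, and that the file contains no comment lines. Both helper names are hypothetical.

```python
from typing import List


def count_samples(ped_path: str) -> int:
    """Count individuals in a PLINK ped file (one non-empty line each)."""
    with open(ped_path) as handle:
        return sum(1 for line in handle if line.strip())


def build_reformat_command(hapmix_dir: str, chrom: int, n_samples: int, outdir: str) -> List[str]:
    """Assemble the stage 3 command in the argument order of the example."""
    return ["Rscript", "reformat.hapmix.output.R",
            hapmix_dir, str(chrom), str(n_samples), outdir]


# With the real input one would use: n = count_samples("example.ped")
n = 20  # value taken from the example above
print(" ".join(build_reformat_command("out_hapmix", 22, n, "out_testing")))
```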


The output consists of three files, each containing a matrix: the number of minor alleles, the number of ancestry-specific alleles, and the number of ancestry-specific minor alleles. For example:
  • AA.geno.22
  • AFR.allele.22
  • AFR.minor.22

STAGE 4

Stage 4 performs association analysis using the two-step testing procedure. In addition to the stage 3 output, a phenotype (traits) file and a covariates file are required.

Input:
  • path to the three matrices generated in stage 3
  • chromosome number
  • traits file: two columns (1st column: IID; 2nd column: trait value)
  • covariates file: 1st column: IID; remaining columns: covariate values
  • path to association output
Example command line:
Rscript association_2wayAdmixed.R out_testing 22 traits.txt covariates.txt out_testing
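A quick consistency check of the traits and covariates files can catch mismatched IIDs before the association step. The sketch below is not part of the package: it assumes both files are whitespace-delimited with no header line (a header would be read as an ordinary row), and the function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def read_iids(lines: Iterable[str], min_cols: int,
              max_cols: Optional[int] = None, label: str = "file") -> List[str]:
    """Return the IID column, checking the number of columns on each line."""
    iids = []
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        too_few = len(fields) < min_cols
        too_many = max_cols is not None and len(fields) > max_cols
        if too_few or too_many:
            raise ValueError(f"{label} line {n}: unexpected column count {len(fields)}")
        iids.append(fields[0])
    return iids


def compare_iids(trait_lines: Iterable[str],
                 covariate_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (IIDs only in traits, IIDs only in covariates)."""
    traits = read_iids(trait_lines, 2, 2, "traits")        # exactly two columns
    covars = read_iids(covariate_lines, 2, None, "covariates")  # IID + >= 1 covariate
    only_traits = [i for i in traits if i not in set(covars)]
    only_covars = [i for i in covars if i not in set(traits)]
    return only_traits, only_covars


# Toy rows for illustration only.
print(compare_iids(["id1 1.5", "id2 2.0"], ["id1 30 1", "id3 41 0"]))
```

Two empty lists mean both files describe the same set of individuals. Whether the rows must also appear in the same order is not checked here.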


The output file contains the association results from the two-step testing procedure, with the following 15 columns:
  1. SNP ID
  2. P value from jointly testing allele effect, ancestry effect and effect heterogeneity
  3. Effect size of allele effect
  4. Standard error of allele effect
  5. Test statistic of allele effect
  6. P value of allele effect
  7. Effect size of ancestry effect
  8. Standard error of ancestry effect
  9. Test statistic of ancestry effect
  10. P value of ancestry effect
  11. Effect size of ancestry-specific minor allele
  12. Standard error of ancestry-specific minor allele
  13. Test statistic of ancestry-specific minor allele
  14. P value of ancestry-specific minor allele
  15. Sample size
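The column layout above can be mapped onto named fields for downstream filtering. The following is only a sketch: the field names are hypothetical labels for the 15 columns in the order listed, the file is assumed to be whitespace-delimited, and any header line has to be skipped explicitly. The threshold in the usage line is arbitrary and is not a recommendation of the pipeline.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical labels for the 15 columns, in the order listed above.
COLUMNS = [
    "snp", "p_joint",
    "beta_allele", "se_allele", "stat_allele", "p_allele",
    "beta_ancestry", "se_ancestry", "stat_ancestry", "p_ancestry",
    "beta_asma", "se_asma", "stat_asma", "p_asma",
    "n",
]


def to_number(text: str) -> Optional[float]:
    """Convert a field to float; non-numeric entries such as 'NA' become None."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_results(lines: Iterable[str], skip_header: bool = False) -> List[Dict]:
    """Parse association output lines into dictionaries keyed by COLUMNS."""
    rows = []
    for n, line in enumerate(lines, start=1):
        if skip_header and n == 1:
            continue
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"line {n}: expected {len(COLUMNS)} columns, got {len(fields)}")
        row = {"snp": fields[0]}
        for name, value in zip(COLUMNS[1:], fields[1:]):
            row[name] = to_number(value)
        rows.append(row)
    return rows


def select_by_joint_p(rows: List[Dict], threshold: float) -> List[Dict]:
    """Keep SNPs whose joint-test P value is available and below `threshold`."""
    return [r for r in rows if r["p_joint"] is not None and r["p_joint"] < threshold]


# Toy line for illustration only.
rows = parse_results(["rs1 0.001 0.2 0.05 4.0 0.0001 0.1 0.05 2.0 0.04 NA NA NA NA 100"])
print([r["snp"] for r in select_by_joint_p(rows, 0.01)])
```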