
BETASEQ: A Powerful Novel Method to Control Type-I Error Inflation in Partially Sequenced Data for Rare Variant Association Testing

What You Need


After downloading BETASEQ_1.0.tar.gz into a local folder "local_path":
    1. Start R.
    2. Install BETASEQ with the R command
       install.packages("local_path/BETASEQ_1.0.tar.gz", repos = NULL, type = "source")
       Note that the statmod package must be installed first, e.g. with install.packages("statmod").
    3. Load BETASEQ with the R command library("BETASEQ").

How to Run

In the BETASEQ R package, beta.correct is the main function. It corrects partially sequenced data from a two-stage design.

Usage of beta.correct

  • beta.correct(dat.mat, ind.seq, maf.cutoff=0.05, M=200)

Input of beta.correct

  • dat.mat: input data matrix. If dat.mat has n columns, the first n-1 columns form the genotype matrix and the last column is the affection status. Genotypes use additive coding, with each minor allele counted as 1 and each major allele as 0 (so a genotype takes the value 0, 1 or 2). Controls are coded 0 and cases 1.
  • ind.seq: IDs of the sequenced individuals (i.e., the stage-one individuals).
  • maf.cutoff: MAF cutoff used to define rare variants; default 0.05.
  • M: number of quadrature points; default 200.
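To make the expected layout of dat.mat concrete, here is a small, hypothetical base-R sketch that builds a toy input matrix (genotype columns followed by a 0/1 status column) and computes each variant's sample MAF to flag rare variants against maf.cutoff. The simulated allele frequencies, the object names and the strict `<` comparison are illustrative assumptions; this is not how beta.correct computes or corrects MAF internally.

```r
# Hypothetical toy input in the layout described for dat.mat:
# genotype columns coded 0/1/2 (count of minor alleles), then a 0/1 status column.
set.seed(1)
n.ind <- 20                                  # hypothetical number of individuals
maf.true <- c(0.01, 0.02, 0.04, 0.10, 0.30)  # hypothetical allele frequencies
n.snp <- length(maf.true)

# One column per variant: each genotype is the number of minor alleles out of 2
geno <- sapply(maf.true, function(p) rbinom(n.ind, size = 2, prob = p))
status <- rep(c(1, 0), each = n.ind / 2)     # first half cases (1), then controls (0)

dat.mat <- cbind(geno, status)
colnames(dat.mat) <- c(paste0("snp", seq_len(n.snp)), "status")

# Sample MAF per variant: minor-allele count / (2 * number of individuals)
maf <- colSums(dat.mat[, -ncol(dat.mat)]) / (2 * nrow(dat.mat))

# Assumption: a variant is "rare" when its sample MAF is strictly below the cutoff
maf.cutoff <- 0.05
is.rare <- maf < maf.cutoff
print(round(maf, 3))
print(is.rare)
```

Because the sample MAF depends on the simulated genotypes, the exact set of variants flagged as rare can differ from what the "true" frequencies alone would suggest.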

Output of beta.correct

Returns a list with two components:

  • new.dat.mat: dat.mat matrix after correction
  • new.maf: new minor allele frequency vector after correction


  • Download seq.txt (500 cases and 500 controls; 90% of the cases and 10% of the controls were sequenced. Columns 1-855 are genotypes of the discovered SNPs and column 856 is the case/control status) and (list of IDs for the 500 [500*90% + 500*10%] sequenced individuals).
  • Prepare the input for beta.correct with the following R code:
    seq.dat.mat <- read.table("local_path/seq.txt")
    ind.seq <- scan("local_path/")
    marker <- seq.dat.mat[,1:(ncol(seq.dat.mat) - 1)]
    maf.cutoff <- 0.03
    M <- 100
  • Output: beta.dat <- beta.correct(seq.dat.mat, ind.seq, maf.cutoff, M)
  • Use the BETASEQ output in the ANRV rare variant association test (Morris and Zeggini, 2010):

    1. Download (This file is one of the source files in the SEQCHIP R package (Liu and Leal, 2012). It contains the function, an implementation of the ANRV test proposed by Morris and Zeggini in 2010.)

    2. Get the p-value of the ANRV test with the following R code:
    new.dat.mat <- beta.dat$new.dat.mat
    new.maf <- beta.dat$new.maf
    result <-,ind.seq,new.maf,maf.cutoff,correct=FALSE)
    pvalue <- result$p.value
  • A more detailed example, which also shows how to produce the partially sequenced data, can be found in the R documentation of beta.correct (?beta.correct).
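The example data come from a two-stage design in which only a subset of individuals is sequenced. As a hypothetical base-R illustration of how such a selection could be drawn, the sketch below samples 90% of 500 cases and 10% of 500 controls. It assumes that ind.seq stores row indices of dat.mat and that cases occupy the first rows; neither assumption is confirmed by the package description, so check the format of the provided ID file before reusing this.

```r
# Hypothetical two-stage design matching the example data description:
# 500 cases and 500 controls, with 90% of cases and 10% of controls sequenced.
set.seed(42)
n.case <- 500
n.ctrl <- 500
status <- c(rep(1, n.case), rep(0, n.ctrl))  # assumed row order: cases first

case.rows <- which(status == 1)
ctrl.rows <- which(status == 0)

# Assumption: ind.seq holds row indices of dat.mat for the stage-one individuals
ind.seq <- sort(c(sample(case.rows, round(0.9 * n.case)),
                  sample(ctrl.rows, round(0.1 * n.ctrl))))

length(ind.seq)          # 500 sequenced individuals (450 cases + 50 controls)
table(status[ind.seq])   # counts of sequenced controls (0) and cases (1)
```

Sampling cases at a much higher rate than controls is what makes the sequenced subset unrepresentative, which is the situation beta.correct is designed to handle.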