1 FIREcaller Tutorial

In this tutorial, we will analyze a Hi-C data set from Hippocampus dataset originally presented in Schmitt et al. (Cell,2016). It contains 22 Hi-C NxN contact matrices of the autosomal chromosomes. Hi-C input files and the mappability file can be downloaded from Yun Li’s Group website.

In this tutorial, we will analyze Hippocampus dataset from Schmitt et al., (Cell Reports, 2016), which contains Hi-C contact matrix of 22 autosomes from human Hippocampus. Hi-C input files and the mappability file can be downloaded from Yun Li Group website.

1.1 Setup the library

library("FIREcaller")

2 FIRE and SuperFIRE calling

2.1 Set working directory

The directory should contain Hi-C input files of all autosomes and the mappability file required for the analysis.

setwd('~/Desktop/FIREcaller_example')

2.2 Hi-C input file

As default, the Hi-C input file file.list is defined according to the naming convention of the NxN matrices. The HiC Input files need to be sample and chromosome-specific NxN contact frequency matrix with no row name and no column name.The files for the contact matrices must have the naming convention "${prefix}_chr${number}.gz". For .hic and .cool files, ${sample}.cool or ${sample}.hic. For .hic files, juicer needs to be downloaded.

file.list <- paste0('Hippo_chr',1:22,'.gz')

Here is an example for the required format of Hi-C input files.

Hippo_chr1 <- read.table("Hippo_chr1.gz", header = FALSE)

# A subset contact matrix from 100 ~ 110th rows and 100 ~ 110th columns
Hippo_chr1[100:110,100:110]
##     V100 V101 V102 V103 V104 V105 V106 V107 V108 V109 V110
## 100    0    1    1    1    1    2    2    1    1    0    1
## 101    1    4    7    3    5   11    6    2    7    6    1
## 102    1    7    8    8    6    8    6    3    8    4    1
## 103    1    3    8    4    5    5    7    4    1    4    3
## 104    1    5    6    5    2    8    6    3    9    3    1
## 105    2   11    8    5    8   12   10    9    7   10    7
## 106    2    6    6    7    6   10    4   10    6   13    4
## 107    1    2    3    4    3    9   10    4   15   13    3
## 108    1    7    8    1    9    7    6   15   10   21    5
## 109    0    6    4    4    3   10   13   13   21    6   10
## 110    1    1    1    3    1    7    4    3    5   10    0

2.3 Define the genome build

Users are required to define the genome build type. It can be “hg19” or “GRCh38” for human data, and “mm9” or “mm10” for mouse data. If missing, an error message is returned.

gb<-'hg19'

2.4 Define the name of the mappability file

There are some mappability files of different genome build (hg19, GRCh38, mm9 and mm10) and different resolutions (10 kb, 20 kb, and 40 kb) available in Yun Li’s’s website.

The mappability file needs to be in the format of column names as =c(‘chr’,‘start’, ‘end’, ‘F’, ‘GC’,‘M’,‘EBL’), and the chromosome column needs to be in the format ‘chr${number}’. The chromosomes in the file need to directly relate to the chromosomes in the process. For example, the mappability file will only contain information on chromosomes 1 through 22 to do an analysis on chromosome 1 through 22.

map_file<-'Hind3_hg19_40Kb_encodeBL_F_GC_M_auto.txt.gz'

Here is an example for the required format of the mappability file.

# The format of mappability file
Hind3_hg19_40Kb_encodeBL_F_GC_M_auto= read.table("Hind3_hg19_40Kb_encodeBL_F_GC_M_auto.txt.gz", header = TRUE)
head(Hind3_hg19_40Kb_encodeBL_F_GC_M_auto)
##    chr  start    end   F     GC      M EBL
## 1 chr1      0  40000   0 0.0000 0.0000   0
## 2 chr1  40000  80000   0 0.0000 0.0000   0
## 3 chr1  80000 120000 963 0.3925 0.6231   0
## 4 chr1 120000 160000   0 0.0000 0.0000   0
## 5 chr1 160000 200000   0 0.0000 0.0000   0
## 6 chr1 200000 240000   0 0.0000 0.0000   0

Here is an example of the chromosomes present in the mappability file.

unique(Hind3_hg19_40Kb_encodeBL_F_GC_M_auto$chr)
##  [1] "chr1"  "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17"
## [10] "chr18" "chr19" "chr2"  "chr20" "chr21" "chr22" "chr3"  "chr4"  "chr5" 
## [19] "chr6"  "chr7"  "chr8"  "chr9"

Users can also use their own mappability file in the same format.

2.5 Define the binsize.

Default=40000 (40Kb). Other recommended bin sizes are 10000 (10Kb) and 20000 (20Kb).

binsize<-40000

2.6 Change the cis-interacting regions

The user has the option to change the cis-interacting region threshold. Default is 200Kb.

upper_cis=200000

2.7 Define if the input matrices is ALREADY normalized.

Default=FALSE. If true, it will skip within-normalization step.

normalized<-FALSE

2.8 Define whether to remove MHC region

The MHC regions defined in the R package are:

GB CHR BP RANGE
hg19 chr6 28477797-33448354
GRCh38 chr6 28510120-33480577
mm9 chr17 33888191-35744546 & 36230820-38050373
mm10 chr17 33681276 & 38548659

The default setting is “TRUE” to remove the MHC region

rm_mhc <- TRUE

2.9 Define whether to remove the ENCODE black list regions

The ENCODE blacklist regions are described here. The EBL variable in the mappability file is an indicator of whether it is a black list region (1) or not (0).

rm_EBL <- TRUE

The default setting is “TRUE” to remove the ENCODE blacklist regions.

2.10 Change the filtering threshold

The user has the option to change the filtering threshold, where if a cis-interaction is calculated by more than 25% bins that contains a mappability of 0, a GC content of 0 , or a effective fragment length of 0,then it is also filtered.

rm_perc=0.25

2.11 Change the regression distribution

The user has the option to change the regression distribution used. The HiCNormCis method uses Poisson (“poisson”) distribution, but the user can change to negative binomial (“nb”)

dist='poisson'

2.12 Change the define the alpha cut off for a significant p-value.

Default=0.05.

alpha<-0.05

2.13 Specify if you would like circos plots of the FIREs and SuperFIREs

Default =FALSE. If the user would like to incorporate other multi-omic data, see FIRE_plot() function.

plots<-FALSE

2.14 Specify if you would like differential FIRE analysis

Default =FALSE. If the user would like more options specific to the limma package, see diff_interactions() function. Differential FIRE analysis of up to 2 samples with atleast 2 replicates per sample. Naming convention between replicates needs to be similar. Sample name is ordered alphabetically.

diff_fires<-FALSE

2.15 Call FIREs and SuperFIREs

Using FIREcaller function, we call both FIREs and SuperFIREs for 22 autosomes of Hippocampus dataset.

FIREcaller(file.list, gb, map_file,binsize=40000, upper_cis=200000,normalized=FALSE, rm_mhc = TRUE,rm_EBL=TRUE, rm_perc=0.25, dist='poisson',alpha=0.05, plots=FALSE,diff_fires=FALSE)

In this case, two sets of files will be returned: one for FIREs and the other one for SuperFIREs. If multiple (n) prefix’s are specified in the prefix.list, there will be one file for FIREs and n SuperFIRE files.

# An example for FIRE output
FIRE_output = read.table("FIRE_ANALYSIS_40000_200000_poisson.txt", header = T)
head(FIRE_output)
##    chr   start     end Hippo_norm_cis Hippo_neg_ln_pval Hippo_indicator
## 1 chr1  840000  880000         0.1867            0.0037               0
## 2 chr1  880000  920000         0.1808            0.0035               0
## 3 chr1  920000  960000         0.3123            0.0118               0
## 4 chr1  960000 1000000         0.2592            0.0073               0
## 5 chr1 1000000 1040000         0.4518            0.0363               0
## 6 chr1 1040000 1080000         0.3520            0.0165               0
# An example for SuperFIRE output
SuperFIRE_output = read.table("super_FIRE_call_Hippo.txt", header = T)
head(SuperFIRE_output)
##     chr     start       end cum_FIRE_score
## 1 chr15  64280000  64480000        20.0705
## 2  chr5 139400000 139560000        20.1927
## 3  chr7 117360000 117520000        20.3722
## 4  chr5 124160000 124280000        20.6055
## 5  chr3 156000000 156200000        20.6582
## 6  chr5  14080000  14200000        20.7497