In this tutorial, we will analyze a Hi-C data set from Hippocampus dataset originally presented in Schmitt et al. (Cell,2016). It contains 22 Hi-C NxN contact matrices of the autosomal chromosomes. Hi-C input files and the mappability file can be downloaded from Yun Li’s Group website.
In this tutorial, we will analyze Hippocampus dataset from Schmitt et al., (Cell Reports, 2016), which contains Hi-C contact matrix of 22 autosomes from human Hippocampus. Hi-C input files and the mappability file can be downloaded from Yun Li Group website.
library("FIREcaller")
The directory should contain Hi-C input files of all autosomes and the mappability file required for the analysis.
setwd('~/Desktop/FIREcaller_example')
As default, the Hi-C input file file.list is defined according to the naming convention of the NxN matrices. The HiC Input files need to be sample and chromosome-specific NxN contact frequency matrix with no row name and no column name.The files for the contact matrices must have the naming convention "${prefix}_chr${number}.gz". For .hic and .cool files, ${sample}.cool or ${sample}.hic. For .hic files, juicer needs to be downloaded.
file.list <- paste0('Hippo_chr',1:22,'.gz')
Here is an example for the required format of Hi-C input files.
Hippo_chr1 <- read.table("Hippo_chr1.gz", header = FALSE)
# A subset contact matrix from 100 ~ 110th rows and 100 ~ 110th columns
Hippo_chr1[100:110,100:110]
## V100 V101 V102 V103 V104 V105 V106 V107 V108 V109 V110
## 100 0 1 1 1 1 2 2 1 1 0 1
## 101 1 4 7 3 5 11 6 2 7 6 1
## 102 1 7 8 8 6 8 6 3 8 4 1
## 103 1 3 8 4 5 5 7 4 1 4 3
## 104 1 5 6 5 2 8 6 3 9 3 1
## 105 2 11 8 5 8 12 10 9 7 10 7
## 106 2 6 6 7 6 10 4 10 6 13 4
## 107 1 2 3 4 3 9 10 4 15 13 3
## 108 1 7 8 1 9 7 6 15 10 21 5
## 109 0 6 4 4 3 10 13 13 21 6 10
## 110 1 1 1 3 1 7 4 3 5 10 0
Users are required to define the genome build type. It can be “hg19” or “GRCh38” for human data, and “mm9” or “mm10” for mouse data. If missing, an error message is returned.
gb<-'hg19'
There are some mappability files of different genome build (hg19, GRCh38, mm9 and mm10) and different resolutions (10 kb, 20 kb, and 40 kb) available in Yun Li’s’s website.
The mappability file needs to be in the format of column names as =c(‘chr’,‘start’, ‘end’, ‘F’, ‘GC’,‘M’,‘EBL’), and the chromosome column needs to be in the format ‘chr${number}’. The chromosomes in the file need to directly relate to the chromosomes in the process. For example, the mappability file will only contain information on chromosomes 1 through 22 to do an analysis on chromosome 1 through 22.
map_file<-'Hind3_hg19_40Kb_encodeBL_F_GC_M_auto.txt.gz'
Here is an example for the required format of the mappability file.
# The format of mappability file
Hind3_hg19_40Kb_encodeBL_F_GC_M_auto= read.table("Hind3_hg19_40Kb_encodeBL_F_GC_M_auto.txt.gz", header = TRUE)
head(Hind3_hg19_40Kb_encodeBL_F_GC_M_auto)
## chr start end F GC M EBL
## 1 chr1 0 40000 0 0.0000 0.0000 0
## 2 chr1 40000 80000 0 0.0000 0.0000 0
## 3 chr1 80000 120000 963 0.3925 0.6231 0
## 4 chr1 120000 160000 0 0.0000 0.0000 0
## 5 chr1 160000 200000 0 0.0000 0.0000 0
## 6 chr1 200000 240000 0 0.0000 0.0000 0
Here is an example of the chromosomes present in the mappability file.
unique(Hind3_hg19_40Kb_encodeBL_F_GC_M_auto$chr)
## [1] "chr1" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17"
## [10] "chr18" "chr19" "chr2" "chr20" "chr21" "chr22" "chr3" "chr4" "chr5"
## [19] "chr6" "chr7" "chr8" "chr9"
Users can also use their own mappability file in the same format.
Default=40000 (40Kb). Other recommended bin sizes are 10000 (10Kb) and 20000 (20Kb).
binsize<-40000
The user has the option to change the cis-interacting region threshold. Default is 200Kb.
upper_cis=200000
Default=FALSE. If true, it will skip within-normalization step.
normalized<-FALSE
The MHC regions defined in the R package are:
GB | CHR | BP RANGE |
---|---|---|
hg19 | chr6 | 28477797-33448354 |
GRCh38 | chr6 | 28510120-33480577 |
mm9 | chr17 | 33888191-35744546 & 36230820-38050373 |
mm10 | chr17 | 33681276 & 38548659 |
The default setting is “TRUE” to remove the MHC region
rm_mhc <- TRUE
The ENCODE blacklist regions are described here. The EBL variable in the mappability file is an indicator of whether it is a black list region (1) or not (0).
rm_EBL <- TRUE
The default setting is “TRUE” to remove the ENCODE blacklist regions.
The user has the option to change the filtering threshold, where if a cis-interaction is calculated by more than 25% bins that contains a mappability of 0, a GC content of 0 , or a effective fragment length of 0,then it is also filtered.
rm_perc=0.25
The user has the option to change the regression distribution used. The HiCNormCis method uses Poisson (“poisson”) distribution, but the user can change to negative binomial (“nb”)
dist='poisson'
Default=0.05.
alpha<-0.05
Default =FALSE. If the user would like to incorporate other multi-omic data, see FIRE_plot() function.
plots<-FALSE
Default =FALSE. If the user would like more options specific to the limma package, see diff_interactions() function. Differential FIRE analysis of up to 2 samples with atleast 2 replicates per sample. Naming convention between replicates needs to be similar. Sample name is ordered alphabetically.
diff_fires<-FALSE
Using FIREcaller function, we call both FIREs and SuperFIREs for 22 autosomes of Hippocampus dataset.
FIREcaller(file.list, gb, map_file,binsize=40000, upper_cis=200000,normalized=FALSE, rm_mhc = TRUE,rm_EBL=TRUE, rm_perc=0.25, dist='poisson',alpha=0.05, plots=FALSE,diff_fires=FALSE)
In this case, two sets of files will be returned: one for FIREs and the other one for SuperFIREs. If multiple (n) prefix’s are specified in the prefix.list, there will be one file for FIREs and n SuperFIRE files.
# An example for FIRE output
FIRE_output = read.table("FIRE_ANALYSIS_40000_200000_poisson.txt", header = T)
head(FIRE_output)
## chr start end Hippo_norm_cis Hippo_neg_ln_pval Hippo_indicator
## 1 chr1 840000 880000 0.1867 0.0037 0
## 2 chr1 880000 920000 0.1808 0.0035 0
## 3 chr1 920000 960000 0.3123 0.0118 0
## 4 chr1 960000 1000000 0.2592 0.0073 0
## 5 chr1 1000000 1040000 0.4518 0.0363 0
## 6 chr1 1040000 1080000 0.3520 0.0165 0
# An example for SuperFIRE output
SuperFIRE_output = read.table("super_FIRE_call_Hippo.txt", header = T)
head(SuperFIRE_output)
## chr start end cum_FIRE_score
## 1 chr15 64280000 64480000 20.0705
## 2 chr5 139400000 139560000 20.1927
## 3 chr7 117360000 117520000 20.3722
## 4 chr5 124160000 124280000 20.6055
## 5 chr3 156000000 156200000 20.6582
## 6 chr5 14080000 14200000 20.7497