
Supply regions or genes

  • analysis (~ 20 s) - only JASPAR motif collection.
  • analysis (~ 6 mins) - whole motif collection.

  • Please do not enter a single region nor gene here,
    but a list of related peaks/regions or genes.
    An explanation on supported input
    formats can be found here.
  • Or choose a file to upload:
  • Please make sure that the selected input type corresponds to the data pasted in step 2.

  • Approx. time for the analysis using:
  • only database: 3-4 mins
  • only TFBS ChIP database: 1 min
  • all databases: 5-6 mins
Optional parameters

Comparative analysis

This tool compares the motif enrichment results of 2 independent i-cisTarget analyses. The two results can come from the same species or from different species (if the same motif collection is used), e.g. enriched motifs found for active regions in mouse and Drosophila heart.

The comparative analysis can be performed here.

Download BED files with candidate regulatory regions

Examples

See the Legend explaining the study types below.

Study | Species version | Input | Reference | Link
The top 1000 GATA1 ChIP-seq peaks in K562 cell line (ENCODE) | Human | Peaks | ENCODE Project Consortium, Nature (2012) | Report
FLI1 ChIP-seq peaks in Ewing sarcoma (provided in the Supplementary Table 1 of the corresponding paper) | Human | Peaks | Riggi et al., Cancer Cell (2014) | Report
Genes up-regulated after GATA1 activation in G1E-ER4 mouse cells. Human orthologs used as input (as downloaded from MSigDB). | Human | Genes | Welch et al., Blood (2004) | Report
Genes down-regulated upon EWS-FLI1 knockdown in Ewing sarcoma | Human | Genes | Riggi et al., Cancer Cell (2014) | Report
Differentially expressed genes after TGFbeta treatment in A549 | Human | Genes | Cieślik et al., Epigenetics Chromatin (2013) | Report
The top 1000 less active regions (H3K27ac) in Ewing sarcoma upon EWS-FLI1 knockdown | Human | CRRs | Riggi et al., Cancer Cell (2014) | Report
100 less active peaks (H3K27ac) in Ewing sarcoma upon EWS-FLI1 knockdown (provided in the Supplementary Table 3 of the corresponding paper) | Human | Peaks | Riggi et al., Cancer Cell (2014) | Report
100 more active peaks (H3K27ac) in Ewing sarcoma upon EWS-FLI1 knockdown (provided in the Supplementary Table 3 of the corresponding paper) | Human | Peaks | Riggi et al., Cancer Cell (2014) | Report
The top 1000 active regions (ATAC-seq) in Ewing sarcoma after EWS-FLI1 activation | Human | CRRs | Riggi et al., Cancer Cell (2014) | Report
Heart positive VISTA enhancers | Human | Peaks | Visel et al., Nucleic Acids Res (2007) | Report
The top 500 Gata1 ChIP-seq peaks in MEL mouse cell line (Mouse ENCODE) | Mouse | Peaks | Mouse ENCODE | Report
Genes up-regulated after GATA1 activation in G1E-ER4 mouse cells (erythroid precursors engineered to express GATA1 upon addition of estradiol). HGNC symbols as downloaded from MSigDB converted to MGI symbols. | Mouse | Genes | Welch et al., Blood (2004) | Report
P300 ChIP-seq peaks in mouse heart | Mouse | Peaks | downloaded from GEO | Report
Genes related to GO term 'heart process' [GO:0003015] (170 genes, 375 annotations) | Mouse | Genes | downloaded from MGI | Report
9433 Zelda ChIP-seq peaks in the early Drosophila melanogaster embryo | Drosophila | Peaks | Harrison et al., PLoS Genetics (2011) | Report
Conserved eye disc-specific genes (versus wing disc) | Drosophila | Genes | Naval Sanchez et al., Genome Research (2013) | Report
Set of co-expressed genes downstream of Dorsal (down in knockout), in the fly embryo | Drosophila | Genes | Stathopoulos et al., Cell (2002) | Report
Daphnia pulex heat shock signature | Daphnia pulex | Genes | Becker et al. (unpublished data) | Report
Daphnia magna genes upregulated after chronic treatment with microcystin-free cyanobacteria | Daphnia magna | Genes | Schwarzenberger et al., BMC Genomics (2014) | Report

Legend:

Question | Input | Input type | Output
Are motifs and ChIP-seq tracks of my ChIP'ped factor enriched? What are the co-factors of the TF? Which DHS/FAIRE/Histone tracks are correlated with my TF ChIP-seq peaks? Can we discriminate direct from indirect targets? | TF ChIP-seq peaks | Peaks | The motifs and TF ChIP-seq tracks of the ChIP'ped factor and their direct target regions + motifs and ChIP-seq tracks of co-factors + correlated DHS/FAIRE/Histone marks in specific cell lines.
Which TFs bind to significantly more active/open/repressed regions between two conditions (normal vs cancer, non-treated vs treated, cancer subtype1 vs subtype2,...) based on e.g. H3K27ac/H3K27me3/FAIRE/ATAC differential analysis? | a set of differentially active regions | Peaks | The most correlated motifs and TF ChIP-seq tracks and their direct target regions, the most correlated DHS/FAIRE/Histone tracks.
Which TF regulates the expression of the specific gene signature/gene module/co-expressed genes? | a set of genes | Genes | The motifs and TF ChIP-seq tracks of the predicted upstream regulators, along with the non-TF regulatory tracks in specific cell types.

Application on TF ChIP-seq and MSigDB gene sets

See results

Overview

i-cisTarget enables you to:

  1. detect transcription factor motifs in a set of peaks (e.g. differentially active peaks based on H3K27ac ChIP-seq between 2 conditions) or co-expressed genes.
  2. detect overrepresented in vivo features (histone modifications, TF ChIP-seq, DHS, FAIRE) for gene signatures or peaks. These regulatory features help to improve motif discovery and candidate target gene prediction.
  3. dissect a set of co-expressed genes into direct target genes of different transcription factor motifs or ChIP-seq tracks.

The analysis is based on ranking conserved regions in the Human, Mouse or Drosophila genome and recovery curves (See Figure).

Some of the key features of i-cisTarget are:

  • over-represented motifs are predicted in the set of co-expressed genes, using entire intergenic and intronic sequences
  • 10 vertebrate species are used for motif scoring in the Human and Mouse versions, and 12 Drosophila species are used in the Drosophila version
  • for motif scoring, the Cluster-Buster algorithm is used; kindly provided by Martin Frith
  • for cross-species comparisons, the Cluster-Buster scores of orthologous regions are used independently (e.g. no requirement of base pair alignment of D. melanogaster motifs in the Drosophila version), as described previously here
  • each significant motif and regulatory track results in an optimal subset of genes that are predicted as direct targets
  • for the predicted direct targets, i-cisTarget presents the predicted enhancer and binding site locations in the UCSC Genome Browser

A vast library of motifs and in vivo regulatory features was compiled for i-cisTarget. These features are:

  • 9713 position weight matrices (PWMs) are used for the Human, Mouse and Drosophila versions, from various sources such as Transfac, Jaspar, FlyFactorSurvey, and motif collections from Stark et al., Elemento et al., and Down et al.
  • 4305 regulatory data tracks from ENCODE and other sources (DHS, FAIRE, TF ChIP-seq, histone marks) for the Human version
  • 565 regulatory data tracks from Mouse ENCODE for the Mouse version
  • 455 features from the modENCODE project, 48 features from the BDTNP project, 33 ChIP-based features from Eileen Furlong's lab (Zinzen et al., Nature (2009)), and 2 * 30 chromatin states (derived from the S2 and the BG3 cell lines) taken from modENCODE Consortium et al., Science (2010) for the Drosophila version

Help

1. INPUT

The input for i-cisTarget can be:

  1. a set of co-regulated genomic regions, such as TF ChIP-seq or differential open chromatin peaks
  2. a set of co-expressed/co-regulated genes

Analysis of peaks

If we start from peaks/loci, e.g.:

  • TF ChIP-seq track to find co-factors or to detect false positive peaks that are not enriched by the ChIP'ped motif,
  • differentially active regions (e.g. H3K27ac) between two groups of samples or before/after treatment to find which TFs bind to the active regions,
  • open chromatin data (e.g. FAIRE, DHS or ATAC) between two groups to detect which TFs bind there,

then the peaks are mapped to the overlapping candidate i-cisTarget regulatory regions (described below).

Analysis of a gene signature

We can also start from a set of genes, such as a module, a gene signature, top mutated genes or a Gene Ontology category, for instance to find out which upstream TF regulates this group of genes or which DHS track correlates best with it. In that case the input genes are linked to candidate regulatory regions: all non-coding regions located in the neighbourhood of a gene are assigned to that gene. These regions include the promoter regions upstream and downstream of the transcription start site (TSS). The “space” around each gene is a selectable parameter. For the Human and Mouse versions, a space of 20 kb around the TSS is used.
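The gene-to-region assignment described above can be sketched in a few lines of Python. This is only an illustration, not the i-cisTarget implementation: the `Region` type, the function name `regions_near_tss` and the example coordinates are hypothetical, and the sketch assumes that the window extends `space` bp on each side of the TSS (if the space is meant as the total window width, it should be halved).

```python
from typing import Dict, List, NamedTuple, Tuple


class Region(NamedTuple):
    region_id: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def regions_near_tss(
    tss: Dict[str, Tuple[str, int]],
    regions: List[Region],
    space: int = 20_000,
) -> Dict[str, List[str]]:
    """Assign to each gene every region overlapping [TSS - space, TSS + space).

    Assumption: the window is symmetric around the TSS.
    """
    assigned = {}
    for gene, (chrom, pos) in tss.items():
        lo, hi = max(0, pos - space), pos + space
        assigned[gene] = [
            r.region_id
            for r in regions
            if r.chrom == chrom and r.start < hi and r.end > lo
        ]
    return assigned


# Hypothetical coordinates, for illustration only.
example_regions = [
    Region("r1", "chr1", 1_000, 2_000),
    Region("r2", "chr1", 50_000, 51_000),
    Region("r3", "chr2", 1_000, 2_000),
]
example_tss = {"GENE_A": ("chr1", 15_000)}
print(regions_near_tss(example_tss, example_regions))
```

Only `r1` is assigned to `GENE_A` here: `r2` lies outside the window and `r3` is on another chromosome.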

Human candidate regulatory regions

i-cisTarget analysis relies on candidate regulatory regions that we defined using publicly available regulatory data: General Binding Preference models, CpG islands, proximal promoters, conserved non-coding sequences, ultra-conserved elements, regulatory elements from OregAnno, VistaEnhancers, predicted cis-regulatory modules and DNAseI Hypersensitive (DHS) uniform clustered peaks across 125 cell lines from ENCODE.

Table 1. Publicly available regulatory datasets used to create human i-cisTarget candidate regulatory regions

 | GBP | CpG | Proximal promoters | CNS | UCR | Oreganno | Vista Enhancers | CRMs | DHS
Number of regions | 61550 | 27718 | 34722 | 232101 | 15931 | 23112 | 1339 | 123500 | 1281988
% of the genome | 1.77 | 0.73 | 0.67 | 2.25 | 0.13 | 0.39 | 0.07 | 2.05 | 13.36

All these features were merged. In a first step, regions having an overlap of at least 20% with insulator elements in the genome, or of at least 80% with coding exons, were removed. Next, regions with a smaller overlap (below 20% with insulators or below 80% with exons) were split and the parts containing the insulator or coding exon were removed. The remaining regions were then filtered by size and regions < 30 bp were removed. Finally, any resulting regions shorter than 1000 bp were extended, if possible, to 1000 bp in a direction that prevents overlap with an insulator or exon. The complete procedure of creating candidate regulatory regions yielded 1,223,024 regions (representing ~35% of the genome) with an average size of 818 bp.

Subsequently, all the candidate regulatory regions were scored and ranked for each feature (motifs and regulatory tracks) and ranking databases were created.

2. RECOVERY ANALYSIS

For each feature (motif or regulatory track), the ranking of the foreground set (the input genes/regions mapped to the i-cisTarget regulatory regions) is considered, and the Area Under the Curve (AUC) of the recovery curve of these “foreground” regions is calculated. The AUCs of all features are then normalized to a Normalized Enrichment Score, NES = (AUC - µ) / σ, where µ and σ are the mean and standard deviation of the AUC distribution. Moreover, similar enriched motifs are clustered together using STAMP.
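As a rough illustration of the recovery analysis, the sketch below computes the area under a recovery curve for one feature and then converts a list of AUCs into NES values. It is a simplified sketch under stated assumptions, not the i-cisTarget code: the function names are hypothetical, ranks are taken to be 0-based, the AUC is scaled by the cut-off and the foreground size, and the population standard deviation is used for σ.

```python
from statistics import mean, pstdev


def recovery_auc(ranks, n_regions, auc_fraction):
    """Area under the recovery curve within the top `auc_fraction` of the ranking.

    ranks: 0-based ranks of the (non-empty) foreground regions for one feature.
    n_regions: total number of ranked candidate regulatory regions.
    """
    cutoff = max(1, round(auc_fraction * n_regions))
    hits = [0] * cutoff
    for r in ranks:
        if r < cutoff:
            hits[r] += 1
    area, recovered = 0, 0
    for h in hits:
        recovered += h     # foreground regions recovered up to this rank
        area += recovered  # accumulate the step-curve area
    # Assumption: scale by cutoff * foreground size so values lie in [0, 1].
    return area / (cutoff * len(ranks))


def normalized_enrichment_scores(aucs):
    """NES = (AUC - mu) / sigma over the AUCs of all features."""
    mu, sigma = mean(aucs), pstdev(aucs)
    return [(a - mu) / sigma for a in aucs]


# Two foreground regions ranked first and second among 100 regions.
print(recovery_auc([0, 1], n_regions=100, auc_fraction=0.1))
print(normalized_enrichment_scores([0.1, 0.2, 0.3]))
```

A feature whose AUC equals the mean of the distribution gets an NES of 0; features that rank the foreground regions near the top get a high positive NES.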

Extra and optional parameters

  • Region mapping (only when genes are used as input)
    the user chooses the space around the TSS that is used for mapping genes to the candidate regulatory regions (all the regions within that space around the TSS will be used in the analysis). For the Human and Mouse versions, only a space of 20 kb around the TSS is currently available.
  • Fraction of overlap (only when peaks/regions are used as input)
    the user specifies the minimum fraction of overlap between the input peaks and an i-cisTarget candidate regulatory region. E.g. a fraction of overlap of 0.4 (default) means that only the i-cisTarget regions overlapped by the input peaks over at least 40% of their length will be used in the analysis.
  • Normalized enrichment score threshold
    only the enriched features with a normalized enrichment score higher than the threshold will be shown in the report, and the STAMP clustering of similar motifs will be performed only for the motifs with an NES above this threshold.
  • Enrichment analysis
    enrichment analysis can be computed either within each database separately (default), in which case an AUC distribution is generated for each database and the NES values are computed within each database, or over all databases, in which case one AUC distribution and the NES values are computed across all databases.
  • ROC threshold for AUC calculation
    the cut-off for the fraction of ranked regions (x-axis) at which to calculate the AUC. This measure is then used for comparing all the motifs/tracks.
  • Threshold for visualization
    the cut-off for the x-axis of the AUC plot. If this is set to 20,000, the recovery curve will be visualized for the top 20,000 regulatory regions.
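The fraction-of-overlap parameter can be illustrated with a small Python sketch. The function names and coordinates are hypothetical, and the sketch assumes that the fraction is measured as the share of an i-cisTarget region covered by the union of the input peaks on the same chromosome; the tool itself may compute the overlap differently.

```python
def covered_fraction(region, peaks):
    """Fraction of `region` (start, end) covered by the union of `peaks`.

    All intervals are half-open (start, end) on the same chromosome.
    """
    r_start, r_end = region
    clipped = sorted(
        (max(s, r_start), min(e, r_end))
        for s, e in peaks
        if s < r_end and e > r_start
    )
    covered, cursor = 0, r_start
    for s, e in clipped:
        s = max(s, cursor)  # skip the part already counted
        if e > s:
            covered += e - s
            cursor = e
    return covered / (r_end - r_start)


def select_regions(regions, peaks, min_fraction=0.4):
    """Keep the IDs of regions covered for at least `min_fraction` of their length."""
    return [
        region_id
        for region_id, span in regions.items()
        if covered_fraction(span, peaks) >= min_fraction
    ]


# Hypothetical coordinates, for illustration only.
example_peaks = [(90, 130), (120, 150), (300, 400)]
example_regions = {"a": (100, 200), "b": (200, 300)}
print(select_regions(example_regions, example_peaks))
```

Region `a` is covered from position 100 to 150 (50% of its length) and passes the default 0.4 threshold; region `b` is not overlapped at all and is dropped.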

3. OUTPUT

When the analysis is finished, the results will appear on the webpage, or a link to the results will be sent to the user's e-mail address.

The report of results includes (see part A of the figure):

  1. Overview of the parameters used in the analysis and statistics, including:
    • total number of the features for which ranking was considered across all the databases.
    • number of enriched features for the specific NES threshold.
    • total number of ranked regions (candidate regulatory regions).
    • type of input query - gene symbols, BED file or i-cisTarget regions.
    • fraction of mapped input IDs (for genes and i-cisTarget regions).
    • overlap of input regions with i-cisTarget regions (when BED file with regions is used as input, a BED file with overlap of input regions with i-cisTarget regions is provided).
    • number of i-cisTarget regions considered in the analysis.
    • normalized enrichment score (NES) threshold.
    • AUC threshold (fraction/number of ranked regions).
    • recovery curve threshold (number of regions visualized in the AUC plots).
  2. Plot representing the AUC distribution for each database (each database represented by a different colour). In this case, the 'within each database separately' enrichment analysis was used.
  3. Recovery curves of the best feature for each database (each database is represented by a different colour), as well as the average recovery curve (a thicker curve of the corresponding colour represents the average number of recovered regions across all the features within the database).
  4. A table of the most enriched/correlated regulatory tracks and motifs for the input set of regions, where the features are ranked according to the NES.

    The table includes:
    • Ranking of the feature.
    • Name of the feature with its description (e.g. possible TFs for the enriched motif).
    • Normalized enrichment score (NES) value.
    • Logo of the enriched motif (only for the motif databases).
    • Recovery curve for the enriched feature.
    • A link to the list of candidate target regions with their region IDs and genomic coordinates (provided also as a BED file, which can be directly viewed in the UCSC genome browser).
    • A link to the list of the regions from the input that are ranked among the top 20,000 regions for the feature.
    • Name of the database which contains the enriched feature.

    In the report shown in the figure:
    • The top ranked enriched features are motifs. Notice that similar motifs are marked with the same colour (done by STAMP clustering).
    • The first regulatory track is DHS on the SK-N-MC cell line.
    • The first TF regulatory track is POLR2A on the SK-N-MC cell line.

  5. Subsequent analyses can be performed for the selected enriched features:
    • Use candidate target regions as filter and use as input for i-cisTarget analysis again.
      Get the candidate target regions of the selected features and use them as input for a new i-cisTarget analysis.
    • Scan candidate target regions of selected features for multiple homotypic or heterotypic CRMs:
      The target regions are scanned for the selected motifs.
      A BED file with the locations of the CRMs and motifs will be generated. The locations of the CRMs and motifs can be viewed directly in the UCSC genome browser by using the BED file as a custom track (see part B of the figure, where the green arrowhead points out the predicted target region upstream of the PAX7 gene and the red arrowheads point out two regions around the CDYL gene; these regions are enriched with CRMs and motifs for EWSR1-FLI1).
    • Create SIF file for the selected features:
      A Simple Interaction File (SIF) including the names of the selected features, the predicted target regions and the closest genes will be generated.
      The user can import the SIF file into Cytoscape, where the gene regulatory network can be created. Part C of the figure represents the network generated for 1) the first enriched motif EWSR1-FLI1 (orange node), 2) the first enriched DHS track (green node), 3) the first enriched TF ChIP-seq track (purple node).
      The target genes are the closest genes to the predicted target regions and the edges represent the predicted i-cisTarget regulatory regions. The green and red arrowheads point out the PAX7 and CDYL genes, which are regulated by EWSR1-FLI1 via the regions represented in the UCSC genome browser screenshots above. Therefore there are two edges to CDYL and one edge to PAX7 leading from the EWSR1-FLI1 motif (marked in red). Moreover, there are two regions around PAX7 enriched with a DHS track (two edges from the green node).
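To make the SIF output more concrete, the sketch below writes one tab-separated line per predicted interaction, with the feature as source node, the target region as edge label and the closest gene as target node. This layout is an assumption based on the description above (edges represent regions); the exact columns of the file produced by i-cisTarget may differ, and the function name and region IDs are hypothetical.

```python
def to_sif_lines(predictions):
    """Format (feature, region_id, closest_gene) tuples as tab-separated SIF lines.

    Assumed layout: source node <TAB> edge label <TAB> target node.
    """
    return [f"{feature}\t{region}\t{gene}" for feature, region, gene in predictions]


# Region IDs are made up for illustration.
example_predictions = [
    ("EWSR1-FLI1", "region_1", "PAX7"),
    ("EWSR1-FLI1", "region_2", "CDYL"),
    ("EWSR1-FLI1", "region_3", "CDYL"),
]
print("\n".join(to_sif_lines(example_predictions)))
```

With this layout, a gene targeted through two regions (CDYL in the example) receives two edges from the same feature node, matching the network described above.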

New features

Compared to the old version of i-cisTarget (2012):

  • Human and Mouse versions are available in addition to the Drosophila version.
  • A collection of candidate regulatory regions was determined for Human and Mouse.
  • The motif collections were updated and now include nearly 10,000 PWMs.
  • A large collection of regulatory data was added from ENCODE and Mouse ENCODE.

Table 2. Human regulatory tracks included in the databases

 | ENCODE | Epigenomics Roadmap Project | Taipale lab | Aerts lab | Total
DHS | 467 | 390 | 0 | 0 | 857
FAIRE | 37 | 0 | 0 | 14 | 51
Histone ChIP-seq | 402 | 1572 | 3 | 26 | 2003
TF ChIP-seq | 1274 | 0 | 117 | 3 | 1394
Total | 2180 | 1962 | 120 | 43 | 4305

Table 3. Mouse regulatory tracks included in the databases

 | ENCODE
DHS | 150
Histone ChIP-seq | 209
TF ChIP-seq | 206
Total | 565

Citations

Supported input formats

Report for a single gene signature

Gene signatures must be supplied as a list of gene identifiers, separated by newline characters.

The following IDs are supported:

HGNC gene symbols for Human version | MGI gene symbols for Mouse version | FlyBase gene symbols for Drosophila version
KLF7 | Cdk6 | pros
LOX | Greb1 | CG13321
CLDN14 | Runx1 | dpr7
GAD1 | Sox9 | mex1
HEY1 | Cdh11 | lab
COL1A1 | Mapk8 | toy
ZEB2 | Ncam1 | bi
SNAI2 | Sox2 | SPoCk
RUNX1 | Tbx1 | dpr7
ETS1 | Rab32 | CG11489

Batch support for multiple gene signatures

The GMT file format is a tab-delimited file format that describes multiple gene sets. In the GMT format, each row represents one gene set, described by a name, a description, and the genes in the gene set. These three fields are separated by a TAB character; the gene identifiers within the last field need to be separated by a semicolon, a colon or, again, a TAB character.

Signature1 Description BCL2;CDH1;ESRRA;GNAL;MITF;MYC;PTEN;VAT1;ZBTB10;TANGO2;POU3F2;HEY2;FAM210B;DCT
Signature2 Description ARTN;COL5A1;EPHA2;KLF7;JUN;BNC1;ELK3;RELN;DOCK2;CAV1;SAMD9;PLAU;E2F7;BCL3;SOX9
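A GMT row as described above could be parsed along the following lines. This is a minimal sketch, not the parser used by i-cisTarget; the function name `parse_gmt_line` is hypothetical, and the example row assumes that the name, description and gene list are separated by TAB characters.

```python
import re


def parse_gmt_line(line):
    """Split one GMT row into (name, description, list of gene identifiers).

    Gene identifiers may be separated by a semicolon, a colon or a TAB.
    """
    name, description, genes = line.rstrip("\n").split("\t", 2)
    gene_ids = [g for g in re.split(r"[\t;:]", genes) if g]
    return name, description, gene_ids


example_row = "Signature1\tDescription\tBCL2;CDH1;ESRRA;GNAL\n"
print(parse_gmt_line(example_row))
```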

ChIP peaks

ChIP peaks must be supplied as BED file entries, specifying the locations of these peaks in the Human, Mouse or Drosophila genome. FASTA file input is not supported.

An example of a BED file:

chr1 161871 162031 MACS_peak_1
chr1 320427 320573 MACS_peak_2
chr1 1477181 1477397 MACS_peak_3
chr1 1717532 1717802 MACS_peak_4
chr1 2594610 2594771 MACS_peak_5
chr1 2600664 2600842 MACS_peak_6
chr1 2604404 2604568 MACS_peak_7
chr1 2634134 2634219 MACS_peak_8
chr1 3323228 3323380 MACS_peak_9
chr1 6257663 6257825 MACS_peak_10
chr1 8716309 8716377 MACS_peak_11
chr1 8921167 8921423 MACS_peak_12
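Before uploading, a BED file like the one above can be sanity-checked with a short script. The sketch below is illustrative only and the function name `parse_bed` is hypothetical; it reads the first four columns and relies on the standard BED convention of 0-based, half-open coordinates.

```python
def parse_bed(text):
    """Parse BED lines into (chrom, start, end, name) tuples.

    Skips blank lines and comment, track and browser lines; raises
    ValueError on malformed coordinates.
    """
    peaks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or start >= end:
            raise ValueError(f"invalid interval: {line}")
        name = fields[3] if len(fields) > 3 else None
        peaks.append((chrom, start, end, name))
    return peaks


example_bed = """chr1 161871 162031 MACS_peak_1
chr1 320427 320573 MACS_peak_2
"""
for chrom, start, end, name in parse_bed(example_bed):
    print(name, chrom, end - start)
```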

Contact

If you have any questions or problems related to i-cisTarget, please contact us at lcbtools@kuleuven.be

Cite us

If you use i-cisTarget, please cite:

Imrichová, H., Hulselmans, G., Kalender Atak, Z., Potier, D. and Aerts, S. (2015) i-cisTarget 2015 update: generalized cis-regulatory enrichment analysis in human, mouse and fly. Nucleic Acids Res. doi: 10.1093/nar/gkv395

Herrmann, C., Van de Sande, B., Potier, D. and Aerts, S. (2012) i-cisTarget: an integrative genomics method for the prediction of regulatory features and cis-regulatory modules. Nucleic Acids Res. doi: 10.1093/nar/gks543
