
Supply regions or genes
1. Analysis type Quick analysis (~ 20 s) - only JASPAR motif collection. Full analysis (~ 6 mins) - whole motif collection.
2. Paste a list of region coordinates or gene symbols Please do not enter a single region or gene here, but a list of related peaks/regions or genes. An explanation of the supported input formats can be found here.
Or choose a file to upload:
3. For which species?
Gene annotation version
4. Input type Please make sure that the selected input type corresponds to the data pasted in step 2.
5. Which database version?
6. What features do you want to analyze? Approximate analysis time: PWM database only, 3-4 min; TFBS ChIP database only, 1 min; all databases, 5-6 min.
7. Give a name for your job
8. Your E-mail address (optional)

Optional parameters
Region mapping
Minimum fraction of overlap
Normalized enrichment score (NES) threshold
Enrichment analysis
ROC threshold for AUC calculation
Threshold for visualization

Comparative analysis

This tool compares the motif enrichment results of two independent i-cisTarget analyses. Results can be compared within the same species or across different species (if the same motif collection is used), e.g. enriched motifs found for active regions in mouse and Drosophila heart.

The comparative analysis can be performed here.

Download BED files with candidate regulatory regions

Human CRRs (hg19, RefSeq r45) - a set of 1.223.024 candidate regulatory regions for human i-cisTarget
Mouse CRRs (mm9, RefSeq r45) - a set of 938.376 candidate regulatory regions for mouse i-cisTarget
Mouse CRRs (mm9, RefSeq r70) - a set of 1.370.582 candidate regulatory regions for mouse i-cisTarget
Fly CRRs (dm3, FlyBase r5.37) - a set of 136.353 candidate regulatory regions for fly i-cisTarget

Examples

See the Legend explaining the study types below.

Study	Species version	Input	Reference	Link
The top 1000 GATA1 ChIP-seq peaks in K562 cell line (Encode)	Human	Peaks	ENCODE Project Consortium, Nature (2012)	Report
FLI1 ChIP-seq peaks in Ewing sarcoma (provided in the Supplementary Table 1 of the corresponding paper)	Human	Peaks	Riggi et al., Cancer Cell (2014)	Report
Genes up-regulated after GATA1 activation in G1E-ER4 mouse cells. Human orthologs used as input (as downloaded from MSigDB).	Human	Genes	Welch et al., Blood (2004)	Report
Genes down-regulated upon EWS-FLI1 knockdown in Ewing sarcoma	Human	Genes	Riggi et al., Cancer Cell (2014)	Report
Differentially expressed genes after TGFbeta treatment in A549	Human	Genes	Cieślik et al., Epigenetics Chromatin (2013)	Report
The top 1000 less active regions (H3K27ac) in Ewing sarcoma upon EWS-FLI1 knockdown	Human	CRRs	Riggi et al., Cancer Cell (2014)	Report
100 less active peaks (H3K27ac) in Ewing sarcoma upon EWS-FLI1 knockdown (provided in the Supplementary Table 3 of the corresponding paper)	Human	Peaks	Riggi et al., Cancer Cell (2014)	Report
100 more active peaks (H3K27ac) in Ewing sarcoma upon EWS-FLI1 knockdown (provided in the Supplementary Table 3 of the corresponding paper)	Human	Peaks	Riggi et al., Cancer Cell (2014)	Report
The top 1000 active regions (ATAC-seq) in Ewing sarcoma after EWS-FLI1 activation	Human	CRRs	Riggi et al., Cancer Cell (2014)	Report
Heart positive VISTA enhancers	Human	Peaks	Visel et al., Nucleic Acids Res (2007)	Report
The top 500 Gata1 ChIP-seq peaks in MEL mouse cell line (Mouse Encode)	Mouse	Peaks	Mouse Encode	Report
Genes up-regulated after GATA1 activation in G1E-ER4 mouse cells (erythroid precursors engineered to express GATA1 upon addition of estradiol). HGNC symbols as downloaded from MSigDB were converted to MGI symbols.	Mouse	Genes	Welch et al., Blood (2004)	Report
P300 ChIP-seq peaks in mouse heart	Mouse	Peaks	downloaded from GEO	Report
Genes related to GO term 'heart process' [GO:0003015] (170 genes, 375 annotations)	Mouse	Genes	downloaded from MGI	Report
9433 Zelda ChIP-seq peaks in the early Drosophila melanogaster embryo	Drosophila	Peaks	Harrison et al., PLoS Genetics (2011)	Report
Conserved eye disc-specific genes (versus wing disc)	Drosophila	Genes	Naval Sanchez et al., Genome Research (2013)	Report
Set of co-expressed genes downstream of Dorsal (down in knockout), in the fly embryo	Drosophila	Genes	Stathopoulos et al., Cell (2002)	Report
Daphnia pulex heat shock signature	Daphnia pulex	Genes	Becker et al. (unpublished data)	Report
Daphnia magna genes upregulated after chronic treatment with microcystin-free cyanobacteria	Daphnia magna	Genes	Schwarzenberger et al., BMC Genomics (2014)	Report

Legend:

Question	Input	Input type	Output
Are motifs and ChIP-seq tracks of my ChIP'ped factor enriched? What are the co-factors of the TF? Which DHS/Faire/Histone tracks are correlated with my TF ChIP-seq peaks? Can we discriminate direct from indirect targets?	TF ChIP-seq peaks	Peaks	The motifs and TF ChIP-seq tracks of the ChIP'ped factor and their direct target regions + motifs and ChIP-seq tracks of co-factors + correlated DHS/Faire/Histone marks in specific cell lines.
Which TFs bind to significantly more active/open/repressed regions between two conditions (normal vs cancer, non-treated vs treated, cancer subtype1 vs subtype2,...) based on e.g. H3K27ac/H3K27me3/Faire/ATAC differential analysis?	a set of differentially active regions	Peaks	The most correlated motifs and TF ChIP-seq tracks and their direct target regions, the most correlated DHS/Faire/Histone tracks.
Which TF regulates the expression of the specific gene signature/gene module/co-expressed genes?	a set of genes	Genes	The motifs and TF ChIP-seq tracks of the predicted upstream regulators, along with the non-TF regulatory tracks in specific cell types.

Application on TF ChIP-seq and MSigDB gene sets

See results

Overview

i-cisTarget can be used to:

detect transcription factor motifs in a set of peaks (e.g. differentially active peaks based on H3K27ac ChIP-seq between 2 conditions) or co-expressed genes.
detect overrepresented in vivo features (histone modifications, TF ChIP-seq, DHS, FAIRE) for gene signatures or peaks. These regulatory features help to improve motif discovery and candidate target gene prediction.
dissect a set of co-expressed genes into direct target genes of different transcription factor motifs or ChIP-seq tracks.

The analysis is based on ranking conserved regions in the Human, Mouse or Drosophila genome and recovery curves (See Figure).

Some of the key features of i-cisTarget are:

over-represented motifs are predicted in the set of co-expressed genes, using entire intergenic and intronic sequences
10 vertebrate species are used for motif scoring in the Human and Mouse versions; 12 Drosophila species are used in the Drosophila version
for motif scoring, the Cluster-Buster algorithm is used; kindly provided by Martin Frith
for cross-species comparisons, the cluster-buster scores of orthologous regions are used independently (e.g. no requirement of base pair alignment of D. melanogaster motifs in the Drosophila version), as described previously here
each significant motif and regulatory track results in an optimal subset of genes that are predicted as direct targets
for the predicted direct targets, i-cisTarget presents the predicted enhancer and binding site locations in the UCSC Genome Browser

A vast library of motifs and in vivo regulatory features was compiled for i-cisTarget. These features are:

9713 position weight matrices (PWM) are used for the Human, Mouse and Drosophila version, from various sources such as Transfac, Jaspar, FlyFactorSurvey, and motif collections from Stark et al., Elemento et al., and Down et al.
4305 regulatory data tracks from ENCODE and other sources (DHS, FAIRE, TF ChIP-seq, histone marks) for the Human version
565 regulatory data tracks from Mouse Encode for the Mouse version
455 features from the modENCODE project, 48 features from the BDTNP project, 33 ChIP-based features from Eileen Furlong's lab (Zinzen et al., Nature (2009)), and 2 * 30 chromatin states (derived from the S2 and the BG3 cell lines) taken from modENCODE Consortium et al., Science (2010) for the Drosophila version

Help

1. INPUT

The input for i-cisTarget can be:

a set of co-regulated genomic regions, such as TF ChIP-seq or differential open chromatin peaks
a set of co-expressed/co-regulated genes

Analysis of peaks

If we start from peaks/loci, e.g.:

a TF ChIP-seq track, to find co-factors or to detect false positive peaks that are not enriched for the motif of the ChIP'ped factor,
differentially active regions (e.g. H3K27ac) between two groups of samples or before/after treatment to find which TFs bind to the active regions,
open chromatin data (e.g. FAIRE, DHS or ATAC) between two groups to detect which TFs bind there,

then the peaks are mapped to the overlapping candidate i-cisTarget regulatory regions (described below).

Analysis of a gene signature

If we start from a set of genes such as a module, a gene signature, top mutated genes, a Gene Ontology category, etc., for instance to find out which upstream TF regulates this group of genes or which DHS track correlates best with it, then the input genes are linked to candidate regulatory regions. To do so, all non-coding regions located in the neighbourhood of a gene are assigned to that gene. These regions include the promoter regions upstream and downstream of the transcription start site (TSS). The “space” around each gene is a parameter and can be selected. For the Human and Mouse versions, a space of 20 kb around the TSS is used.
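
As a rough illustration of this mapping, the sketch below assigns candidate regions to a gene when they overlap a window around its TSS. The function name, the tuple layout and the reading of "20 kb around TSS" as 20 kb on each side are assumptions made for this example; it is not the actual i-cisTarget implementation.

```python
def regions_near_tss(tss, regions, flank=20_000):
    """Return IDs of regions that overlap the window [tss - flank, tss + flank).

    regions: iterable of (region_id, start, end) with half-open coordinates
    on the same chromosome as the TSS (hypothetical layout).
    flank: assumed to apply on each side of the TSS.
    """
    window_start = max(0, tss - flank)
    window_end = tss + flank
    return [rid for rid, start, end in regions
            if start < window_end and end > window_start]


# Toy data: one gene with its TSS at position 100,000.
candidate_regions = [
    ("R1", 75_000, 76_000),    # ends before the window starts
    ("R2", 79_500, 80_500),    # straddles the upstream window edge
    ("R3", 99_000, 101_000),   # spans the TSS
    ("R4", 120_000, 121_000),  # starts exactly at the window end
]
assigned = regions_near_tss(100_000, candidate_regions)
print(assigned)
```

With half-open intervals, a region that merely touches the window boundary (R4 here) is not assigned to the gene.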

Human candidate regulatory regions

i-cisTarget analysis relies on candidate regulatory regions that we defined using publicly available regulatory data: General Binding Preference models, CpG islands, proximal promoters, conserved non-coding sequences, ultra-conserved elements, regulatory elements from OregAnno, VistaEnhancers, predicted cis-regulatory modules and DNAseI Hypersensitive (DHS) uniform clustered peaks across 125 cell lines from ENCODE.

Table 1. Publicly available regulatory datasets used to create human i-cisTarget candidate regulatory regions

	GBP	CpG	Proximal promoters	CNS	UCR	Oreganno	Vista Enhancers	CRMs	DHS
Number of regions	61550	27718	34722	232101	15931	23112	1339	123500	1281988
% of the genome	1.77	0.73	0.67	2.25	0.13	0.39	0.07	2.05	13.36

All these features were merged. In a first step, regions having an overlap of at least 20% with insulator elements in the genome, or of at least 80% with coding exons, were removed. Next, regions with an overlap smaller than 20% with insulators or smaller than 80% with exons were split, and the parts containing the insulator or coding exons were removed. The remaining regions were then filtered by size, and regions < 30 bp were removed. Finally, any resulting regions shorter than 1000 bp were extended, if possible, to 1000 bp in a direction that prevents overlap with an insulator or exon. The complete procedure of creating candidate regulatory regions yielded 1.223.024 regions (representing ~35% of the genome) with an average size of 818 bp.

Subsequently, all the candidate regulatory regions were scored and ranked for each feature (motifs and regulatory tracks) and ranking databases were created.

2. RECOVERY ANALYSIS

The ranking of the foreground set of the user input (input genes/regions mapped to the i-cisTarget regulatory regions) is considered for each feature (motif or regulatory track), and the Area Under the Curve (AUC) of the recovery curve of these “foreground” regions is calculated. The AUC values of all features are then normalized into a Normalized Enrichment Score, NES = (AUC - µ) / σ, where µ and σ are the mean and standard deviation of the AUC distribution. Moreover, similar enriched motifs are clustered together using STAMP.
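
The recovery analysis can be sketched on toy data as follows. This is a minimal illustration, not the i-cisTarget code: the function names and ranks are invented, the AUC is taken as a simple discrete sum of the recovery curve up to a rank cut-off, and the use of the population standard deviation for σ is an assumption.

```python
from statistics import mean, pstdev


def recovery_auc(foreground_ranks, rank_cutoff):
    """Discrete area under the recovery curve up to rank_cutoff.

    foreground_ranks: 1-based ranks of the foreground regions for one feature.
    A region at rank r <= rank_cutoff is counted at every position from r to
    the cut-off, so it contributes (rank_cutoff - r + 1) to the area.
    """
    return sum(rank_cutoff - r + 1 for r in foreground_ranks if r <= rank_cutoff)


def nes_scores(auc_by_feature):
    """NES = (AUC - mu) / sigma over the AUC values of all features."""
    values = list(auc_by_feature.values())
    mu, sigma = mean(values), pstdev(values)  # population std: an assumption
    return {name: (auc - mu) / sigma for name, auc in auc_by_feature.items()}


# Hypothetical ranks of three foreground regions in three feature rankings.
ranks_by_feature = {
    "motif_A": [1, 2, 3],     # foreground concentrated at the top
    "motif_B": [5, 12, 20],
    "track_C": [9, 10, 50],
}
aucs = {name: recovery_auc(ranks, rank_cutoff=10)
        for name, ranks in ranks_by_feature.items()}
nes = nes_scores(aucs)
for name in sorted(nes, key=nes.get, reverse=True):
    print(f"{name}\tAUC={aucs[name]}\tNES={nes[name]:.2f}")
```

A feature whose ranking places the foreground regions near the top gets a large AUC and therefore a high NES relative to the other features.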

Extra and optional parameters

Region mapping (only when genes are used as input)
the user chooses the space around the TSS that is used to map genes to the candidate regulatory regions (all regions within this space around the TSS are used in the analysis). For the Human and Mouse versions, only a space of 20 kb around the TSS is currently available.
Fraction of overlap (only when peaks/regions are used as input)
the user specifies the minimum fraction of overlap between the input peaks and an i-cisTarget candidate regulatory region. E.g. a fraction of overlap of 0.4 (default) means that only the i-cisTarget regions that are at least 40% covered by the input peaks are used in the analysis.
Normalized enrichment score threshold
only the enriched features with a normalized enrichment score higher than the threshold are shown in the report, and the STAMP clustering of similar motifs is performed only for motifs with an NES above this threshold.
Enrichment analysis
enrichment analysis can be computed either within each database separately (default), in which case an AUC distribution is generated and NES values are computed per database, or over all databases, in which case a single AUC distribution and the NES values are computed across all databases.
ROC threshold for AUC calculation
the cut-off for fraction of ranked regions (x-axis) at which to calculate AUC. This measure is then used for comparing all the motifs/tracks.
Threshold for visualization
the cut-off for x-axis of AUC plot. If this is set to 20.000 then the recovery curve will be visualized for the top 20.000 regulatory regions.
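
To make the "fraction of overlap" parameter above concrete, the sketch below keeps only the candidate regions that are covered by input peaks for at least the chosen fraction of their length. The function name and coordinates are hypothetical, and the sketch assumes that the input peaks do not overlap each other.

```python
def overlap_fraction(region, peaks):
    """Fraction of a candidate region covered by the input peaks.

    region and peaks are (start, end) half-open intervals on one chromosome;
    the peaks are assumed not to overlap each other.
    """
    r_start, r_end = region
    covered = sum(max(0, min(r_end, p_end) - max(r_start, p_start))
                  for p_start, p_end in peaks)
    return covered / (r_end - r_start)


# Toy input peaks and candidate regions (invented coordinates).
peaks = [(100, 500), (900, 1000)]
regions = {"R1": (0, 1000), "R2": (450, 950), "R3": (600, 800)}

min_fraction = 0.4  # the default mentioned above
kept = [rid for rid, region in regions.items()
        if overlap_fraction(region, peaks) >= min_fraction]
print(kept)
```

Here R1 is half covered and passes the threshold, R2 is covered for only a fifth of its length, and R3 is not covered at all.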

3. OUTPUT

When the analysis is finished, the results appear on the webpage, or a link to the results is sent to the user's e-mail address.

The report of results includes (see part A of the figure):

Overview of the parameters used in the analysis and statistics, including:
- total number of the features for which ranking was considered across all the databases.
- number of enriched features for the specific NES threshold.
- total number of ranked regions (candidate regulatory regions).
- type of input query - gene symbols, BED file or i-cisTarget regions.
- fraction of mapped input IDs (for genes and i-cisTarget regions).
- overlap of input regions with i-cisTarget regions (when BED file with regions is used as input, a BED file with overlap of input regions with i-cisTarget regions is provided).
- number of i-cisTarget regions considered in the analysis.
- normalized enrichment score (NES) threshold.
- AUC threshold (fraction/number of ranked regions).
- recovery curve threshold (number of regions visualized in the AUC plots).
Plot representing the AUC distribution for each database (each database represented by a different colour). In this case, the 'within each database separately' enrichment analysis was used.
Recovery curves of the best feature for each database (each database is represented by a different colour), as well as the average recovery curve (a thicker curve of the corresponding colour represents the average number of recovered regions across all the features within the database).
A table of the most enriched/correlated regulatory tracks and motifs for the input set of regions, where the features are ranked according to the NES.

The table includes:
Ranking of the feature.
Name of the feature with its description (e.g. possible TFs for the enriched motif).
Normalized enrichment score (NES) value.
Logo of the enriched motif (only for the motif databases).
Recovery curve for the enriched feature.
A link to the list of candidate target regions with their region IDs and genomic coordinates (provided also as a BED file, which can be directly viewed in the UCSC genome browser).
A link to the list of the regions from the input that are ranked among the top 20000 regions for the feature.
Name of the database which contains the enriched feature.
The top ranked enriched features are motifs. Notice that similar motifs are clustered together by the same colour (done by STAMP clustering).
The first regulatory track is DHS on SK-N-MC cell line.
The first TF regulatory track is POLR2A on SK-N-MC cell line.
Subsequent analyses can be performed for the selected enriched features:
- Use candidate target regions as a filter and as input for a new i-cisTarget analysis.
  Get candidate target regions from the selected features and use them as input for a new i-cisTarget analysis.
- Scan candidate target regions of selected features either for multiple homotypic or heterotypic CRMs:
  The target regions are scanned for the selected motifs.
  A BED file with the locations of the CRMs and motifs will be generated. The locations of the CRMs and motifs can be viewed directly in the UCSC genome browser by using the BED file as a custom track (see part B of the figure, where the green arrowhead points out the predicted target region upstream of the PAX7 gene and red arrowheads point out two regions around the CDYL gene. These regions are enriched with CRMs and motifs for EWSR1-FLI1).
- Create SIF file for the selected features:
  Simple Interaction File (SIF) including names of the selected features, predicted target regions and the closest genes will be generated.
  The user can import the SIF file in Cytoscape where the gene regulatory network can be created. Part C of the figure represents the network generated for 1) the first enriched motif EWSR1-FLI1 (orange node), 2) the first enriched DHS track (green node), 3) the first enriched TF ChIP-seq track (purple node).
  The target genes are the closest genes to the predicted target regions and the edges represent the predicted i-cisTarget regulatory regions. The green and red arrowheads point out the PAX7 and CDYL genes - regulated by EWSR1-FLI1 via the regions represented in the UCSC genome browser screenshots above. Therefore there are two edges to CDYL while there is one edge to PAX7 leading from the EWSR1-FLI1 motif (marked in red). Moreover there are two regions around PAX7 enriched with a DHS track (two edges from the green node).
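
For readers unfamiliar with SIF, each line of the file holds a source node, an interaction type and a target node, separated by TABs. The sketch below writes such lines for a few predicted targets; the region IDs are invented, and using the region ID as the interaction (edge) label is an assumption based on the description above, not the confirmed layout of the file that i-cisTarget generates.

```python
def to_sif(targets):
    """Build SIF text from (feature, region_id, closest_gene) triples.

    Each triple becomes one edge: feature <TAB> region_id <TAB> gene.
    """
    return "\n".join(f"{feature}\t{region}\t{gene}"
                     for feature, region, gene in targets)


# Hypothetical predictions: one region near PAX7 and two near CDYL.
predicted_targets = [
    ("EWSR1-FLI1", "region_0001", "PAX7"),
    ("EWSR1-FLI1", "region_0002", "CDYL"),
    ("EWSR1-FLI1", "region_0003", "CDYL"),
]
sif_text = to_sif(predicted_targets)
print(sif_text)
```

Because every predicted region yields its own line, a gene with two target regions (CDYL in this toy example) ends up with two edges from the same feature node.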

New features

Compared to the old version of i-cisTarget (2012):

Human and Mouse versions are now available in addition to the Drosophila version.
A collection of candidate regulatory regions was determined for Human and Mouse.
The motif collections were updated and now include nearly 10.000 PWMs.
A large collection of regulatory data was added from ENCODE and Mouse ENCODE.

Table 2. Human regulatory tracks included in the databases

	ENCODE	Epigenomics Roadmap Project	Taipale lab	Aerts lab	∑
DHS	467	390	0	0	857
FAIRE	37	0	0	14	51
Histone ChIP-seq	402	1572	3	26	2003
TF ChIP-seq	1274	0	117	3	1394
∑	2180	1962	120	43	4305

Table 3. Mouse regulatory tracks included in the databases

	ENCODE
DHS	150
Histone ChIP-seq	209
TF ChIP-seq	206
∑	565

Citations

Carl Herrmann, Bram Van de Sande, Delphine Potier, and Stein Aerts. i-cisTarget: an integrative genomics method for the prediction of regulatory features and cis-regulatory modules. Nucleic Acids Res 2012 Jun 20; doi: 10.1093/nar/gks543.
Delphine Potier, Zeynep Kalender Atak, Marina Naval Sanchez, Carl Herrmann, and Stein Aerts. Using cisTargetX to Predict Transcriptional Targets and Networks in Drosophila. Methods Mol Biol 2012; 786, 291–314.
Stein Aerts, Xiao-Jiang Quan, Annelies Claeys, Marina Naval Sanchez, Phillip Tate, Jiekun Yan, and Bassem Hassan. Robust target gene discovery through transcriptome perturbations and genome-wide enhancer predictions in Drosophila uncovers a regulatory basis for sensory specification. PLoS Biology 2010 Jul 27;8(7):e1000435.

Supported input formats

Report for a single gene signature

Gene signatures must be supplied as a list of gene identifiers, separated by newline characters.

The following IDs are supported:

HGNC gene symbols for Human version	MGI gene symbols for Mouse version	FlyBase gene symbols for Drosophila version
`KLF7 LOX CLDN14 GAD1 HEY1 COL1A1 ZEB2 SNAI2 RUNX1 ETS1`	`Cdk6 Greb1 Runx1 Sox9 Cdh11 Mapk8 Ncam1 Sox2 Tbx1 Rab32`	`pros CG13321 dpr7 mex1 lab toy bi SPoCk dpr7 CG11489`

Batch support for multiple gene signatures

The GMT file format is a tab delimited file format that describes multiple gene sets. In the GMT format each row represents a gene set. Each gene set is described by a name, a description, and the genes in the gene set. These fields are separated by a TAB character; the gene identifiers need to be separated by a semicolon, colon or again a TAB character.

                        Signature1	Description	BCL2;CDH1;ESRRA;GNAL;MITF;MYC;PTEN;VAT1;ZBTB10;TANGO2;POU3F2;HEY2;FAM210B;DCT
                        Signature2	Description	ARTN;COL5A1;EPHA2;KLF7;JUN;BNC1;ELK3;RELN;DOCK2;CAV1;SAMD9;PLAU;E2F7;BCL3;SOX9
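
A GMT file of this shape could be read with a few lines of code before submission, for example to check that every signature contains several genes. The sketch below is an illustration only; the function name is hypothetical, and it accepts the gene separators listed above (semicolon, colon or TAB).

```python
import re


def parse_gmt(text):
    """Parse GMT text into {signature_name: [gene, ...]}.

    The first two TAB-separated fields are the name and the description;
    the rest of the line holds genes separated by ';', ':' or TAB.
    """
    gene_sets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _description, genes = line.split("\t", 2)
        gene_sets[name] = [g for g in re.split(r"[;:\t]", genes) if g]
    return gene_sets


# Shortened versions of the two example signatures.
example = (
    "Signature1\tDescription\tBCL2;CDH1;ESRRA\n"
    "Signature2\tDescription\tARTN\tCOL5A1\tEPHA2\n"
)
signatures = parse_gmt(example)
for name, genes in signatures.items():
    print(name, len(genes), genes)
```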

ChIP peaks

ChIP peaks must be supplied as BED file entries, specifying the locations of these peaks in the Human, Mouse or Drosophila genome. FASTA file input is not supported.

An example of a BED file:

                        chr1    161871  162031  MACS_peak_1
                        chr1    320427  320573  MACS_peak_2
                        chr1    1477181 1477397 MACS_peak_3
                        chr1    1717532 1717802 MACS_peak_4
                        chr1    2594610 2594771 MACS_peak_5
                        chr1    2600664 2600842 MACS_peak_6
                        chr1    2604404 2604568 MACS_peak_7
                        chr1    2634134 2634219 MACS_peak_8
                        chr1    3323228 3323380 MACS_peak_9
                        chr1    6257663 6257825 MACS_peak_10
                        chr1    8716309 8716377 MACS_peak_11
                        chr1    8921167 8921423 MACS_peak_12
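
Before uploading, it can help to check that each BED entry has a chromosome, a start and an end, and that start is smaller than end (BED coordinates are 0-based and half-open). The following sketch does this for the first lines of the example above; the function name is hypothetical and the check is not part of i-cisTarget itself.

```python
def parse_bed(text):
    """Parse BED text into (chrom, start, end, name) tuples.

    Fields may be separated by TABs or spaces. Header lines starting with
    '#', 'track' or 'browser' and lines with fewer than 3 fields are skipped.
    """
    peaks = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith(("#", "track", "browser")):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start >= end:
            raise ValueError(f"start must be smaller than end: {line!r}")
        name = fields[3] if len(fields) > 3 else None
        peaks.append((chrom, start, end, name))
    return peaks


bed_text = """\
chr1    161871  162031  MACS_peak_1
chr1    320427  320573  MACS_peak_2
chr1    1477181 1477397 MACS_peak_3
"""
peaks = parse_bed(bed_text)
widths = [end - start for _chrom, start, end, _name in peaks]
print(peaks[0], widths)
```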

Contact

If you have any questions or problems related to i-cisTarget, please contact us: lcbtools@kuleuven.be

Cite us

If you use i-cisTarget, please cite:

Imrichová,H., Hulselmans,G., Kalender Atak,Z., Potier,D. and Aerts,S. (2015) i-cisTarget 2015 update: generalized cis-regulatory enrichment analysis in human, mouse and fly. Nucleic Acids Res. doi: 10.1093/nar/gkv395

Herrmann,C., Van de Sande,B., Potier,D. and Aerts,S. (2012) i-cisTarget: an integrative genomics method for the prediction of regulatory features and cis-regulatory modules. Nucleic Acids Res. doi: 10.1093/nar/gks543
