The VISION Project
The VISION project conducted a ValIdated Systematic IntegratiON of epigenetic datasets across progenitor and differentiated blood cell types in mouse and human (Heuston et al. 2018, Xiang et al. 2020, Xiang et al. 2024). The project was carried out by an international group of scientists funded by the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health (grant R24DK106766), with intramural support from the National Human Genome Research Institute. Key products and results of the project can be visualized on the UCSC Genome Browser using this track hub. The project website provides other servers, databases, and data downloads.
Epigenetic states from IDEAS segmentation and annotation
The Integrative and Discriminative Epigenome Annotation System (IDEAS; Zhang et al. 2016, Zhang and Hardison 2017) learns a model for the most commonly occurring combinations of epigenetic features (e.g., histone modifications, chromatin accessibility, CTCF occupancy) simultaneously in two dimensions: along chromosomes and across cell types. These common combinations of epigenetic features constitute discrete chromatin states. IDEAS then annotates the epigenome of each cell type by assigning each genomic interval (usually 200 bp) to one of the chromatin states, based on the similarity of the state to the input epigenetic signals for that cell type. The input epigenetic data are analyzed as continuous variables (they are not binarized), so the chromatin states can differ in signal intensity as well as composition. The system leverages epigenetic information from locally related cell types when assigning states in cell types with missing data (Zhang and Mahony 2019). Its Bayesian statistical framework allows the incorporation of epigenetic models from different studies and even different species; this latter feature was essential to learning a chromatin state model jointly in both human and mouse (Xiang et al. 2024). Each chromatin state is assigned a distinctive color: warm colors (yellow and red hues) for states associated with gene activation, cool colors (blues and grays) for states associated with gene repression, and shades of green for states associated with elongating transcription. A figure with the heatmap summarizing the composition of each state is provided on the Track Settings page for each of the two composite tracks. The major steps in IDEAS modeling are illustrated in Figure 1. The display of the output from IDEAS is a compact image of the chromatin landscape of genomic intervals across cell types.
The chromatin state assignments provide a consistent, well-resolved, and informative annotation for each genomic interval.
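The assignment step described above can be illustrated with a deliberately simplified sketch. IDEAS itself is a Bayesian model that shares information along chromosomes and across cell types; the code below only shows the narrower idea of matching a bin's continuous signal vector to the most similar state. The state mean vectors, the signal values, the function names, and the use of Euclidean distance are all assumptions made for illustration, not values or methods taken from the VISION model.

```python
import math

# Order of the eight input features named in the text.
FEATURES = ["ATAC", "CTCF", "H3K27ac", "H3K27me3",
            "H3K36me3", "H3K4me1", "H3K4me3", "H3K9me3"]

# Hypothetical mean signal per state (illustrative numbers only).
STATE_MEANS = {
    0: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # low signal everywhere
    1: [5.0, 0.5, 4.0, 0.0, 0.0, 1.0, 6.0, 0.0],  # accessible, H3K27ac, H3K4me3
    2: [0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0],  # H3K36me3 dominated
    3: [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0],  # H3K27me3 dominated
}

def assign_state(signal, state_means):
    """Return the state whose mean vector is closest to `signal`.

    Euclidean distance is an assumption of this sketch; it stands in
    for the model-based similarity that IDEAS actually uses.
    """
    return min(state_means, key=lambda s: math.dist(signal, state_means[s]))

def annotate(bins, state_means):
    """Assign one state to each bin (one signal vector per genomic interval)."""
    return [assign_state(b, state_means) for b in bins]

# Four made-up bins from one cell type, in the FEATURES order.
bins = [
    [0.1, 0.0, 0.2, 0.1, 0.0, 0.1, 0.0, 0.1],
    [5.0, 0.5, 4.2, 0.1, 0.3, 1.0, 6.1, 0.0],
    [0.2, 0.1, 0.3, 0.0, 3.8, 0.2, 0.1, 0.0],
    [0.1, 0.0, 0.0, 4.5, 0.1, 0.0, 0.2, 0.3],
]
states = annotate(bins, STATE_MEANS)
print(states)
```

Because the signals are kept as continuous values rather than binarized, two states with the same set of marks but different intensities would remain distinguishable under this kind of comparison.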
Figure 1. Major steps in integrative and discriminative modeling of epigenomic signals using IDEAS. A. Gene models in a 100 kb region centered on two complement receptor genes (position Chr7:16,190,001-16,290,000 in GRCm38/mm10). B. In four mouse cell types (G1E, MK, NEU, and B cells), the normalized signal for each of the eight epigenomic features was given a distinctive color (burgundy for ATAC-seq, purple for CTCF, red for H3K4me3, yellow for H3K4me1, orange for H3K27ac, green for H3K36me3, blue for H3K27me3, and gray for H3K9me3), and the eight tracks were overlaid for each cell type using the Track Collections tool of the UCSC Genome Browser (Haeussler et al. 2019). C. The grouping of cell types locally, based on their epigenetic profiles, is illustrated by distinctive background colors: a mauve background for chromosomal segments that have similar profiles across all cell types, and different background colors for segments with differing profiles. D. The epigenetic feature profiles that occur most commonly are illustrated for three genomic positions as bar graphs representing the intensity of signal for each of the eight features (each with the distinctive color listed in B). Those combinations of quantitative signals define an epigenetic state, illustrated as a colored square. The epigenetic state at a given position can be constant or different across cell types. E. The frequencies of occurrence of the states at the three genomic positions are illustrated as pie diagrams; the colors in the pie diagrams represent particular states. Panels F, G, and H indicate steps for assigning genomic intervals to epigenetic states in each cell type and giving them informative colors. I. The resulting segmentation for the four cell types at this locus is shown as a track in dense mode for a genome browser.
The IDEAS states and annotation in this super track were generated from the input genome-wide signals for eight epigenetic features (chromatin accessibility from ATAC-seq, CTCF occupancy, and levels of the histone modifications H3K27ac, H3K27me3, H3K36me3, H3K4me1, H3K4me3, and H3K9me3) in multiple blood cell types, including both multi-lineage progenitor cells and differentiated cells, from both human and mouse (Figure 2). The normalized signals for these epigenetic features in each cell type can be viewed in the super track "Epigenetic signals."
Figure 2. Cell types and data sets used for systematic integration of epigenetic features of blood cells. (A) The tree on the left shows the populations of stem, progenitor, and mature blood cells and cell lines in human. The diagram on the right indicates the epigenetic features and transcriptomes for which genome-wide data sets were generated or collected, with distinctive icons for the major sources of data, specifically the Blueprint project (Martens and Stunnenberg 2013; Stunnenberg et al. 2016), Corces et al. (2016), abbreviated CMB, and St. Jude Children's Research Hospital (SJCRH, Cheng et al. 2021; Qi et al. 2021). (B) Cell types and epigenetic data sets in mouse, diagrammed as for panel A. Sources were described in Xiang et al. (2020). Abbreviations for blood cells and lines are: HSC = hematopoietic stem cell, MPP = multipotent progenitor cell, LMPP = lymphoid-myeloid primed progenitor cell, CMP = common myeloid progenitor cell, MEP = megakaryocyte-erythrocyte progenitor cell, K562 = a human cancer cell line with some features of early megakaryocytic and erythroid cells, HUDEP = immortalized human umbilical cord blood-derived erythroid progenitor cell lines expressing fetal globin genes (HUDEP1) or adult globin genes (HUDEP2), CD34_E = human erythroid cells generated by differentiation from CD34+ blood cells, ERY = erythroblast, RBC = mature red blood cell, MK = megakaryocyte, GMP = granulocyte monocyte progenitor cell, EOS = eosinophil, MON = monocyte, MONp = primary monocyte, MONc = classical monocyte, NEU = neutrophil, CLP = common lymphoid progenitor cell, B = B cell, NK = natural killer cell, TCD4 = CD4+ T cell, TCD8 = CD8+ T cell, LSK = Lin-Sca1+Kit+ cells from mouse bone marrow containing hematopoietic stem and progenitor cells, HPC7 = immortalized mouse cell line capable of differentiation in vitro into more mature myeloid cells, G1E = immortalized mouse cell line blocked in erythroid maturation by a knockout of the Gata1 gene, and its subline ER4, which will further differentiate after restoration of Gata1 function in an estrogen-inducible manner (Weiss et al. 1997), MEL = murine erythroleukemia cell line that can undergo further maturation upon induction (designated iMEL), CFUE = colony forming unit erythroid, FL = designates ERY derived from fetal liver, BM = designates ERY derived from adult bone marrow, CFUMK = colony forming unit megakaryocyte, iMK = immature megakaryocyte, MK_fl = megakaryocyte derived from fetal liver.
This collection of tracks of Epigenetic states is a super track. It provides access to two sets of results of the IDEAS segmentation and annotation, each of which is a composite track with the state annotations across the genomes of many blood cell types. The first is the result of jointly modeling epigenetic states in human and mouse blood cells. In this case, the states were learned in an iterative process that included input of states learned in one species into the modeling of states in the other species, leading to a 25 state model with the same states in each species (Xiang et al. 2024). The second is the result of IDEAS modeling only in human blood cell types.
In "dense" mode, the display gives a compact view of the epigenetic landscape in each cell type. In "pack" or "full" mode, the genomic intervals are annotated with the number of the chromatin state (see right side of the Figure) in each cell type, which can distinguish between states with similar colors.
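As a rough illustration of how per-interval state calls relate to the labeled intervals seen in "pack" or "full" mode, the sketch below merges runs of adjacent bins that share a state into single intervals labeled with the state number. The function name, the BED-style 0-based half-open coordinates, and the example state numbers are assumptions for illustration; the text does not state the file format used by the hub.

```python
from itertools import groupby

BIN_SIZE = 200  # bp; the text states that intervals are usually 200 bp

def bins_to_intervals(chrom, start, states, bin_size=BIN_SIZE):
    """Merge consecutive bins with the same state into labeled intervals.

    Returns (chrom, start, end, label) tuples using 0-based, half-open
    coordinates (a BED-style convention assumed here for illustration).
    """
    intervals = []
    pos = start
    for state, run in groupby(states):
        n_bins = sum(1 for _ in run)        # length of this run of equal states
        end = pos + n_bins * bin_size
        intervals.append((chrom, pos, end, str(state)))
        pos = end
    return intervals

# Made-up state calls for seven consecutive bins in one cell type.
example = bins_to_intervals("chr7", 16190000, [0, 0, 5, 5, 5, 12, 0])
for row in example:
    print(*row, sep="\t")
```

Labeling each merged interval with its state number mirrors the point made above: the number disambiguates states whose display colors are similar.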
Much of the input data were downloaded from the data portal of the BLUEPRINT Project (Adams et al. 2012); all data sources are provided in Xiang et al. 2024. The genome-wide signals for the epigenetic features were normalized across cell types using the S3V2 version of S3norm in the pipeline S3V2-IDEAS (Xiang et al. 2020 and 2021). These normalized data were used as input into IDEAS. The joint modeling across species is described in Xiang et al. 2024.
The data normalization and IDEAS segmentation across species were done by Guanjue Xiang. The data downloads, re-mapping and processing, generation of the tracks displayed, and development of the track hub were done by Belinda Giardine.
Adams D, Altucci L, Antonarakis SE, Ballesteros J, Beck S, Bird A, Bock C, Boehm B, Campo E, Caricasole A, Dahl F, Dermitzakis ET, Enver T, Esteller M, Estivill X, Ferguson-Smith A, Fitzgibbon J, Flicek P, Giehl C, Graf T, Grosveld F, Guigo R, Gut I, Helin K, Jarvius J, Küppers R, Lehrach H, Lengauer T, Lernmark A, Leslie D, Loeffler M, Macintyre E, Mai A, Martens JH, Minucci S, Ouwehand WH, Pelicci PG, Pendeville H, Porse B, Rakyan V, Reik W, Schrappe M, Schübeler D, Seifert M, Siebert R, Simmons D, Soranzo N, Spicuglia S, Stratton M, Stunnenberg HG, Tanay A, Torrents D, Valencia A, Vellenga E, Vingron M, Walter J, Willcocks S. BLUEPRINT to decode the epigenetic signature written in blood. Nat Biotechnol. 2012 Mar 7;30(3):224-6. doi: 10.1038/nbt.2153. PMID: 22398613.
Cheng L, Li Y, Qi Q, Xu P, Feng R, Palmer L, Chen J, Wu R, Yee T, Zhang J, Yao Y, Sharma A, Hardison RC, Weiss MJ, Cheng Y. Single-nucleotide-level mapping of DNA regulatory elements that control fetal hemoglobin expression. Nat Genet. 2021 Jun;53(6):869-880. doi: 10.1038/s41588-021-00861-8. Epub 2021 May 6. PMID: 33958780; PMCID: PMC8628368.
Corces MR, Buenrostro JD, Wu B, Greenside PG, Chan SM, Koenig JL, Snyder MP, Pritchard JK, Kundaje A, Greenleaf WJ, Majeti R, Chang HY. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat Genet. 2016 Oct;48(10):1193-203. doi: 10.1038/ng.3646. Epub 2016 Aug 15. PMID: 27526324; PMCID: PMC5042844.
Haeussler M, Zweig AS, Tyner C, Speir ML, Rosenbloom KR, Raney BJ, Lee CM, Lee BT, Hinrichs AS, Gonzalez JN, Gibson D, Diekhans M, Clawson H, Casper J, Barber GP, Haussler D, Kuhn RM, Kent WJ. The UCSC Genome Browser database: 2019 update. Nucleic Acids Res. 2019 Jan 8;47(D1):D853-D858. doi: 10.1093/nar/gky1095. PMID: 30407534; PMCID: PMC6323953.
Heuston EF, Keller CA, Lichtenberg J, Giardine B, Anderson SM; NIH Intramural Sequencing Center; Hardison RC, Bodine DM. Establishment of regulatory elements during erythro-megakaryopoiesis identifies hematopoietic lineage-commitment points. Epigenetics Chromatin. 2018 May 28;11(1):22. PMID: 29807547; PMCID: PMC5971425.
Martens JH, Stunnenberg HG. BLUEPRINT: mapping human blood cell epigenomes. Haematologica. 2013 Oct;98(10):1487-9. doi: 10.3324/haematol.2013.094243. PMID: 24091925; PMCID: PMC3789449.
Qi Q, Cheng L, Tang X, He Y, Li Y, Yee T, Shrestha D, Feng R, Xu P, Zhou X, Pruett-Miller S, Hardison RC, Weiss MJ, Cheng Y. Dynamic CTCF binding directly mediates interactions among cis-regulatory elements essential for hematopoiesis. Blood. 2021 Mar 11;137(10):1327-1339. doi: 10.1182/blood.2020005780. PMID: 33512425; PMCID: PMC7955410.
Stunnenberg HG; International Human Epigenome Consortium; Hirst M. The International Human Epigenome Consortium: A Blueprint for Scientific Collaboration and Discovery. Cell. 2016 Dec 15;167(7):1897. doi: 10.1016/j.cell.2016.12.002. Erratum for: Cell. 2016 Nov 17;167(5):1145-1149. doi: 10.1016/j.cell.2016.11.007. PMID: 27984737.
Xiang G, Keller CA, Heuston E, Giardine BM, An L, Wixom AQ, Miller A, Cockburn A, Sauria MEG, Weaver K, Lichtenberg J, Göttgens B, Li Q, Bodine D, Mahony S, Taylor J, Blobel GA, Weiss MJ, Cheng Y, Yue F, Hughes J, Higgs DR, Zhang Y, Hardison RC. An integrative view of the regulatory and transcriptional landscapes in mouse hematopoiesis. Genome Res. 2020 Mar;30(3):472-484. PMID: 32132109; PMCID: PMC7111515.
Xiang G, Keller CA, Giardine B, An L, Li Q, Zhang Y, Hardison RC. S3norm: simultaneous normalization of sequencing depth and signal-to-noise ratio in epigenomic data. Nucleic Acids Res. 2020 May 7;48(8):e43. doi: 10.1093/nar/gkaa105. PMID: 32086521; PMCID: PMC7192629.
Xiang G, Giardine BM, Mahony S, Zhang Y, Hardison RC. S3V2-IDEAS: a package for normalizing, denoising and integrating epigenomic datasets across different cell types. Bioinformatics. 2021 Sep 29;37(18):3011-3013. doi: 10.1093/bioinformatics/btab148. PMID: 33681991; PMCID: PMC8479670.
Xiang G, He X, Giardine BM, Isaac KJ, Taylor DJ, McCoy RC, Jansen C, Keller CA, Wixom AQ, Cockburn A, Miller A, Qi Q, He Y, Li Y, Lichtenberg J, Heuston EF, Anderson SM, Luan J, Vermunt MW, Yue F, Sauria MEG, Schatz MC, Taylor J, Göttgens B, Hughes JR, Higgs DR, Weiss MJ, Cheng Y, Blobel GA, Bodine DM, Zhang Y, Li Q, Mahony S, Hardison RC. Interspecies regulatory landscapes and elements revealed by novel joint systematic integration of human and mouse blood cell epigenomes. Genome Res. 2024 Aug 20;34(7):1089-1105. PMID: 38951027; PMCID: PMC11368181.
Zhang Y, An L, Yue F, Hardison RC. Jointly characterizing epigenetic dynamics across multiple human cell types. Nucleic Acids Res. 2016 Aug 19;44(14):6721-31. doi: 10.1093/nar/gkw278. Epub 2016 Apr 19. PMID: 27095202; PMCID: PMC5772166.
Zhang Y, Hardison RC. Accurate and reproducible functional maps in 127 human cell types via 2D genome segmentation. Nucleic Acids Res. 2017 Sep 29;45(17):9823-9836. doi: 10.1093/nar/gkx659. PMID: 28973456; PMCID: PMC5622376.
Zhang Y, Mahony S. Direct prediction of regulatory elements from partial data without imputation. PLoS Comput Biol. 2019 Nov 4;15(11):e1007399. doi: 10.1371/journal.pcbi.1007399. PMID: 31682602; PMCID: PMC6855516.
These data are available for use without restrictions.