ValIdated Systematic IntegratiON of hematopoietic epigenomes

VISION Deliverables and Resources

The VISION project began in the summer of 2016. The description and list of deliverables below lay out what we plan to accomplish. Some resources are available now, and links to them are provided on the VISION home page.

Overall approach

The VISION project (ValIdated Systematic IntegratiON of hematopoietic epigenomes) is under consideration for funding via the FOA on Collaborative Interdisciplinary Team Science in NIDDK Research Areas. The problem we address is how to use the enormous amounts of emerging epigenetic data effectively, both for basic research and for precision medicine. We will consolidate hundreds of epigenomic datasets and apply integrative approaches to generate robust candidate functional assignments to DNA segments. These assignments, coupled with gene target predictions and results of genome editing experiments, will be the input to machine-learning approaches that will generate quantitative models for how each candidate cis-regulatory module (CRM) contributes to the regulation of its target gene. Importantly, these models will be rigorously tested and validated by targeted genome editing in reference loci, and then applied genome-wide. Furthermore, we will expand resources to enable more accurate translation of regulatory insights between mouse and human.

The data and resources generated in our VISION project are intended to enable better research by a large community of investigators. The deliverables from our project will harvest the truly valuable information within the flood of epigenomic data, and provide the results of the integrative analysis, modeling, and experimental validations in a manner readily used by the larger community. Each member of our investigative team is committed to the goal of building resources to help the larger community find answers to enduring questions in hematopoiesis and accelerate improvements in therapy for hematological disorders.

Thus, we embrace the imperative that the data and resources be released to the public rapidly and in a form that is both understandable and usable by the wider community. Furthermore, we are fully aware of the need for transparency and accuracy at all levels of data acquisition and analysis, from the metadata describing each sample analyzed (mouse strain, cell type and how it was isolated, experimental procedures used, sequencing methodology used, etc.) to the pipelines used for mapping and analyzing sequencing reads to the integrative analyses. Some of the PIs are heavily involved in the Galaxy project, which not only provides a computational platform enabling sophisticated analysis of large datasets by a wide community, but also is designed to ensure transparency and reproducibility in analysis. Several of the PIs are active in the ENCODE projects and have been active in large-scale sequencing projects.

Deliverables

This project will deliver three categories of information, each supported by web-based resources:

  1. Comprehensive catalogs of cis-regulatory modules (CRMs) utilized during hematopoiesis, built by integration of compiled epigenetic, transcriptome, and chromatin interaction data, and validated by extensive experimental tests.
    Specific components include:
    1. The CODEX and SBR-Blood resources with comprehensive expression and epigenetic data across hematopoietic cell types, enhanced with clear metadata, quality metrics, and assessments of reproducibility
    2. New data on EP300 binding, rates of transcription and degradation from nascent RNA-seq approaches, and other features
    3. Predictions of enhancers and other functional cis-acting DNA segments using IDEAS and other integrative approaches
    4. Very deep Hi-C sequencing data in erythroblasts and HSPCs
    5. The 3D Genome Browser for user-generated views of topological maps in register with genomic and epigenomic data
    6. Experimental results on genetic perturbations of candidate CRMs and their predicted impact on target genes
  2. Quantitative models for gene regulation built by sophisticated machine learning approaches, extensively tested by genome editing approaches in ten reference loci, with predictions applied genome-wide.
    Specific components include:
    1. The IDEAS approach packaged as a computational tool
    2. Segmentations on genomes of hematopoietic cell types in mouse and human, via IDEAS
    3. Modeling tools that use novel methods to generate network models that identify and quantify the impact of specific TFs and CRMs in establishing levels of expression and changes in expression
    4. Data-driven, quantitative models for TFs and CRMs in regulation in specific cell types and in differential expression across cell types
  3. A guide for investigators to translate insights from mouse models to human clinical studies.
    Specific components include:
    1. A database (and query interface) that records, for each orthologous pair of human and mouse genes, the contribution to expression variance from species and from tissue
    2. A database (and query interface) that records, for each human and mouse CRM, its category of epigenomic evolution
    3. Experimental results on genetic perturbations that test the accuracy of mappings guided by the two databases above (items 1 and 2 of this category)
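
One way to read "the contribution to expression variance from species and from tissue" for an orthologous gene pair is as a sum-of-squares partition over a species-by-tissue table of expression values. The sketch below illustrates that reading only; it is an assumption, not the project's stated method. The function name `variance_fractions`, the input layout, and the example values are all hypothetical.

```python
from statistics import mean


def variance_fractions(expr):
    """Partition expression variance for one orthologous gene pair.

    expr maps a species name to a list of expression values, one per
    tissue, with tissues in the same order for every species.
    (Hypothetical input layout, chosen for illustration.)
    """
    species = list(expr)
    n_tissues = len(expr[species[0]])
    if any(len(expr[s]) != n_tissues for s in species):
        raise ValueError("every species needs one value per tissue")

    grand = mean(v for s in species for v in expr[s])
    species_means = [mean(expr[s]) for s in species]
    tissue_means = [mean(expr[s][t] for s in species) for t in range(n_tissues)]

    # Balanced two-way layout without replicates:
    # total = species + tissue + residual (species-by-tissue interaction).
    ss_total = sum((v - grand) ** 2 for s in species for v in expr[s])
    ss_species = n_tissues * sum((m - grand) ** 2 for m in species_means)
    ss_tissue = len(species) * sum((m - grand) ** 2 for m in tissue_means)
    ss_residual = ss_total - ss_species - ss_tissue

    if ss_total == 0:
        return {"species": 0.0, "tissue": 0.0, "residual": 0.0}
    return {
        "species": ss_species / ss_total,
        "tissue": ss_tissue / ss_total,
        "residual": ss_residual / ss_total,
    }


# Made-up values for a single gene across three tissues.
example = {"human": [1.0, 2.0, 3.0], "mouse": [2.0, 3.0, 4.0]}
fractions = variance_fractions(example)
print(fractions)
```

Under this reading, a gene whose variance is dominated by the tissue term behaves similarly in both species, while a large species term flags a gene for which mouse results may translate less directly to human.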

The modes of delivery are readily accessible, web-based platforms, including customized browsers, databases with facile query interfaces, and data-driven online tools. The links on the VISION home page lead to the existing interfaces.