Preclinical informatics: HDinHD
In our role as a collaborative enabler, CHDI has made a concerted effort to disseminate both preclinical and clinical data to the wider HD scientific community. In support of preclinical research, CHDI has deposited primary data, much of it unpublished, in community databases and developed a website, Huntington’s Disease in High-Definition (HDinHD), to:
- provide HD-related primary scientific data;
- share analyses and computational models derived from such data;
- provide browsing and data interrogation tools that facilitate data exploration and hypothesis generation; and
- establish a forum for HD researchers to highlight their data, tools, know-how and insight to the community.
CHDI continues to submit a substantial dataset of gene and protein expression data, across a number of tissues and ages, primarily from the Mouse Htt Allelic Series Project. These data are deposited into databases maintained at the National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI). Gene expression data can be found at NCBI’s Gene Expression Omnibus (GEO). Protein expression data can be found at EBI’s PRoteomics IDEntifications (PRIDE) database.
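As a starting point for locating deposited gene expression records programmatically, the sketch below builds a search URL for NCBI's E-utilities `esearch` endpoint against the GEO DataSets database (`db=gds`). The search term is only an illustrative assumption, not an official accession or query recommended by CHDI, and the helper name `geo_search_url` is hypothetical. The code only constructs the URL; it does not send a request.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_search_url(term, max_results=20):
    """Build an esearch URL that queries GEO DataSets (db=gds) for `term`."""
    params = {
        "db": "gds",            # GEO DataSets database
        "term": term,           # free-text or fielded Entrez query
        "retmax": max_results,  # maximum number of record IDs to return
        "retmode": "json",      # ask for a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Illustrative query only; adjust the term to the study of interest.
url = geo_search_url("Huntington allelic series AND Mus musculus[Organism]")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, returns matching GEO record identifiers that can then be looked up on the GEO website.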
The Mouse Htt Allelic Series Project, a collaboration between CHDI, Massachusetts General Hospital and PsychoGenics, is a longitudinal study that has generated a coherent dataset from a series of mHtt knock-in mice (Langfelder et al., 2016) with increasingly long CAG repeats (ranging from 18 to ~175 CAGs), thereby coding for increasingly long polyglutamine stretches within the mHtt protein. As in other triplet repeat diseases, the age of onset of HD is inversely correlated with the length of the repeat expansion. By studying mice with varying CAG repeat lengths at different ages, we are looking to identify disease-related changes that correlate with the length of the disease-causing CAG repeat expansion. By 2016, both processed and raw next-generation sequencing data from roughly 2,800 mouse central and peripheral tissue samples had been deposited into NCBI’s GEO and Sequence Read Archive (SRA). Similarly, mass-spectrometry proteomics data from mouse striatal tissue were deposited into PRIDE. In 2017, proteomics data from four additional tissues from Htt Allelic Series mice were deposited into PRIDE. Currently, PRIDE contains proteomics data from around 970 tissue samples.
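To make the idea of a change that "correlates with CAG repeat length" concrete, here is a minimal sketch that computes a Pearson correlation between repeat length and the expression of a single gene. The repeat lengths and expression values are made up for illustration (they merely fall within the 18 to ~175 range mentioned above), and a plain Pearson coefficient is only one simple choice; it is not presented as the method used in the published analyses.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical CAG repeat lengths, one per genotype (illustrative only).
cag_lengths = [18, 50, 80, 110, 140, 175]

# Made-up expression values for one gene at one age, one per genotype.
expression = [10.1, 9.6, 9.0, 8.1, 7.4, 6.2]

r = pearson(cag_lengths, expression)
print(f"r = {r:.3f}")  # strongly negative for this toy data
```

In practice the same calculation would be repeated per gene, tissue and age, with appropriate statistical modeling and multiple-testing correction.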
HDinHD currently highlights the availability of the Htt Allelic Series data, provides additional processed proteomics data not distributed through PRIDE, and hosts Htt Allelic Series mouse behavioral data generated by PsychoGenics (there is as yet no best-practice community repository for such data). Registered users can also find a master sample annotation report that provides key metadata for all tissue samples and maps all transcriptomics, proteomics and behavioral results back to individual Htt Allelic Series mice. This report provides context, allowing researchers to perform their own integrative analyses over molecular and behavioral results from Htt Allelic Series mice.
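The role of such an annotation report in an integrative analysis can be sketched as a simple join keyed on the individual mouse. Everything below is hypothetical: the field names (`sample_id`, `mouse_id`, `tissue`, `assay`), the identifiers and the values are invented for illustration, and the actual report layout on HDinHD may differ.

```python
from collections import defaultdict

# Hypothetical annotation rows: one row per tissue sample.
annotation = [
    {"sample_id": "S1", "mouse_id": "M01", "tissue": "striatum", "assay": "transcriptomics"},
    {"sample_id": "S2", "mouse_id": "M01", "tissue": "striatum", "assay": "proteomics"},
    {"sample_id": "S3", "mouse_id": "M02", "tissue": "liver", "assay": "transcriptomics"},
]

# Hypothetical per-sample molecular values and per-mouse behavioral scores.
molecular = {"S1": 8.4, "S2": 1.7, "S3": 5.2}
behavior = {"M01": 0.62, "M02": 0.91}


def integrate(annotation, molecular, behavior):
    """Group molecular results by mouse and attach that mouse's behavioral score."""
    per_mouse = defaultdict(lambda: {"behavior": None, "samples": []})
    for row in annotation:
        record = per_mouse[row["mouse_id"]]
        record["behavior"] = behavior.get(row["mouse_id"])
        record["samples"].append(
            {
                "sample_id": row["sample_id"],
                "tissue": row["tissue"],
                "assay": row["assay"],
                "value": molecular.get(row["sample_id"]),
            }
        )
    return dict(per_mouse)


result = integrate(annotation, molecular, behavior)
for mouse_id, record in result.items():
    print(mouse_id, record["behavior"], [s["assay"] for s in record["samples"]])
```

Keying every result on the mouse identifier is what lets transcriptomic, proteomic and behavioral measurements from the same animal be analyzed side by side.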
HDinHD describes and distributes causal models and model simulation results developed by GNS Healthcare on multi-modal data generated from Htt Allelic Series mice. Datasets are distributed in several different formats to be compatible with best-practice open-source life science toolkits, enabling users to interrogate, visualize and explore networks, pathways and other biological data.
In light of growing evidence from human genome-wide association studies in HD gene-expansion carriers that DNA damage pathway genes contribute to modifying aspects of the disease (Lee et al., 2015), HDinHD provides a comprehensive literature review, gene lists, and visual and computable pathway maps corresponding to four pathways of the DNA damage response mechanism: base excision repair, nucleotide excision repair, mismatch repair, and inter-strand crosslink repair.
HDinHD also provides access to several web-based tools:
- Huntingtin Interactome (HINT), a database and query tool that allows interrogation of >1,800 unique proteins that interact with the huntingtin protein. This collection, curated from >100 publications, is a substantial compilation of huntingtin interactors.
- REPAIR, a gene expression query tool operating over >280 HD-related gene expression studies.
- HD Proteome Base, a query and visualization tool allowing interrogation of longitudinal proteomics profiling results from the Mouse Htt Allelic Series Project. The user-friendly web portal allows the researcher to query for proteins and to visualize their expression across the CAG repeat-length series and across different brain and peripheral tissues.
- ASViewer, a visualization tool highlighting longitudinal transcriptomic and proteomic expression across brain and peripheral tissues.
- HTT Protein Viewer, a visualization tool depicting sequence-based features (e.g. post-translational modifications and single-nucleotide polymorphisms) along the huntingtin protein.
HDinHD also links out to several other HD research websites, including:
- Genetic Modifiers of Huntington’s Disease (GeM-HD) Consortium website, which provides results from human genome-wide association studies looking to identify genetic modifiers of HD.
- BioGemix 3D, a database of polyglutamine- and age-dependent behavior of gene expression in Htt Allelic Series mice.
Several items under development will be released on HDinHD during the second quarter of 2018, including:
- HDSigDB, a curated compilation of HD and HD-related gene sets that provide HD context to standard gene set enrichment analyses.
- Proteomics data generated from heart and skeletal muscle of Htt Allelic Series mice.
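To illustrate how curated gene sets such as those planned for HDSigDB are typically used, the sketch below runs a basic over-representation test: given a list of genes of interest, it asks whether the overlap with a gene set is larger than expected by chance, using a one-sided hypergeometric tail probability. This is one simple form of enrichment analysis and is not presented as the method HDinHD or HDSigDB uses; the gene identifiers and set contents are invented.

```python
from math import comb


def overrepresentation_p(universe_size, set_size, query_size, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap).

    X is the number of gene-set members found when drawing `query_size`
    genes without replacement from a universe of `universe_size` genes,
    of which `set_size` belong to the gene set.
    """
    max_overlap = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, i) * comb(universe_size - set_size, query_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


# Invented example: a 20-gene universe, a 5-gene set, and a 5-gene query list.
universe = {f"G{i}" for i in range(20)}
gene_set = {"G0", "G1", "G2", "G3", "G4"}
query = {"G0", "G1", "G2", "G3", "G10"}

overlap = len(gene_set & query)
p = overrepresentation_p(len(universe), len(gene_set), len(query), overlap)
print(f"overlap = {overlap}, p = {p:.4f}")
```

When many gene sets are tested at once, the resulting p-values would need correction for multiple testing.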
The HDinHD website is a collaboration between CHDI and Giovanni Coppola at UCLA.
Frequently Asked Questions
Do I need to register for an account to use HDinHD?
Yes, you must register to access the full site.
Does CHDI make all of its data public?
No, see CHDI’s Data, Reagents, and Biomaterials Sharing Policy.
Our lab has produced data and tools that would be useful to the HD research community. Can I contribute these data and tools to HDinHD?
I have some suggestions on how to improve HDinHD. How do I share these with you?
Some of the datasets available on HDinHD are complex. Is there someone we can speak to at CHDI to help provide further background and information?
Yes, email us at HDinHD@chdifoundation.org and we’ll be happy to help.
There is a lot of material on HDinHD. How can I tell what is new since I last looked?
Substantial new features, both data and tools, are highlighted on the New in HDinHD page.
Suggested further reading
- Langfelder P et al. MicroRNA signatures of endogenous Huntingtin CAG repeat expansion in mice. PLoS One (2018) 13:e0190550.
- Langfelder P et al. Integrated genomics and proteomics define huntingtin CAG length-dependent networks in mice. Nat Neurosci. (2016) 19:623.
- Alexandrov V et al. Large-scale phenome analysis defines a behavioral signature for Huntington’s disease genotype in mice. Nat Biotechnol. (2016) 34:838.
- Lee JM et al. Identification of genetic factors that modify clinical onset of Huntington’s disease. Cell (2015) 162:516.