Documentation & FAQ
RARe-SOURCE™ - Integrated Bioinformatics Resource for Rare Diseases
NCATS has partnered with the Advanced Biomedical Computational Science group at NCI-Frederick to collect and integrate multiple data sources into a bioinformatics resource that addresses the challenges in rare disease research.
Goals
- Identify bioinformatics databases
- Establish an accessible and searchable resource
- Discover commonalities among inherited disorders
- Advance translational research to improve diagnosis and treatment
Challenges

Data, Data, Everywhere & Nowhere
- Accessibility
- Integration
- Analysis
- Interpretation
- Dissemination
Objectives

- Develop an innovative application and searchable interface for data mining
- Establish tools for analyzing OMICS data from disease cohorts and public/private data sources
- Connect human genotype-phenotype-molecular associations with disease model systems data

Browse Rare Disease Information
RARe-SOURCE™ integrates multiple data sources to give researchers and visitors detailed rare disease information in an efficient, easy-to-navigate layout.
Features:
- Information on rare diseases with genetic etiology
- Disease synonym information
- Publications related to selected rare disease
- Disease IDs with links to other rare disease information sources
- Links to related gene details in RARe-SOURCE

Browse Gene Information
RARe-SOURCE™ integrates multiple data sources to give researchers and visitors detailed gene information in an efficient, easy-to-navigate layout.
Features:
- Information on genes associated with rare diseases
- Details on genomic variants identified in the gene
- Manually curated variant annotations for SLC6A8
- 2-dimensional protein structure provided by ProtVista
- 3-dimensional protein structure provided by MolArt
- Integrated protein feature and variant details
- Publications related to selected gene
- Links to related disease details in RARe-SOURCE

Browse Literature
RARe-SOURCE™ implements artificial intelligence (AI) algorithms to identify disease and gene mentions in the titles or abstracts of published literature.
Features:
- Information on published literature for rare diseases and associated genes
- Integration of primary rare disease names and aliases
- Integration of gene symbols, aliases, descriptions, keywords and other naming conventions
- Integrated searches across rare diseases and their associated genes, so literature that mentions both a rare disease and its associated genes can be easily referenced
- Publication trends over the years
- Top journals where the literature has been published
- Links out to PubMed for details on any article
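The trend and top-journal summaries listed above amount to simple tallies over article records. A minimal sketch follows, assuming each article is available as a record with a publication year and a journal name. The field names, the sample records and the helper functions are hypothetical and are not taken from RARe-SOURCE™.

```python
from collections import Counter

# Hypothetical article records; the fields and values are placeholders.
ARTICLES = [
    {"year": 2021, "journal": "Journal A"},
    {"year": 2021, "journal": "Journal B"},
    {"year": 2022, "journal": "Journal A"},
    {"year": 2023, "journal": "Journal A"},
    {"year": 2023, "journal": "Journal C"},
    {"year": 2023, "journal": "Journal B"},
]


def publications_per_year(articles):
    """Return (year, count) pairs in chronological order."""
    counts = Counter(article["year"] for article in articles)
    return sorted(counts.items())


def top_journals(articles, n=2):
    """Return the n journals with the most articles as (journal, count) pairs."""
    return Counter(article["journal"] for article in articles).most_common(n)


trend = publications_per_year(ARTICLES)
leaders = top_journals(ARTICLES)
```

Here `trend` holds one count per year, suitable for a line or bar chart, and `leaders` holds the most frequent journals.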

Q: Which rare diseases does RARe-SOURCE™ provide information on?
A: RARe-SOURCE™ provides details on rare diseases with genetic etiology and
their associated genes. Information on the diseases and their associated
genes is obtained from the Genetic and Rare Diseases (GARD) database. The current
version does not include all rare diseases with genetic etiology.
Q: How reliable are the literature AI details?
A: RARe-SOURCE™ implements artificial intelligence (AI) algorithms to identify rare
disease and associated gene mentions in the titles and/or abstracts of published
literature. The algorithms do not yet identify all possible mentions and might miss
some valid results. We are actively validating and improving the algorithms, and the
search results are expected to improve over time.
The results obtained from the AI algorithms are combined with the information on
different disease and gene aliases to obtain articles where the disease or gene terms
might be mentioned using a different naming convention. Although this allows
RARe-SOURCE™ to find more articles related to the rare disease or gene of interest, it
also increases the probability of returning results that are not relevant. In the
current version, RARe-SOURCE™ favors retrieving more articles over missing any that
might be relevant.
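The trade-off described above can be illustrated with plain term matching. The sketch below is not the RARe-SOURCE™ algorithm (which uses AI-based mention detection). It only shows how adding aliases to a search widens the result set. The article records, the alias list and the function names are all hypothetical.

```python
import re


def build_alias_pattern(names):
    """Compile one case-insensitive pattern matching any name as a whole term."""
    # Longest names first, so multi-word aliases are tried before shorter ones.
    escaped = [re.escape(name) for name in sorted(names, key=len, reverse=True)]
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)


def find_mentions(articles, names):
    """Return IDs of articles whose title or abstract mentions any of the names."""
    pattern = build_alias_pattern(names)
    hits = []
    for article in articles:
        text = article["title"] + " " + article.get("abstract", "")
        if pattern.search(text):
            hits.append(article["id"])
    return hits


# Hypothetical records and an illustrative alias list (assumptions, not site data).
ARTICLES = [
    {"id": "A1", "title": "SLC6A8 variants in a pediatric cohort"},
    {"id": "A2", "title": "Creatine transporter deficiency: a case report"},
    {"id": "A3", "title": "Expression of CT1 in kidney tissue"},
    {"id": "A4", "title": "Unrelated study of glucose transport"},
]

primary_only = find_mentions(ARTICLES, ["SLC6A8"])
with_aliases = find_mentions(
    ARTICLES, ["SLC6A8", "CT1", "creatine transporter deficiency"]
)
```

With the primary symbol alone, only A1 is found. With aliases, A2 and A3 are found as well. A short alias such as "CT1" may also refer to something else entirely, which is the kind of less relevant hit the paragraph above warns about.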
RARe-SOURCE™ automatically prioritizes results within each publication year, so that
the most relevant results appear at the top. The title of each result is also
prominently displayed and each article is linked to PubMed, so any result can be
quickly reviewed and verified.
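Ordering results within each publication year can be sketched as a two-key sort. This assumes every result carries a publication year and a numeric relevance score. The `score` field, the newest-year-first ordering and the sample records are assumptions for illustration, since the actual ranking criteria are not described here.

```python
# Hypothetical search results; "score" stands in for whatever relevance
# measure is actually used, which this sketch does not know.
RESULTS = [
    {"id": "P1", "year": 2022, "score": 0.40},
    {"id": "P2", "year": 2023, "score": 0.55},
    {"id": "P3", "year": 2022, "score": 0.90},
    {"id": "P4", "year": 2023, "score": 0.80},
]


def prioritize(results):
    """Sort newest year first, then highest relevance first within each year."""
    return sorted(results, key=lambda r: (-r["year"], -r["score"]))


ordered = [r["id"] for r in prioritize(RESULTS)]
```

The result keeps each year's articles together and places the highest-scoring article of each year first.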
Q: How reliable are the variant details?
A: RARe-SOURCE™ combines variant data integration with manual curation to
provide genomic variant details. Variants for all genes in
RARe-SOURCE™ are obtained by integrating public variant databases and
annotating them using OpenCRAVAT. Manual curation of published literature is
performed for specific diseases and associated genes. These details are in
the final stages of validation for SLC6A8 (X-linked creatine transporter
deficiency), in the middle stages for ASAH1 (Farber disease), and in the
initial stages of curation for FKTN1 (alpha-dystroglycanopathy).
Q: Do I need to request an account or special access in order to view rare disease information?
A: At this time, RARe-SOURCE™ allows users to access rare disease and gene
information without logging in. In the future, RARe-SOURCE™ may require
certain users to be authenticated to access sensitive data, save dashboard
information or share research with other users of the resource.
A: RARe-SOURCE™ focuses on genotype-phenotype correlations for rare diseases and on integrating related data from a variety of databases, with the goals of:
- Providing easy access to published literature on rare diseases without having to know and/or search for all associated disease and gene names and aliases.
- Mapping variants onto the three-dimensional protein structure to assist researchers in investigating structure-function relationships and the impact of a variant on the protein’s ability to interact with signaling partners.
A: RARe-SOURCE™ allows for data export in multiple formats:
- All tabular data can be copied to the system clipboard or exported in CSV or Microsoft Excel compatible formats.
- 2-D protein structure visualization data can be exported by ProtVista in JSON format.
- 3-D protein structure visualizations can be exported or downloaded in PNG graphical image format.
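A CSV export of tabular data can be post-processed with standard tools. The sketch below assumes an export with a header row. The column names and row values are placeholders invented for illustration, and the actual export layout may differ.

```python
import csv
import io

# Placeholder text standing in for an exported CSV file; the column names
# and values are hypothetical, not the real export layout.
SAMPLE_EXPORT = (
    "Gene,Variant,Consequence\n"
    "GENE_A,variant_1,missense\n"
    "GENE_A,variant_2,synonymous\n"
    "GENE_B,variant_3,missense\n"
)


def rows_where(csv_text, column, value):
    """Return the rows (as dicts) whose given column equals the given value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[column] == value]


missense = rows_where(SAMPLE_EXPORT, "Consequence", "missense")
missense_variants = [row["Variant"] for row in missense]
```

For a file saved to disk, the same `csv.DictReader` can be applied to a handle opened with `open(path, newline="")`.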
Q: Can I upload my own gene or rare disease information to RARe-SOURCE™?
A: Currently, RARe-SOURCE™ integrates data from multiple internal and external
sources that are frequently updated. This ensures that we present the most
up-to-date information on rare diseases and gene variants. Future plans
include features that will allow users to input their own variant or rare
disease information.