
Proteomic Query Submission

Human Proteome Version:
Protein Ensembl/UniProt ID: (e.g. ENSP00000000233)
Sequence Position: (e.g. 18)
Variant Allele:
 
(Click Here) Plain Version
(Click Here) Query by Genomic Location
 
 

For batch queries, please contact us by email. Heavy users without permission will be blocked until further notice.

Overview

The recent advances in genome sequencing have revealed an abundance of nonsynonymous polymorphisms among human individuals; consequently, it is of immense interest and importance to predict whether such substitutions are functionally neutral or have deleterious effects. The accuracy of such prediction algorithms depends on the quality of the multiple sequence alignment, which is used to infer how well an amino acid substitution is tolerated at a given position. Because orthologous protein sequences were scarce in the past, the existing prediction algorithms all include sequences of protein paralogs in the alignment, which can dilute the conservation signal and affect prediction accuracy. However, we believe that, with the sequencing of a large number of mammalian genomes, it is now feasible to include only protein orthologs in the alignment and thereby improve prediction performance.

We have developed a novel prediction algorithm, named SNPdryad, which includes only protein orthologs when building a multiple sequence alignment. Among other innovations, SNPdryad uses several conservation scoring schemes with Random Forest as the classifier. We have tested SNPdryad on two human divergence and diversity datasets, HumDiv and HumVar. We found that SNPdryad consistently outperformed other existing methods on several performance metrics, which we attribute to the exclusion of paralogous sequences. We have run SNPdryad on the complete human proteome, generating prediction scores for all possible amino acid substitutions.

FAQ

  1. What is the Deleterious Prediction Score (DPS) of SNPdryad?
     The SNPdryad DPS for a non-synonymous SNP (nsSNP) is the weighted vote aggregated from the 100 decision trees in the Random Forest model trained on the HumDiv dataset. It has been normalized to the range [0,1]. The higher the DPS, the more deleterious the nsSNP.
  2. Is there a demo query?
     By default, this webpage has already loaded example data into the query form (i.e. the 18th position of ENSP00000000233). You can just click the 'submit' button for a demo. This webpage has been designed to be single-column (for mobile devices) and dynamic (it updates results without any page refresh). Once a result has been returned, you may click on the protein sequence for interactive queries.
  3. Why does my query take so long or never finish?
     Please check whether your sequence position is larger than the actual sequence length of the protein indicated by the Ensembl/UniProt ID. If the problem persists, we advise you to use the plain query version, which relies on less complicated web technology. If you still encounter the problem in the plain version, the last resort is to download the raw predictions in the "Batch Downloads" section and parse the results yourself.
  4. How does SNPdryad work?
     Given an nsSNP input: (1) SNPdryad extracts the protein sequence containing the input nsSNP, as well as its orthologous sequences from mammals (computed by Inparanoid). (2) The MUSCLE alignment program is used to align the sequences. (3) PhyML is used to build a phylogenetic tree from the sequence alignment profile. (4) SNPdryad builds features from the alignment column containing the input nsSNP and from the phylogenetic tree. (5) SNPdryad feeds the features into the Random Forest model (trained on HumDiv) and obtains the deleterious prediction score (DPS) for the input nsSNP.
  5. More questions?
     Please email Ka-Chun Wong.
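
To illustrate the weighted vote described in the DPS answer above, here is a minimal Python sketch. It is not SNPdryad's actual implementation: this page does not state how the trees are weighted or how the score is normalized, so the function name `weighted_vote_score`, the 0/1 vote encoding, and the normalization by the sum of weights are all assumptions made for illustration.

```python
from typing import Sequence


def weighted_vote_score(votes: Sequence[int], weights: Sequence[float]) -> float:
    """Aggregate per-tree votes into a score in [0, 1].

    Hypothetical helper: votes are 1 (deleterious) or 0 (neutral), one per
    decision tree; weights are non-negative per-tree weights. The score is
    the weighted fraction of trees voting "deleterious".
    """
    if len(votes) != len(weights):
        raise ValueError("votes and weights must have the same length")
    if any(v not in (0, 1) for v in votes):
        raise ValueError("each vote must be 0 or 1")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total weight keeps the result within [0, 1].
    return sum(v * w for v, w in zip(votes, weights)) / total


# Assumed example: 100 equally weighted trees, 80 of which vote "deleterious".
votes = [1] * 80 + [0] * 20
weights = [1.0] * 100
score = weighted_vote_score(votes, weights)
```

With equal weights the score reduces to the plain fraction of trees voting "deleterious"; unequal weights let some trees count more than others while the result stays in [0, 1].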

Citation

Ka-Chun Wong and Zhaolei Zhang: SNPdryad: Predicting Deleterious Non-synonymous Human SNPs Using Only Orthologous Protein Sequences. Bioinformatics 2014, 30 (8): 1112-1119