The JSI splice site prediction tool predicts changes in the quality of splice sites at and in close proximity to a site of genetic variation. For this, several scores are calculated, among them scores taking into account not only the overall quality of the known splice motif but also the probability of the respective sequence being present as a known splice site throughout the whole genome. The JSI splice site prediction tool has been trained with approximately 200 000 splice sites, i.e. the known splice sites throughout the whole genome (GRCh37).
Additionally, the MaxEntScan scoring tool for human splice sites has been integrated, and the respective scores are displayed alongside the JSI scores for comparison. We would like to thank Gene Yeo for his kind permission to integrate the MaxEntScan scoring algorithm into our software.
5' spliceSITE scores | |
Score | Total score indicating the overall quality of a (possible) splice site. A positive value predicts a functional splice site. A negative value predicts that this position is not a functional splice site. Negative values below -1000 are shown as "neg.". |
BCS | Base Consensus Score: summed likelihood of the single bases from 3 bp upstream to 4 bp downstream of GT. |
MCS | Motif Consensus Score: calculation based on the likelihood and frequency of the complete 9mer sequence of the donor splice site (5'<3bp>GT<4bp>3') |
5' MaxEntScan scores | |
ENT | Maximum Entropy Model by MaxEntScan. |
MDD | Maximum Dependence Decomposition Model by MaxEntScan. |
MM | First-order Markov Model by MaxEntScan. |
WMM | Weight Matrix Model by MaxEntScan. |
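The 9-mer window used for the donor scores (3 bp upstream, GT, 4 bp downstream) and the idea of summing per-base likelihoods can be illustrated with a short Python sketch. The function names and the three training sequences are hypothetical, and the log-odds table built here is only an assumption for illustration; it does not reproduce the actual JSI scoring model or its training data.

```python
import math

def donor_windows(seq):
    """Yield (index of G, 9-mer) for every GT with 3 bp upstream and 4 bp downstream."""
    seq = seq.upper()
    for i in range(3, len(seq) - 5):
        if seq[i:i + 2] == "GT":
            yield i, seq[i - 3:i + 6]

def position_log_odds(training_9mers, pseudocount=1.0):
    """Build one log2 odds table per position against a uniform background (assumption)."""
    tables = []
    for pos in range(9):
        counts = {base: pseudocount for base in "ACGT"}
        for ninemer in training_9mers:
            counts[ninemer[pos]] += 1
        total = sum(counts.values())
        tables.append({b: math.log2((c / total) / 0.25) for b, c in counts.items()})
    return tables

def base_consensus_score(ninemer, tables):
    """Sum the per-position log odds of a 9-mer (toy stand-in for a BCS-like score)."""
    return sum(tables[pos][base] for pos, base in enumerate(ninemer))

# Hypothetical training set; the real tool was trained on known genomic splice sites.
training = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA"]
tables = position_log_odds(training)
for index, window in donor_windows("TTCAGGTAAGTCC"):
    print(index, window, round(base_consensus_score(window, tables), 2))
```

A GT too close to either end of the sequence is skipped because the full 9-mer cannot be extracted.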
3' spliceSITE scores | |
Score | Total score indicating the overall quality of a (possible) splice site. A positive value predicts a functional splice site. A negative value predicts that this position is not a functional splice site. Negative values below -1000 are shown as "neg.". |
BCS | Base Consensus Score: summed likelihood of the single bases upstream of AG. |
MCS | Motif Consensus Score: the sequence upstream of AG is divided into 6-mer oligos. The score is calculated based on the frequency of each 6-mer at a given position. |
CAGG | Base Consensus Score, indicates the splice site quality by summing up the likelihood of the occurrence of the bases flanking the AG. |
BP | Branch Point: the branch point sequence should match yTnAy. The branch point must be located in the AGEZ (AG Exclusion Zone). |
BPPos | Branch Point Position: optimum is 23 bp upstream of the 3' splice site. |
PPT | Polypyrimidine tract score, indicates C/T content of the polypyrimidine tract between Branch Point and 3' splice site. |
U2BE | U2 binding energy score, indicates the binding capacity of the branch point motif with regard to the U2 spliceosomal unit. |
3' MaxEntScan scores | |
ENT | Maximum Entropy Model by MaxEntScan. |
MM | First-order Markov Model by MaxEntScan. |
WMM | Weight Matrix Model by MaxEntScan. |
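Two of the 3' features listed above, the yTnAy branch point motif and the C/T content of the polypyrimidine tract, lend themselves to a brief Python sketch. The function names are hypothetical, the AGEZ constraint is not modelled, and the real BP and PPT scores of the tool are not reproduced here; the sketch only shows the underlying pattern match and ratio.

```python
import re

# yTnAy: y = C or T, n = any base. A lookahead is used so overlapping hits are found.
BRANCH_POINT = re.compile(r"(?=([CT]T[ACGT]A[CT]))")

def branch_point_candidates(seq):
    """Return (start index, motif) for every yTnAy match in the sequence."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in BRANCH_POINT.finditer(seq)]

def pyrimidine_fraction(seq):
    """Return the fraction of C/T bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "CT" for base in seq) / len(seq)

# Hypothetical intron end: candidate branch point, pyrimidine-rich stretch, then AG.
intron_end = "GGCTGACTTTCCAG"
print(branch_point_candidates(intron_end))
print(pyrimidine_fraction(intron_end[7:12]))
```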
Splice Sites were predicted for the wild type genomic reference sequence of several genes.
For every GT and AG in the sequence, the scores for a 5' or 3' splice site were calculated, respectively. A score above zero indicates that the site is predicted to be a splice site; a score below zero means it is not predicted to be a splice site.
As the authentic and alternative splice sites are known for the gene, the predicted results can then be compared with the actual splice sites.
Additionally, the MaxENT algorithm was tested in the same manner to compare the predicted results. For MaxENT, a score above 3.5 was interpreted as a predicted splice site.
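The two decision thresholds described here (above 0 for the JSI score, above 3.5 for MaxENT) can be written as a small Python check. The function name is hypothetical; the two example rows are the authentic splice sites listed for the first gene in appendix 1.

```python
JSI_THRESHOLD = 0.0      # JSI score above zero: predicted splice site
MAXENT_THRESHOLD = 3.5   # MaxENT score above 3.5: predicted splice site

def is_predicted(score, threshold):
    """Return True if the score is strictly above the decision threshold."""
    return score > threshold

# (type, position, JSI SSP score, MaxENT score), taken from appendix 1, first gene.
authentic_sites = [
    ("5'", "IVS6+1", 1171, 2.59),
    ("3'", "IVS7-2", 888, 3.21),
]

for site_type, position, jsi, maxent in authentic_sites:
    print(site_type, position,
          "JSI:", is_predicted(jsi, JSI_THRESHOLD),
          "MaxENT:", is_predicted(maxent, MAXENT_THRESHOLD))
```

Both sites are called by the JSI threshold and missed by the MaxENT threshold, which matches the counts given for that gene.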
For authentic (including alternative) splice sites, the JSI tool correctly predicted more (usually all) splice sites than MaxENT. In cases where MaxENT also predicted all authentic splice sites of a gene correctly, the JSI tool's results were the same.
Hence, the JSI splice site prediction tool has a lower false negative rate.
Conversely, the JSI splice site tool is configured to be more sensitive, so it predicts more cryptic splice sites with positive scores than MaxENT. Due to the higher detection rate, the chance of missing authentic splice sites is reduced. The higher number of false positive calls is corrected for afterwards: the scores of each called splice site are compared between the reference and the altered sequence. Calls of non-authentic splice sites with positive scores can thus easily be filtered out in this step if their score is the same for the reference and the altered sequence.
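The filtering step can be sketched in Python as a comparison of two score maps. The function name, the positions and the scores below are all hypothetical; the sketch only illustrates the principle of discarding calls whose score does not differ between reference and altered sequence, not the tool's internal implementation.

```python
def changed_calls(ref_scores, alt_scores, tolerance=0.0):
    """Keep positions whose score differs between reference and altered sequence.

    Positions present in only one of the two maps (a site gained or lost)
    are always kept; the missing score is reported as None.
    """
    changed = {}
    for pos in sorted(set(ref_scores) | set(alt_scores)):
        ref = ref_scores.get(pos)
        alt = alt_scores.get(pos)
        if ref is None or alt is None or abs(ref - alt) > tolerance:
            changed[pos] = (ref, alt)
    return changed

# Hypothetical scores keyed by sequence position.
reference = {100: 850.0, 140: 120.0, 200: 35.0}
altered = {100: 310.0, 140: 120.0, 160: 400.0, 200: 35.0}
print(changed_calls(reference, altered))
```

Positions 140 and 200 carry positive scores but are unaffected by the variant, so they drop out of the result.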
Please refer to appendix 1 for the detailed result data of five example genes.
For the second test, the JSI Splice Site Prediction Tool was used to predict the effects of known splicing variants. All obvious splicing variants with alterations of the GT (5') or AG (3') of authentic splice sites are always displayed.
An intensive literature search was then carried out for thoroughly validated splicing variants that are not caused by alterations of the highly conserved GT or AG bases. Compared to variants affecting the GT or AG, fewer validated variants were found. This may be because variants not directly affecting the GT or AG of a splice site were often not suspected to be splice-altering and were therefore not examined for such an effect, and because a thorough evaluation is generally expensive.
The JSI Splice Site Prediction Tool proved to be very accurate when used to predict the effects of variants that do not alter the consensus GT or AG.
Especially for 3' splice sites, the prediction was more accurate than that of MaxENT. This is because the JSI Splice Site Prediction Tool analyses a longer sequence than MaxENT. Whilst MaxENT only analyses 20 bases upstream of the AG, the JSI Splice Site Prediction Tool includes a wider range of upstream bases for the additional features of a 3' splice site. It is thereby able to detect changes in the polypyrimidine tract, branch point and AG exclusion zone.
Moreover, the JSI Splice Site Prediction Tool compares the scores of all possible splice sites surrounding the variant. Thus, it can predict when a new or cryptic splice site becomes a competitor of an authentic splice site.
Furthermore, the tool proved capable of detecting alterations in splice site strength (example 8 in appendix 2). There, the variant decreased the score, but not below the scores of the surrounding cryptic splice sites. The prediction of this "medium decrease" in score matches the actual effect of a weakened splice site that results in lowered amounts of mRNA transcript.
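The two comparisons described above, the relative score change of an authentic site and its ranking against neighbouring sites, can be sketched in Python. The function names and all raw scores are hypothetical: the document reports only the relative decrease (17.20% in example 8), so the values below were merely chosen to reproduce that percentage.

```python
def percent_decrease(ref_score, alt_score):
    """Relative decrease of a score in percent; expects a positive reference score."""
    if ref_score <= 0:
        raise ValueError("reference score must be positive")
    return (ref_score - alt_score) / ref_score * 100.0

def strongest_site(scores):
    """Return the label of the site with the highest score."""
    return max(scores, key=scores.get)

# Hypothetical scores for the altered sequence around one variant.
altered = {"authentic": 828.0, "cryptic_a": 410.0, "cryptic_b": 95.0}
print(round(percent_decrease(1000.0, altered["authentic"]), 2))
print(strongest_site(altered))
```

In this made-up case the authentic site is weakened but still outranks the cryptic sites, which corresponds to the weakened-site scenario rather than to a competing new splice site.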
Appendix 2 contains examples from the list of validated variants that were used to compare predicted and actual effect. To view the detailed scores, also of MaxENT, please click the given link on the gene name, which will open another browser tab with the splice site prediction query for the respective variant.
Splice Sites were predicted for the wild type genomic reference sequence of several genes.
For every GT and AG in the sequence, the scores for a 5' or 3' splice site were calculated, respectively. A score above zero indicates that the site is predicted to be a splice site; a score below zero means it is not predicted to be a splice site.
As the authentic and alternative splice sites are known for the gene, the predicted results can then be compared with the actual splice sites.
Additionally, the MaxENT algorithm was tested in the same manner to compare the predicted results. For MaxENT, a score above 3.5 was interpreted as a predicted splice site.
Exons: 11
Genomic Sequence: 25722 bp
5' Splice Sites | JSI SSP | MaxENT |
Correctly predicted authentic SS | 10 | 9 |
Falsely predicted authentic SS | 0 | 1* |
Correctly predicted cryptic SS | 1058 | 1124 |
Falsely predicted cryptic SS | 184 | 118 |
3' Splice Sites | JSI SSP | MaxENT |
Correctly predicted authentic SS | 10 | 9 |
Falsely predicted authentic SS | 0 | 1* |
Correctly predicted cryptic SS | 1674 | 1831 |
Falsely predicted cryptic SS | 328 | 171 |
* Details of false predictions for authentic splice sites:
Splice Site Type | Position | JSI SSP Score | MaxENT Score |
5' | IVS6+1 | 1171 | 2.59 |
3' | IVS7-2 | 888 | 3.21 |
Exons: 24
Genomic Sequence: 81189 bp
5' Splice Sites | JSI SSP | MaxENT |
Correctly predicted authentic SS | 23 | 22 |
Falsely predicted authentic SS | 0 | 1* |
Correctly predicted cryptic SS | 3818 | 4006 |
Falsely predicted cryptic SS | 581 | 393 |
3' Splice Sites | JSI SSP | MaxENT |
Correctly predicted authentic SS | 22 | 21 |
Falsely predicted authentic SS | 1* | 2* |
Correctly predicted cryptic SS | 4894 | 5500 |
Falsely predicted cryptic SS | 1219 | 613 |
* Details of false predictions for authentic splice sites:
Splice Site Type | Position | JSI SSP Score | MaxENT Score |
5' | IVS6+1 | 1160 | 3.23 |
3' | IVS1-2 (5'UTR) | -465 | 4.90 |
3' | IVS7-2 | 725 | 2.82 |
3' | IVS13-2 | 797 | 1.93 |
Exons: 51
Genomic Sequence: 18351 bp
5' Splice Sites | JSI SSP | MaxENT |
Correctly predicted authentic SS | 50 | 50 |
Falsely predicted authentic SS | 0 | 0 |
Correctly predicted cryptic SS | 771 | 798 |
Falsely predicted cryptic SS | 89 | 62 |
3' Splice Sites | JSI SSP | MaxENT |
Correctly predicted authentic SS | 50 | 50 |
Falsely predicted authentic SS | 0 | 0 |
Correctly predicted cryptic SS | 981 | 1054 |
Falsely predicted cryptic SS | 216 | 143 |
Exons: 9
Genomic Sequence: 41770 bp
5' Splice Sites | JSI SSP | MaxENT |
Correctly predicted authentic SS | 8 | 7 |
Falsely predicted authentic SS | 0 | 1* |
Correctly predicted cryptic SS | 1965 | 2046 |
Falsely predicted cryptic SS | 271 | 190 |
3' Splice Sites | JSI SSP | MaxENT |
Correctly predicted authentic SS | 8 | 8 |
Falsely predicted authentic SS | 0 | 0 |
Correctly predicted cryptic SS | 2139 | 2475 |
Falsely predicted cryptic SS | 730 | 394 |
* Details of false predictions for authentic splice sites:
Splice Site Type | Position | JSI SSP Score | MaxENT Score |
5' | IVS8+1 | 761 | 1.98 |
Exons: 63
Genomic Sequence: 146619 bp
5' Splice Sites | JSI SSP | MaxENT |
Correctly predicted authentic SS | 59 | 58 |
Falsely predicted authentic SS | 2* | 3* |
Correctly predicted cryptic SS | 7104 | 7327 |
Falsely predicted cryptic SS | 854 | 631 |
3' Splice Sites | JSI SSP | MaxENT |
Correctly predicted authentic SS | 61 | 58 |
Falsely predicted authentic SS | 1* | 4* |
Correctly predicted cryptic SS | 7748 | 8842 |
Falsely predicted cryptic SS | 2432 | 1338 |
* Details of false predictions for authentic splice sites:
Splice Site Type | Position | JSI SSP Score | MaxENT Score |
5' | IVS32+1 | -500 | -2.26 |
5' | IVS35+1 | -207 | 0.90 |
5' | IVS58+1 | 677 | 3.39 |
3' | IVS12-2 | -40 | 2.57 |
3' | IVS33-2 | 676 | 2.46 |
3' | IVS39-2 | 605 | 3.03 |
3' | IVS49-2 | 650 | 1.87 |
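The false negative rate for authentic splice sites can be recomputed from the tables above. The following Python sketch pools the 5' counts of the five genes; it assumes that "Falsely predicted authentic SS" counts authentic sites that were not predicted (false negatives). The function and variable names are hypothetical.

```python
# (correctly predicted, falsely predicted) authentic 5' splice sites per gene,
# in the order the five genes appear in this appendix.
counts_5prime = {
    "JSI SSP": [(10, 0), (23, 0), (50, 0), (8, 0), (59, 2)],
    "MaxENT": [(9, 1), (22, 1), (50, 0), (7, 1), (58, 3)],
}

def false_negative_rate(pairs):
    """Pooled share of authentic sites that were not predicted."""
    correct = sum(c for c, _ in pairs)
    missed = sum(m for _, m in pairs)
    return missed / (correct + missed)

for tool, pairs in counts_5prime.items():
    print(tool, f"{false_negative_rate(pairs):.3f}")
```

Both tools are evaluated on the same 152 authentic 5' splice sites, of which the JSI tool misses 2 and MaxENT misses 6.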
The JSI Splice Site Prediction Tool was used to predict the effects of known splicing variants.
# | Gene | HGVS | Correct prediction | JSI SS Prediction summary | Actual Effect | Disease | Reference |
1 | CDKN2A ⇒ | NM_000077 c.457+1G>T | | Possible loss of function for authentic splice site at c.457+1. Score for cryptic splice site now highest score at c.384. | ss abolished | Pancreatic cancer/melanoma syndrome | Mucaki EJ, Shirley BC, Rogan PK: Prediction of Mutant mRNA Splice Isoforms by Information Theory-Based Exon Definition. Hum Mutat. 2013; 34(4): 557–565. |
2 | BRCA1 ⇒ | NM_007294 c.212+1G>A | | Possible loss of function for authentic splice site at c.212+1, score for cryptic splice site at c.212+13 now highest score. Alternative splice site may be activated at c.191. | ss abolished, cryptic ss 22 nt upstream activated, deletion of 22 nucleotides from exon 4 | Breast Cancer | Mucaki EJ, Ainsworth P, Rogan PK: Comprehensive prediction of mRNA splicing effects of BRCA1 and BRCA2 variants. Hum Mutat. 2011; 32(7): 735–742. |
3 | POLH ⇒ | NM_006502 c.490G>T | | Possible loss of function for authentic 5' splice site at c.490+1. (3' splice site at c.490+11: Score became positive.) | ss abolished | Xeroderma pigmentosum, variant type | Inui H, et al.: Xeroderma pigmentosum-variant patients from America, Europe, and Asia. J Invest Dermatol. 2008; 128(8): 2055–2068. |
4 | PLP1 ⇒ | NM_000533 c.173A>G | | Possible new splice site at c.173 with higher score than authentic splice site at c.191+1. | ss abolished, cryptic ss 19 nt upstream activated, deletion of 19 nucleotides from exon 3 | Pelizaeus-Merzbacher disease | Bonnet-Dupeyron MN, Combes P, Santander P, et al.: PLP1 splicing abnormalities identified in Pelizaeus-Merzbacher disease and SPG2 fibroblasts are associated with different types of mutations. Hum Mutat. 2008; 29(8): 1028–1036. |
5 | PLP1 ⇒ | NM_000533 c.454-10A>G | | Score for cryptic splice site at c.454-43 now highest score. Possible new splice site at c.454-11 with higher score than authentic splice site at c.454-2. Possible loss of function for authentic 3' splice site at c.454-2. (5' splice site at c.454-14: Score became positive.) | retention of last 9bp of exon due to cryptic exon 4 skipping, intron retention | Pelizaeus-Merzbacher disease | Bonnet-Dupeyron MN, Combes P, Santander P, et al.: PLP1 splicing abnormalities identified in Pelizaeus-Merzbacher disease and SPG2 fibroblasts are associated with different types of mutations. Hum Mutat. 2008; 29(8): 1028–1036. |
6 | XPC ⇒ | NM_004628 c.413-9T>A | | Score for cryptic splice site at c.413-61 now highest score. Possible new splice site at c.413-9 with higher score than authentic splice site at c.413-2. Possible loss of function for authentic splice site at c.413-2. | ss abolished, de novo ss created | Xeroderma pigmentosum | Khan SG, Metin A, Gozukara E, et al.: Two essential splice lariat branchpoint sequences in one intron in a xeroderma pigmentosum DNA repair gene: mutations result in reduced XPC mRNA levels that correlate with cancer risk. Hum Mol Genet. 2004; 13(3): 343–352. |
7 | CERS3 ⇒ | NM_001290343 c.609+1G>T | | Score for cryptic splice site at c.540 now highest score. Possible loss of function for authentic splice site at c.609+1. | ss abolished | Autosomal recessive congenital ichthyosis 9 | Radner et al.: Mutations in CERS3 cause autosomal recessive congenital ichthyosis in humans. PLoS Genet. 2013; 9(6). |
8 | SMAD4 ⇒ | NM_005359 c.1448-6T>C | | Medium decrease of score for authentic splice site at c.1448-2 (17.20%) | ss weakened, reduced amount of transcript | Hereditary pulmonary arterial hypertension | Nasim MT, Ogo T, Ahmed M, et al.: Molecular genetic characterization of SMAD signaling molecules in pulmonary arterial hypertension. Hum Mutat. 2011; 32(12): 1385–1389. |
9 | ABCA1 ⇒ | NM_005502 c.4465-34A>G | | Possible new splice site at c.4465-35 with higher score than authentic splice site at c.4465-2. | 33bp of intron 31 included in transcript | Tangier disease | Fasano T, Pisciotta L, Bocchi L, et al.: Lysosomal lipase deficiency: molecular characterization of eleven patients with Wolman or cholesteryl ester storage disease. Mol Genet Metab. 2012; 105(3): 450–456. |
10 | ABCA1 ⇒ | NM_005502 c.1195-27G>A | | Possible new splice site at c.1195-27 with higher score than authentic splice site at c.1195-2. Possible loss of function for authentic splice site at c.1195-2. | 25bp of intron 10 included in transcript | Tangier disease | Fasano T, Pisciotta L, Bocchi L, et al.: Lysosomal lipase deficiency: molecular characterization of eleven patients with Wolman or cholesteryl ester storage disease. Mol Genet Metab. 2012; 105(3): 450–456. |
11 | CFTR ⇒ | NM_006846 c.1820+53G>A | | Score at c.1820+55 becomes higher than score for authentic splice site at c.1820+1. | retention of 54bp of intron 19 with normal protein expression | Netherton Syndrome | Lacroix M, Lacaze-Buzy L, Furio L, et al.: Clinical expression and new SPINK5 splicing defects in Netherton syndrome: unmasking a frequent founder synonymous mutation and unconventional intronic mutations. J Invest Dermatol. 2012; 132(3 Pt 1): 575–582. |
12 | SPINK5 ⇒ | NM_006846 c.1820+53G>A | | Score at c.1820+55 becomes higher than score for authentic splice site at c.1820+1. | retention of 54bp of intron 19 with normal protein expression | Netherton Syndrome | Lacroix M, Lacaze-Buzy L, Furio L, et al.: Clinical expression and new SPINK5 splicing defects in Netherton syndrome: unmasking a frequent founder synonymous mutation and unconventional intronic mutations. J Invest Dermatol. 2012; 132(3 Pt 1): 575–582. |