
Release Notes

Latest version: v2024.09.20-cancer

BoostDM v2024.09.20-cancer

This boostDM release has been built on the output produced by Intogen v2024.09.20. It incorporates several improvements and new features, including changes in the model implementation, updated reference datasets and bug fixes.
Herein we describe the main technical changes, summarize the new output of the pipeline and provide a comparison with the previous release.


Model Implementation

Uniform cluster features across cohorts of same tumor type category

Although mutational cluster features (OncodriveCLUSTL, HotMAPS and smRegions) are generated on a cohort-by-cohort basis, boostDM models are specified by tumor type, meaning that an aggregation or consensus step has to be taken to annotate the mutations in the training and test sets.

In the previous release, mutations with the same genomic identity (same genomic coordinates and alternate allele) would be annotated with different mutational cluster feature values depending on which cohort they were originally reported in, thereby yielding instances of mutations with the same identity but different cluster feature values across samples of the same tumor type.

In the current version we have changed this criterion: if a mutation is mapped by a clustering feature in any cohort belonging to a given tumor type, all instances of that mutation across the tumor type receive the same cluster feature value in the training and prediction corresponding to this tumor type.
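
For illustration, a minimal sketch of this aggregation, assuming cohort-level annotations are collected in a pandas DataFrame with hypothetical column names:

# A minimal sketch: if any cohort of a tumor type maps a mutation to a cluster,
# all instances of that mutation in the tumor type receive the same (maximum) flag.
import pandas as pd

def unify_cluster_feature(df: pd.DataFrame, feature="OncodriveCLUSTL_cat") -> pd.Series:
    """Propagate the maximum cluster flag per mutation identity within each tumor type."""
    return df.groupby(["chr", "pos", "alt", "tumor_type"])[feature].transform("max")

# Usage: df[feature] = unify_cluster_feature(df)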

Ordinal encoding of cluster features

Each cluster method (OncodriveCLUSTL, HotMAPS and smRegions) is now associated with a single ordinal feature with three levels:

  • 2 - the mutation is associated with a cluster in the same tumor type
  • 1 - the mutation is only associated with a cluster in another tumor type
  • 0 - the mutation is not associated with a cluster in any tumor type

Therefore we no longer keep the distinction between cat_1 and cat_2 binary-encoded cluster features. Moreover, we no longer use the OncodriveCLUSTL_SCORE numerical feature, given its small gain in predictive ability and explainability and the added complexity when interpreting feature contributions at prediction time.
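
A minimal sketch of this ordinal encoding, assuming two boolean flags summarizing the cluster mappings are available (names are illustrative):

def encode_cluster_feature(in_same_ttype, in_other_ttype):
    # 2: cluster in the same tumor type; 1: cluster only in another tumor type; 0: no cluster
    if in_same_ttype:
        return 2
    if in_other_ttype:
        return 1
    return 0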

Unweighted consensus of base classifiers

The forecast aggregator combining the 50 base classifiers per model is now unweighted, meaning that the base classifiers are no longer favored or penalized based on their cross-validation performance. Particularly in those cases where the cross-validation test set remaining after removing repeated mutation instances is very small, the performance score of a single base classifier can be biased and thus misleading when deciding whether that base classifier should be promoted in the forecast aggregator.
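
As an illustration, one simple form of unweighted aggregation is a plain average of the base classifier scores; the sketch below is an assumption about the aggregation function, not the pipeline's exact implementation:

import numpy as np

def aggregate_unweighted(base_scores):
    """base_scores: array-like with the 50 per-base-classifier scores for one mutation."""
    return float(np.mean(base_scores))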

Model selection

Our model selection strategy is now conducted in two steps:

  1. Models are only trained if there are at least 30 mutations on average across the training splits of the 50 base classifiers. The training splits are label-balanced sets comprising 70% of the total mutation training set; the size of the mutation set available for training in turn depends on the dN/dS excess per consequence type inferred in each cohort.
  2. We apply a composite rule that requires:
    • Mean CV performance (F-score50) across base classifiers >= 0.8
    • A minimum number of observed mutations for each tier defined by the discovery index

We have updated step 2 by changing the discovery index tiers and the minimum number of observed mutations for each tier. The new step 2 reads in Python code as follows:

DISCOVERY_TIERS = (0, 0.5, 0.75)
MUTATION_TIERS = (50, 30, 0)
FSCORE_THRESHOLD = 0.8

def meet_condition(fscore, discovery, n_muts):
    # First require a minimum mean CV performance (F-score50) across base classifiers
    if fscore >= FSCORE_THRESHOLD:
        # Then require enough observed mutations for at least one discovery index tier;
        # higher discovery index values demand fewer observed mutations
        for discovery_thresh, n_muts_thresh in zip(DISCOVERY_TIERS, MUTATION_TIERS):
            if (discovery >= discovery_thresh) and (n_muts >= n_muts_thresh):
                return True
    return False
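
For illustration, a hypothetical model with a mean F-score50 of 0.85, a discovery index of 0.6 and 35 observed mutations would be retained under this rule:

meet_condition(0.85, 0.6, 35)   # True: the 0.5 discovery tier only requires 30 observed mutations
meet_condition(0.85, 0.2, 35)   # False: the lowest discovery tier requires at least 50 observed mutations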

The rationale behind this change is that it is very difficult to assert qualitative differences between discovery index values within the (0, 0.5) range, making a case subdivision in this range hard to justify. Broader discovery index tiers therefore yield more qualitatively distinct discovery classes.

Pan-gene models are no longer trained

We no longer train meta-gene models based on the dichotomous mode-of-action classification of genes into oncogenes (Act) and tumor suppressors (LoF). This is in part because such models end up being dominated by the few genes contributing most mutations to each Act or LoF class. The pipeline now only renders gene-specific models. Consequently, the oncogenic mode-of-action or “role” of the gene is no longer used as a feature, and the column “selected_model_gene” is no longer used in the in silico saturation mutagenesis output tables.

Prediction output format

The in silico saturation mutagenesis output is displayed with some formatting differences, with filenames following the pattern [gene].model.[ttype_model].features.[ttype_features].prediction.tsv.gz. The term ttype_model represents the tumor type used as training context: the set of cohorts where the signals of positive selection were calculated and from which the training mutations were taken for the specific model used to cast the prediction, i.e., the tumor type context in which the model was learned. The term ttype_features is the tumor type corresponding to the features used to encode the input mutation as a vector of feature values for prediction, i.e., the tumor type of the query mutation. In this release, we provide in silico saturation mutagenesis outputs where the predictive model has been trained in the same tumor type context as the query mutation. As a consequence of making the training and query contexts explicit, the column selected_model_ttype is no longer part of the output tables.
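
For illustration, a minimal sketch (not an official parser) of how these filenames can be decomposed into their gene and tumor type components:

import re

FILENAME_PATTERN = re.compile(
    r"(?P<gene>[^.]+)\.model\.(?P<ttype_model>[^.]+)\.features\.(?P<ttype_features>[^.]+)\.prediction\.tsv\.gz"
)

def parse_prediction_filename(filename):
    """Return gene, training tumor type and feature tumor type from a prediction filename."""
    match = FILENAME_PATTERN.fullmatch(filename)
    return match.groupdict() if match else None

# Hypothetical example filename
parse_prediction_filename("TP53.model.BRCA.features.BRCA.prediction.tsv.gz")
# {'gene': 'TP53', 'ttype_model': 'BRCA', 'ttype_features': 'BRCA'}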

Data Updates

IntOGen data

The current version uses the outputs of Intogen v2024.09.20.

Ensembl Variant Effect Predictor

The new release uses ENSEMBL Variant Effect Predictor (VEP) v.111 annotations instead of ENSEMBL VEP v.92.

Transcripts

Ensembl canonical transcripts are no longer used as the reference transcripts to interpret relevant sites and the consequence type of mutations. The current release is built on MANE (Matched Annotation from NCBI and EMBL-EBI) transcripts. We have switched to MANE Select transcripts to align with the standards commonly used in the clinic. This redefinition of the reference transcript is in accordance with the current Intogen release v2024.09.20, where it affects the regions used by methods like OncodriveFML, OncodriveCLUSTL and smRegions, as well as the computation of mutational profiles and the construction of coding sequence regions for dNdScv.

Genomic regions of interest

In the current release, the genomic regions considered for training and prediction include the entire coding sequence according to MANE Select, including stop codons (as in the previous release), as well as the 5 noncoding base pairs flanking each exonic coding sequence segment. The inclusion of the noncoding base pairs is intended to exploit the signals of positive selection cast by noncoding splicing-affecting mutations. The mapping is done according to the MANE Select transcripts retrieved from VEP v.111.
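
As an illustration, a minimal sketch of how such a region of interest could be assembled, assuming per-exon coding segment coordinates are available (function and variable names are hypothetical):

FLANK = 5  # noncoding base pairs added on each side of every coding exon segment

def region_of_interest(cds_segments, flank=FLANK):
    """cds_segments: list of (start, end) genomic coordinates of coding exon segments."""
    return [(start - flank, end + flank) for start, end in cds_segments]

# Hypothetical exon segments of a MANE Select transcript
region_of_interest([(1000, 1200), (1500, 1800)])
# [(995, 1205), (1495, 1805)]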

Updated consequence type definitions

We have extended the set of Sequence Ontology (SO) terms that map to the consequence classification we employ in the pipeline, which consists of four consequence types: “missense”, “nonsense”, “splicing” and “synonymous”. A sketch of the mapping is shown after the list below.

  • The term nonsense now includes the SO terms:
    • stop_gained
    • stop_lost
    • start_lost
  • The term splicing now includes the SO terms:
    • splice_donor_variant
    • splice_acceptor_variant
    • splice_region_variant
    • splice_donor_5th_base_variant
    • splice_donor_region_variant
    • splice_polypyrimidine_tract_variant
    • intron_variant
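
A minimal sketch of this mapping as a Python dictionary; the entries for the missense and synonymous classes are assumptions added for completeness:

SO_TO_CONSEQUENCE = {
    # nonsense
    "stop_gained": "nonsense",
    "stop_lost": "nonsense",
    "start_lost": "nonsense",
    # splicing
    "splice_donor_variant": "splicing",
    "splice_acceptor_variant": "splicing",
    "splice_region_variant": "splicing",
    "splice_donor_5th_base_variant": "splicing",
    "splice_donor_region_variant": "splicing",
    "splice_polypyrimidine_tract_variant": "splicing",
    "intron_variant": "splicing",
    # assumed mappings for the remaining two classes
    "missense_variant": "missense",
    "synonymous_variant": "synonymous",
}

def consequence_type(so_term):
    """Map a VEP Sequence Ontology term to one of the four pipeline consequence types."""
    return SO_TO_CONSEQUENCE.get(so_term)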

Pfam domains

We have updated the version of Pfam to v35.0.

Oncotree

As the tumor type ontology for hierarchical model implementation across tumor types, we now use the oncotree version boostdm2023 (see Downloads), which is based on the 2021 version of the MSKCC oncotree.

Bug Fixing and technical updates

Fixed reference genome incompatibility

Recently we found a misspecification in the codebase that prevented the randomization of passenger mutations from working as intended. As described in the main documentation, our method resorts to trinucleotide context probabilities derived from the observed frequencies in the cohort mutational data. With these probabilities we can then draw random mutations to create the set of passenger mutations used as a negative set in the supervised learning step.

The step that converted genomic coordinates into reference triplets prior to randomization unintentionally used a reference genome inconsistent with the rest of the pipeline, resulting in a mismatch between sites and reference triplets. Consequently, the nondriver training set was effectively drawn in a trinucleotide-agnostic way.

In the current version the hg38 genome build is consistently used throughout the entire pipeline.
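
For illustration, a minimal sketch of how passenger mutations could be drawn following a trinucleotide-context profile; the data structures and names are illustrative, not the pipeline's own:

import random

def draw_passengers(candidate_sites, profile, n, seed=42):
    """candidate_sites: list of dicts with 'pos', 'triplet' and 'alt' keys (hg38 coordinates).
    profile: dict mapping (triplet, alt) to its probability under neutral mutagenesis."""
    rng = random.Random(seed)
    weights = [profile.get((site["triplet"], site["alt"]), 0.0) for site in candidate_sites]
    return rng.choices(candidate_sites, weights=weights, k=n)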

Intogen integration

Intogen now generates a unified data environment so that the boostDM pipeline can be run from the new Intogen outputs with minimal preprocessing. For more details, check out the Intogen website.

New pipeline steps

The current version of the pipeline adds two supplemental steps to streamline post-hoc analysis, namely model visualization and benchmarking.

Plotting

All the plots used for basic model visualization in the boostDM website are automatically generated as part of a supplemental step in the pipeline. There are three types of plots:

  • Blueprints. Stacked representation mapping the protein coordinates of the gene to different annotation layers, from top to bottom: observed mutation frequencies, boostDM score, missense/nonsense driver mutations with high and low confidence tiers, Pfam domains, and feature annotations.
  • Clustered blueprints. Hierarchical clustering representation of several in silico saturation mutagenesis predictions across the collection of tumor types for which we can cast a prediction with the same model tumor type as features tumor type, i.e., there is a high-quality model for the gene in that tumor type context and this model is used to cast predictions feeding on the feature annotations calculated from the same tumor type context.
  • Discovery index. Scatter plot and best inverse exponential fitting curve representing the subsampling experiment used in the calculation of the discovery index. The gray dots represent the number of unique mutations (e.g., KRAS mutations in the example shown) found in random subsets of tumors taken from the whole tumor type cohort. The bending of the inverse exponential fit indicates how close to saturation the mutation catalog is: the more the curve is bent, the less likely new mutations are to be discovered as more tumors are sequenced (a sketch of such a fit is shown below).
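
For illustration, a minimal sketch of an inverse exponential saturation fit of the assumed form y = a * (1 - exp(-b * x)) on hypothetical subsampling data:

import numpy as np
from scipy.optimize import curve_fit

def inverse_exponential(x, a, b):
    return a * (1.0 - np.exp(-b * x))

# Hypothetical subsampling data: number of tumors vs number of unique mutations found
n_tumors = np.array([10, 25, 50, 100, 200, 400])
n_unique_mutations = np.array([12, 25, 41, 60, 75, 83])

params, _ = curve_fit(inverse_exponential, n_tumors, n_unique_mutations, p0=(100.0, 0.01))
# fitted (a, b); a more bent curve indicates a catalog closer to saturation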

Benchmarks

Given its importance for post-hoc evaluation, the boostDM pipeline now includes an expanded version of the benchmarks accompanying the first release of the method (https://doi.org/10.1038/s41586-021-03771-1). We have included both experimental saturation mutagenesis datasets and bioinformatic scores retrieved through dbNSFP. Please have a look at the full docs.

Summary of the run

The current release of BoostDM feeds on mutations observed in a total of 4,478 driver gene and cohort combinations, comprising 633 driver genes and 260 cohorts covering a total of 87 tumor types according to our oncotree ontology. Of all observed mutations mapping to the region of interest, we only consider as positive examples those matching a consequence type with a dN/dS excess in the cohort higher than 85%. There is a strong dichotomy between cohorts with high and low excess in the nonsense and splicing consequence type contexts, whereas the distribution of missense excess is more gradual.

dN/dS excess per consequence type across driver genes and cohorts. Among the driver gene and cohort combinations, 53% have a high excess (higher than 85%) in the missense context, 51% in nonsense and 23% in splicing.

For each driver gene and tumor type combination we computed the discovery index, a score that measures how unlikely new mutations are to be found if new samples of the same tumor type were to be sequenced. The number of observed mutations in a gene and tumor type context, alongside the discovery index, gives us an indication of how representative our catalog of mutations is for deriving conclusions about recurrence and driverness.


Number of observed mutations and discovery index across all possible gene and tumor type combinations.

Using the catalog of observed mutations matching a high dN/dS excess per consequence type, together with a catalog of synthetic mutations generated with the 96-trinucleotide-context mutational profiles reflecting neutral mutagenesis, we then conduct our supervised learning approach to train boostDM models.

For this release, we trained 1,451 models for distinct driver gene and tumor type combinations, of which 643 are models at the finest level of tumor type specificity that we can afford with the cancer type annotation of our input cohorts of tumors. Applying our model selection criteria based on cross-validation (CV), the discovery index and the number of mutations observed, only 736 of the models trained meet the minimum qualification in terms of inductive bias and representativeness of the training set (hereinafter “high-quality models”). There are 359 high-quality models with the highest level of tumor type specificity (hereinafter “specific models”).
In terms of driver gene coverage, the high-quality models span 115 genes, with 94 represented in specific models and 21 additional genes represented by more general models (Figure 3B). In terms of tumor type coverage, the high-quality models span 89 tumor types, of which 58 are tumor types at the highest level of specificity (Figure 3C).

A. Count of boostDM models, specific models and high-quality models.
B. Count of genes covered across all models, specific models and high-quality models.
C. Count of tumor types across all models, specific models and high-quality models.

Models across tumor types. Graphical representation of the oncotree hierarchy: nodes are labeled with the tumor type acronyms and colored according to the number of genes with a high-quality model for that particular tumor type context.

Cross-validation (CV) performance plays an important role in our model selection criteria. As in the previous release, the CV performance was reported with the so-called F-score50 (F-beta score with beta=0.5, thus giving more weight to precision). CV performances showed a remarkable concordance with the discovery index.
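
For reference, a minimal sketch of how the F-score50 can be computed with scikit-learn on hypothetical labels and predictions:

from sklearn.metrics import fbeta_score

y_true = [1, 1, 0, 0, 1, 0, 1, 0]   # hypothetical driver/nondriver labels
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]   # hypothetical binary predictions
fscore50 = fbeta_score(y_true, y_pred, beta=0.5)  # beta=0.5 gives more weight to precision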

Discovery index vs CV performance across specific models.

A few high-quality models are an exception to this general trend, whereby low values of the discovery index match a high CV performance. The analysis of SHAP feature explanations shows that these models are predominantly driven by the consequence type of the mutations: these are very simple models that accurately discriminate between positive and negative examples for which the set of observed mutations is far from saturation. These models have learned very simple although effective classification rules (e.g. any truncating mutation being a driver) in a context where we still expect to find many more new mutations as more tumors get sequenced, whence the low discovery index.

Discovery index vs CV performance across specific models, highlighting a group of models with high CV performance despite the low discovery index value.

Local feature explanations to the driver predictions corresponding to the catalog of unique variants observed in the regions of interest in ARID1A (BRCA), KMT2C (CESC), and NSD1 (HNSC) according to the Intogen 2024 mutational catalog. Each row in the heatmap represents a variant with colors representing SHAP values across the features (grouped in four types: clustering, conservation across vertebrates, consequence

Only 30 specific models above the minimum CV performance threshold are rejected because of their low number of observed mutations given their moderate/low discovery index value.

Low-quality models. Left: All specific models being rejected by our model selection criteria, where only 30 of them are above the CV performance threshold. Right: Projection of the 30 low-quality models with a CV performance F-score50 higher than 0.8. Red dashed lines represent the rectangular rejection boundary of our model selection criteria.

Comparison with previous run

In this section we denote the previous release of boostDM as v2021 and the current release as v2024.

New gene models

The current release v2024 includes models for 39 genes that previously did not have any high-quality models. The complete list of new genes is the following:

ARHGAP35, ATF7IP, BAP1, BCL7A, BCL9L, BCOR, BRCA1, BRCA2, BRD7, CCND3, CDKN1A, CNOT9, FGFR1, FGFR2, FOXO1, FUBP1, GNAS, ID3, MAPK1, MAX, MGA, MYCN, NF2, NFKBIA, NOTCH2, POU2F2, PTCH1, RASA1, RHPN2, RNF43, RXRA, SMO, SOX9, SPEN, STAT6, TSC2, USP6, WT1, XPO1

The user may note the inclusion of H3-3A among the genes with models. This gene was named H3F3A in previous releases; thus the gene H3F3A has not been dropped, nor is H3-3A a new inclusion among the high-quality models.

Snapshots corresponding to gene models for genes that were not previously covered by high-quality models (panels A-F below). Local feature explanations of the driver predictions corresponding to the catalog of unique variants observed in the MANE Select regions of interest.

A. BRCA1 (CANCER) is a model driven by nonsense consequence type.

B. BCL7A (NHL) is a model driven by the enrichment of variants in the Pfam domain BCL_N.

C. BAP1 (CCRCC) casts a more nuanced signal explained by a mix of consequence type, PhyloP and enrichment in the Pfam domains Peptidase_C12 and UCH_C.

D. FOXO1 (NHL) is a model driven by PhyloP.

E. NOTCH2 (CANCER) is dominated by NMD skipping nonsense mutations, but there are also a few driver mutations explained by linear clustering and splicing consequence type.

F. MAPK1 (CANCER) is driven by 3D clustering.

Downgraded gene models

Nine genes that had at least one high-quality model in boostDM v2021 do not have a high-quality model anymore. The complete list of downgraded genes is the following:

ATRX, CHD4, CMTR2, DDX3X, IRF4, PIK3R1, PPP6C, RPS6KA3, RUNX1

All the gene models except RUNX1 dropped because their CV performance fell below the required minimum mean F-score50 of 0.8. In the case of RUNX1 (Acute Myeloid Leukemia), the model did not fulfill the criteria regarding the number of observed mutations and the discovery index.

Downgraded genes. The nine genes that had at least one high-quality model in boostDM v2021 but do not have a high-quality model anymore in v2024.

Comparison set

Because the oncotree tumor type ontology that boostDM relies on has changed since v2021, a proper comparison requires defining a subset of gene and tumor type combinations for which we can find equivalences across releases and for which high-quality models were implemented in the previous release. Concretely, the pool of gene and tumor type combinations for this analysis is defined as follows:

  • We selected gene and tumor type combinations for which there was a high-quality model and prediction in release v2021.
  • From those, we selected the ones with an equivalent tumor type in the current version of the oncotree.
  • Finally, we discarded all models that are not specific and high-quality in the current version.
Applying these criteria, we ended up with a collection of 178 gene and tumor type combinations, which we refer to as our comparison set, spanning 68 genes and 38 tumor types.

The genes comprised are the following in alphabetical order: AKT1, ALK, AMER1, APC, AR, ARID1A, ARID2, AXIN1, BCLAF1, BRAF, CD79B, CDH1, CDK12, CDKN2A, CIC, CREBBP, CTCF, CTNNB1, EGFR, EP300, ERBB3, ERCC2, ESR1, EZH2, FAT1, FBXW7, FGFR3, FLT3, FOXA1, GNA11, GNAQ, GTF2I, HRAS, IDH1, IDH2, KDM6A, KMT2C, KMT2D, KRAS, MAP3K1, MEF2B, MYD88, NCOR1, NF1, NFE2L2, NOTCH1, NRAS, NSD1, PBRM1, PCBP1, PIK3CA, PPP2R1A, PTEN, PTPN11, RAC1, RB1, RBM10, SETD2, SF3B1, SMAD4, SMARCA4, SPOP, STAG2, STK11, TP53, TSC1, U2AF1, VHL.

Key metrics
Discovery index

The discovery index turns out to be stable between releases, with some instances where it has increased significantly, most prominently in the LNM cohort, with genes like TP53, CREBBP and HLA-B showing a sharp increase. Although the discovery index is not guaranteed to always increase (the cohorts of tumors recruited may not be homogeneous despite matching the same tumor type, e.g. due to differences in tumorigenic selection across ancestral populations or due to differential environmental exposures), the fact that the discovery index does not drop significantly in any case attests to the robustness of the methodology.

Discovery index across the comparison set for v2021 and v2024. Marked are the gene tumor type models that have a discrepancy between releases higher than 0.1 in the discovery index scale.

Observed mutation count

The comparison of observed mutations reflects the addition of new mutational cohorts in Intogen v2024, which comprises more than 4,000 more samples than the v2021 release, leading to a general increase across all models. However, owing to changes in how mutations are filtered and mapped to the genic region of interest based on MANE Select, there are a few cases where the number of mutations has slightly decreased.

Observed mutation counts across the comparison set for v2021 and v2024. Marked are the gene tumor type models that have a 50% or higher relative increment.

Cross-validation (CV) performance

We conducted a comparison of the CV performance between both releases: while the average performance is quite stable across models, there are a few outliers showcasing differences in the modeling outcome, including a few cases for which the performance decreased. As in our original study, we report the CV performance with the F-score50, which gives precedence to precision over recall.

CV performance for v2021 and v2024.

To understand this outcome we must take into consideration that the CV performance can reflect the new models learning more nuanced patterns at training as the mutational catalog gets richer. In particular, larger mutation catalogs come with an increased diversity of passenger mutations forming part of the positive set, which constitutes an unavoidable form of data contamination in our methodology. In other words, because our supervised learning strategy uses fuzzy labels based on high dN/dS excess, even perfect models are not expected to reach perfect performance.

For example, in the cases of SMARCA4 (LUNG) and EP300 (BLCA) we saw that the new models learned more nuanced patterns in the current release. Even though we can see that in these two cases the forecast of the model has not changed, the differences in terms of SHAP values underscore differences in the training outcome across base models. This can be best illustrated by comparing the blueprints and the SHAP feature explanations corresponding to the predictions of observed mutations.

Comparison of SMARCA4 (LUNG) models from v2024 and v2021. Top: Needle plots represent the distribution of driver (red) and nondriver (gray) observed mutations in v2024 (top) and v2021 (bottom). The internal plots represent (top) the distribution of driver mutation along the protein sequence and (bottom) the values of mutational features used to train both versions of models. The tracks are colored pink for v2024, blue for 2021, and purple if they represent the overlap of the values of the features in both models. Bottom: comparison of SHAP explanations cast by models v2024 vs v2021 across the set of unique variants observed in v2024. Top horizontal bar represents v2024 (red) and v2021 (pink).
Comparison of EP300 (BLCA) models from v2024 and v2021. Top: Needle plots represent the distribution of driver (red) and nondriver (gray) observed mutations in v2024 (top) and v2021 (bottom). The internal plots represent (top) the distribution of driver mutation along the protein sequence and (bottom) the values of mutational features used to train both versions of models. The tracks are colored pink for v2024, blue for 2021, and purple if they represent the overlap of the values of the features in both models. Bottom: Comparison of SHAP explanations cast by models v2024 vs v2021 across the set of unique variants observed in v2024. Top horizontal bar represents v2024 (red) and v2021 (pink).

With all this in mind, we also wanted to understand the extent to which models differ in terms of concordance across comparable genomic sites, regardless of whether there are observed mutations mapping to those sites (in silico saturation mutagenesis). To do this, we first mapped the results of the in silico saturation mutagenesis from both releases onto the overlap of the regions of interest and computed the concordance as the Matthews Correlation Coefficient (MCC) between the vectors of binary predictions (blueprints), considering a mutation a likely driver if its boostDM score is higher than 0.5. We also computed the proportion of sites where drivers thus defined were gained or lost.
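
For illustration, a minimal sketch of this concordance computation, assuming the scores of both releases have already been joined on the overlapping sites:

import numpy as np
from sklearn.metrics import matthews_corrcoef

def blueprint_concordance(scores_v2021, scores_v2024, cutoff=0.5):
    drivers_v2021 = np.asarray(scores_v2021) > cutoff
    drivers_v2024 = np.asarray(scores_v2024) > cutoff
    mcc = matthews_corrcoef(drivers_v2021, drivers_v2024)
    gained = np.mean(~drivers_v2021 & drivers_v2024)  # driver in v2024 only
    lost = np.mean(drivers_v2021 & ~drivers_v2024)    # driver in v2021 only
    return mcc, gained, lost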

Concordance of blueprints across the gene and tumor type combinations of the comparison set (MCC score) versus the proportion of drivers gained.
Concordance of blueprints across the gene and tumor type combinations of the comparison set (MCC score) versus the proportion of drivers lost.

We observe an inverse relationship between the amount of drivers gained or lost and the concordance, which is expected. However, the proportion of drivers gained does not seem to influence the concordance as much as the proportion of drivers lost.

We also asked about the relationship between the concordance (or lack thereof) and the CV performance of the models in the current release. Interestingly, some models show very little concordance despite the CV performance in v2024 being very high.

Concordance of blueprints across the gene and tumor type combinations of the comparison set (MCC score) versus the CV performance of the v2024 models.

An interesting example is the ALK (NBL) model, where we can see that the former model deemed any non-synonymous mutation in the intracellular kinase domain a driver, whereas the current model casts a much more specific prediction based on the clustering signals of positive selection (see figure in the examples below).

Examples

For illustration, we provide a few more examples of gene and tumor type models that have changed appreciably, portraying interesting differences in terms of observed mutations, mutational features, blueprints, and feature explanations.

A. Comparison of VHL (RCC) models from v2024 and v2021. Left: Needle plots represent the distribution of driver (red) and nondriver (gray) observed mutations in v2024 (top) and v2021 (bottom). The internal plots represent (top) the distribution of driver mutation along the protein sequence and (bottom) the values of mutational features used to train both versions of models. The tracks are colored pink for v2024, blue for 2021, and purple if they represent the overlap of the values of the features in both models. Right: comparison of SHAP explanations cast by models v2024 vs v2021 across the set of unique variants observed in v2024. Top horizontal bar represents v2024 (red) and v2021 (pink)
B. Comparison of FOXA1 (PRAD) models from v2024 and v2021. Left: Needle plots represent the distribution of driver (red) and nondriver (gray) observed mutations in v2024 (top) and v2021 (bottom). The internal plots represent (top) the distribution of driver mutation along the protein sequence and (bottom) the values of mutational features used to train both versions of models. The tracks are colored pink for v2024, blue for 2021, and purple if they represent the overlap of the values of the features in both models. Right: Comparison of SHAP explanations cast by models v2024 vs v2021 across the set of unique variants observed in v2024. Top horizontal bar represents v2024 (red) and v2021 (pink).
C. Comparison of ALK (NBL) models from v2024 and v2021. Left: Needle plots represent the distribution of driver (red) and nondriver (gray) observed mutations in v2024 (top) and v2021 (bottom). The internal plots represent (top) the distribution of driver mutation along the protein sequence and (bottom) the values of mutational features used to train both versions of models. The tracks are colored pink for v2024, blue for 2021, and purple if they represent the overlap of the values of the features in both models. Right: comparison of SHAP explanations cast by models v2024 vs v2021 across the set of unique variants observed in v2024. Top horizontal bar represents v2024 (red) and v2021 (pink).
D. Comparison of GNAQ (UM) models from v2024 and v2021. Left: Needle plots represent the distribution of driver (red) and nondriver (gray) observed mutations in v2024 (top) and v2021 (bottom). The internal plots represent (top) the distribution of driver mutation along the protein sequence and (bottom) the values of mutational features used to train both versions of models. The tracks are colored pink for v2024, blue for 2021, and purple if they represent the overlap of the values of the features in both models. Right: Comparison of SHAP explanations cast by models v2024 vs v2021 across the set of unique variants observed in v2024. Top horizontal bar represents v2024 (red) and v2021 (pink).
E. Comparison of EZH2 (LNM) models from v2024 and v2021. Left: Needle plots represent the distribution of driver (red) and nondriver (gray) observed mutations in v2024 (top) and v2021 (bottom). The internal plots represent (top) the distribution of driver mutation along the protein sequence and (bottom) the values of mutational features used to train both versions of models. The tracks are colored pink for v2024, blue for 2021, and purple if they represent the overlap of the values of the features in both models. Right: comparison of SHAP explanations cast by models v2024 vs v2021 across the set of unique variants observed in v2024. Top horizontal bar represents v2024 (red) and v2021 (pink).

Coverage of variants

Considering all the models and in silico saturation mutagenesis predictions produced in the current release, we asked what proportion of observed single-nucleotide variants has an annotation as either pathogenic/oncogenic or non-pathogenic/non-oncogenic in well-known databases of clinical annotations, and compared this with the proportion attained by boostDM models. For this exercise we used ClinVar (version from 2024-03-31) and OncoKB (obtained using API version v1.4.1 and data version v4.19) as reference datasets of clinical annotations. We conducted this analysis using observed mutations from Intogen (v2024) and from the GENIE cohort (v15.1).

Comparison of the proportion of variants annotated by ClinVar, OncoKB and boostDM models: proportion of observed mutations with a clinical annotation in ClinVar and OncoKB for the Intogen and GENIE cohorts. Left: we included 97,942 variants from Intogen v2024, only SNVs in driver genes defined by Intogen. Right: we included 262,010 variants from the GENIE cohort, only SNVs in driver genes defined by Intogen.

Benchmark

In view of the significant fixes and updates in the making of boostDM models, we conducted benchmarking analyses similar to those accompanying the first release of the method (doi.org/10.1038/s41586-021-03771-1). In our analyses we included additional experimental saturation mutagenesis datasets and new bioinformatic scores retrieved through dbNSFP. Some of these datasets and scores have only recently been made available, adding value to this benchmarking exercise.

Datasets

Experimental saturation mutagenesis datasets

We collected experimental saturation mutagenesis assays in which the fitness and/or functional impact of mutations in certain human genes has been put to the test in highly multiplexed experimental setups where many variants are assayed at once.

Bioinformatic scores

We have also taken advantage of a collection of bioinformatic scores that aim to measure the pathogenicity or functional impact of variants, for comparison against boostDM. Although most of these tools were not designed specifically to measure the likelihood of a variant being under positive selection in the context of somatic evolution, all these scores are relevant and have been used in cancer genomics-related studies and methods.

Methodology

Whenever a new classification score of variants is proposed, it is convenient to evaluate how the new method compares with other scoring methods on an evaluation set that reflects the ground truth and guarantees that none of the methods has an artificially inflated performance. Typically one would like to compare against methods intended to solve the same problem; however, most available scores do not directly address the identification of mutations under positive selection in somatic evolution, and even the experimental saturation mutagenesis assays that assess the effects of mutations in experimental systems (cell lines or mice) do not necessarily test their oncogenic effect in the human organism. This is why we compare the performance of boostDM models, experimental saturation mutagenesis, and bioinformatic tools against ground truth labels built upon observed and synthetic mutations.

Here we take as ground truth driver mutations those observed in cancer genes across human tumors that have a consequence type with high enrichment in driver mutations, as asserted from dN/dS analysis; note that this set might still include some non-driver mutations. On the other hand, synthetic mutations generated following the mutational profile of neutral mutagenesis across samples of the same tumor type constitute a good approximation of the set of passengers. To build a good evaluation set, we considered mutations in the test sets held out from training across boostDM base classifiers as drivers (observed in a high dN/dS consequence type context) and nondrivers (synthetic mutations generated in accordance with the background mutational profile).

The evaluation pool is built by taking all the mutations from the testing splits of each base classifier, each annotated with its mutational features, the boostDM score cast by the base classifier and the driver/nondriver ground truth label. By using held-out testing mutations and base classifiers as predictors, we reduce leakage at prediction time for the sake of an unbiased evaluation.

On the one hand, this evaluation strategy is still expected to bear some information leakage, because we do not completely rule out the possibility that a mutation used at training is also present in the corresponding testing split; we do enforce, however, that if this happens the mutation has a unique instance in the test set, i.e. we drop duplicates in the test set. On the other hand, our evaluation strategy makes use of the suboptimal predictions cast at the base classifier level (before aggregation of the base classifiers into a consensus model), which plays against performance inflation.

We mapped all the scores to be tested onto the evaluation set and assessed the performance of each against the ground truth labels by computing precision-recall curves, since none of the scores admits a clear-cut dichotomous interpretation, as they are given as continuous numerical values.
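
For illustration, a minimal sketch of this evaluation, assuming the evaluation pool is a pandas DataFrame with a binary label column and one column per continuous score (column names are hypothetical):

import pandas as pd
from sklearn.metrics import precision_recall_curve, auc

def pr_auc_per_score(evaluation_pool: pd.DataFrame, label_col="is_driver", score_cols=("boostDM_score",)):
    results = {}
    for col in score_cols:
        subset = evaluation_pool.dropna(subset=[col])  # some scores are not defined at every site
        precision, recall, _ = precision_recall_curve(subset[label_col], subset[col])
        results[col] = auc(recall, precision)
    return results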

Results

Experimental saturation mutagenesis

The comparison of boostDM against experimental saturation mutagenesis assays in TP53, PTEN and Ras genes was robust in general, with only a few instances where the number of mutations in the comparison set was deemed insufficient to draw any conclusions. We conducted the benchmarking exercise across the high-quality models of TP53, PTEN and Ras genes (HRAS, KRAS and NRAS). Note that some experimental saturation mutagenesis datasets cover only a subset of the mutations considered in the region of interest of boostDM (e.g. restricted to a domain or to missense mutations), hence we carried out the evaluation only on the intersection with the evaluation set, making sure to keep the driver and nondriver labels balanced by randomly pruning the evaluation set if necessary.

In general, boostDM performs similarly or better than the experimental saturation mutagenesis scores at the task of discriminating between driver and nondriver mutations, which is in accordance with the results obtained with the previous release v2021.

Because our methodology uses different testing sets for each gene and tumor type combination, the evaluation of experimental saturation mutagenesis assays might change depending on the tumor type. Even if the score yielded by the assay and used throughout the testing is the same across models for a given gene, the fact that the observed and randomized mutations in the test set are not the same implies that there will be some differences. Whether these differences are biologically meaningful is something we will need to look into in the future.

Precision-recall curves. First: boostDM score for TP53 (BRCA) and TP53 experimental saturation mutagenesis assays. Second: boostDM score for PTEN (UCEC) and PTEN experimental saturation mutagenesis assays. Third: boostDM score for KRAS (COADREAD) and Ras experimental saturation mutagenesis assays.

Heatmap representation of the area under the precision-recall curves of boostDM models and experimental saturation mutagenesis assays, across assays and tumor types. First: TP53. Second: PTEN. Third: Ras genes HRAS, KRAS and NRAS. Blank squares mean that the intersection between the evaluation set and the sites where the score is defined did not yield sufficiently many mutations to draw a robust metric.

Bioinformatic scores

The evaluation of bioinformatic scores allowed us to compare the boostDM performance across a rich set of gene tumor type models, potentially the 736 gene tumor type combinations corresponding to high-quality models according to our model selection criteria. For a high proportion of gene and tumor type models and scores the comparisons were robust enough, with some exceptions where the number of mutations was insufficient to draw any conclusions.

Precision-recall curves. Precision-recall curves corresponding to TP53 (BRCA) boostDM score (base classifiers) and bioinformatic scores of mutations in TP53. In the panels from left to right: classical pathogenicity scores and metascores; recently proposed pathogenicity scores derived from generative deep learning approaches; CHASMplus cancer-specific pathogenicity score (grouped by amino acid change with mean and max functions, respectively).

We divided the set of bioinformatic scores into three broad classes: classical pathogenicity scores and meta-scores thereof (CADD, FATHMM, MetaLR, MetaRNN, MutationAssessor, PROVEAN, Polyphen2, REVEL, SIFT, VEST4); recently proposed pathogenicity scores derived from deep generative models (EVE, AlphaMissense, ESM1b); and CHASMplus, a cancer-specific pathogenicity score intended to address the same classification task as boostDM.

In general, boostDM performs similarly or better than the rest of bioinformatic scores, with few exceptions on a gene and tumor type specific basis. Among the bioinformatic scores analyzed, CHASMplus was the one that performed closest to boostDM’s performance for a number of gene and tumor type combinations.

Heatmaps representing the area under the precision-recall curves corresponding to an array of bioinformatic scores for TP53, PTEN, PIK3CA and EGFR across tumor types at the highest level of specificity where boostDM has produced high-quality models according to our model selection criteria.

Head to head comparisons of the area under the precision-recall curve of CHASMplus (top), MetaRNN (middle) and AlphaMissense (bottom). Each point represents a tumor type. Points being below (resp. above) the diagonal implies that the performance of boostDM is better (resp. worse) in the test set.