Left: In silico saturation mutagenesis of all possible coding variants, shown at their relative
positions along the gene's coding sequence. The top track shows observed mutation frequencies, colored
by whether each mutation is predicted to be a driver or a passenger. For each possible mutation
in the gene, boostDM computes a score reflecting the likelihood that the mutation is a driver (second track).
Potential driver mutations appear in red and potential passenger mutations in gray. The third track
breaks down the predicted driver mutations by consequence type and by confidence tier (high or low
confidence) that the mutation is a driver. Relevant functional protein domains are shown along
each gene body. The bottom tracks show the values of the features the model was trained on.
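The enumeration step behind a saturation mutagenesis can be illustrated with a short sketch. It assumes a plain coding sequence string (no introns, length a multiple of 3) and considers only single-nucleotide substitutions; the function name `saturation_mutagenesis` and the consequence labels are hypothetical, and this is not boostDM's own annotation pipeline, which also scores each variant with a trained model.

```python
# Standard genetic code, built from the compact "TCAG" ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def saturation_mutagenesis(cds):
    """Enumerate every single-nucleotide substitution in a coding sequence.

    Hypothetical helper: returns tuples of
    (1-based CDS position, ref base, alt base, ref aa, alt aa, consequence).
    Bases other than A/C/G/T raise a KeyError in the codon lookup.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    variants = []
    for pos, ref in enumerate(cds):
        offset = pos % 3                      # position within the codon
        ref_codon = cds[pos - offset:pos - offset + 3]
        ref_aa = CODON_TABLE[ref_codon]
        for alt in "ACGT":
            if alt == ref:
                continue
            alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
            alt_aa = CODON_TABLE[alt_codon]
            # Illustrative consequence labels for substitutions only.
            if alt_aa == ref_aa:
                consequence = "synonymous"
            elif alt_aa == "*":
                consequence = "nonsense"
            elif ref_aa == "*":
                consequence = "stop_lost"
            else:
                consequence = "missense"
            variants.append((pos + 1, ref, alt, ref_aa, alt_aa, consequence))
    return variants
```

Each coding position yields three alternatives, so a CDS of length L gives 3L candidate variants, each of which would then be scored and plotted along the gene.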
Right: For random subsets of samples drawn from the tumor-type cohort, the number of unique mutations
mapping to the gene is counted. The bending of the inverse exponential curve that best fits the
subsampling data indicates how close the current pool of mutations is to representing all the
possible mutations in this gene and tumor-type context. From the bending of the curve we derive a
so-called discovery index, which is higher (closer to 1) the more the curve bends.
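The subsampling and curve-fitting procedure can be sketched as follows, under stated assumptions. The saturating form y = a(1 - e^(-b n)) is assumed here as the "inverse exponential", and `discovery_index` is a hypothetical definition (the fraction of the fitted plateau a reached at the full cohort size N, i.e. 1 - e^(-b N)), chosen only because it lies in [0, 1] and grows as the curve bends; the caption does not give the exact formula. The fit uses a dependency-free grid search over b with a closed-form a; `scipy.optimize.curve_fit` would be a common alternative.

```python
import math
import random


def subsample_unique_counts(sample_mutations, sizes, repeats=20, seed=0):
    """Mean number of unique mutations seen in random subsets of samples.

    sample_mutations: list of sets, one set of mutation IDs per sample
    (hypothetical input format).
    """
    rng = random.Random(seed)
    means = []
    for n in sizes:
        total = 0
        for _ in range(repeats):
            subset = rng.sample(sample_mutations, n)
            total += len(set().union(*subset))
        means.append(total / repeats)
    return means


def fit_saturation(xs, ys):
    """Least-squares fit of y = a * (1 - exp(-b * x)) (assumed model form).

    For a fixed b the optimal a has a closed form, so only b is searched
    on a log-spaced grid; the (a, b) pair with the lowest squared error wins.
    """
    best = None
    for k in range(-300, 101):          # b from 1e-6 to 1e2
        b = 10 ** (k / 50)
        g = [-math.expm1(-b * x) for x in xs]   # 1 - exp(-b x), numerically stable
        gg = sum(v * v for v in g)
        if gg == 0:
            continue
        a = sum(y * v for y, v in zip(ys, g)) / gg
        sse = sum((y - a * v) ** 2 for y, v in zip(ys, g))
        if best is None or sse < best[2]:
            best = (a, b, sse)
    return best[0], best[1]


def discovery_index(b, cohort_size):
    """Hypothetical index: share of the fitted plateau reached at full cohort size."""
    return -math.expm1(-b * cohort_size)
```

A strongly bent curve (large b relative to the cohort size) means new samples rarely add unseen mutations, so the index approaches 1; a nearly straight curve gives an index close to 0.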